Quantifying variant contributions in cystic kidney disease using national-scale whole-genome sequencing

A full study workflow with high-level results can be found in Figure 1.

Study workflow detailing high-level methods and results.Figure 1

Study workflow detailing high-level methods and results. Flow diagram depicting the methods and high-level results from the study. CyKD, cystic kidney disease; 100KGP, 100,00 genome project; UKBB, UK Biobank, GATE, genetic analysis of time to event.

The CyKD cohort demographics. The CyKD cohort consisted of 1,558 participants recruited to the 100,000 Genomes Project (100KGP), of which 1,294 were probands (full breakdown of recruited pedigrees in Supplemental Table 1A; supplemental material available online with this article; https://doi.org/10.1172/JCI181467DS1). The demographic information and top 5 most frequent Human Phenotype Ontology (HPO) codes of probands are shown in Table 1.

Table 1

Demographic and phenotypic breakdown of the recruited CyKD probands and controls

The clinically validated arm of the 100KGP gave a molecular diagnosis to 53% of the CyKD cohort. Of these probands, 1,290 had a genetic outcome from the 100KGP clinical pipeline: 640 (52.93%) were solved, 34 (2.81%) were partially solved, 79 (6.54%) had missing data, and 537 (44.42%) were unsolved. The top 3 molecular diagnoses were PKD1-truncating (340 [26%]), PKD2-truncating (122 [9.5%]), and PKD1-nontruncating (118 [9.1%]) variants. The full breakdown of solved cases and the types of variants can be found in Supplemental Table 2 (3 patients were solved for primary conditions unrelated to their CyKD, e.g., intellectual disability, and were not included in this table and 12 cases did not have a gene recorded despite being listed as solved).

Outcome data show those with pathogenic PKD1 variants have the worst renal prognosis followed by those without a diagnosis. Of the 1,290 cases, 578 (44.8%) had data regarding kidney function in the form of HPO (PMID: 37953324) or Hospital Episode Statistics (HES) codes, of whom 398 (68.9%) had reached KF (full breakdown in Supplemental Table 1, B and C). Survival analysis of these cohorts followed what is known about the renal prognosis of variants in CyKD (Figure 2).

Outcome data show those with pathogenic PKD1 variants have the worst renalFigure 2

Outcome data show those with pathogenic PKD1 variants have the worst renal prognosis followed by those without a diagnosis. Kaplan-Meier graph of kidney failure (KF) by variant type. PKD1-T, PKD1-truncating variant; PKD1-NT, PKD1-nontruncating variant; PKD2-T, PKD2-truncating variant; Other, another variant in the PanelApp cystic kidney disease gene panel.

Combining the clinical and research pipelines identified a potential explaining genetic variant in the majority of CyKD cases. Of the 1,209 ancestry-matched CyKD cases, 994 (82%) had a potentially explaining monogenic or single SV identified when combining the clinical pipeline results with those who had variants identified via collapsing-gene-based analyses (Supplemental Table 3). These are discussed in greater detail below. Of note, the research cohort consists of a subset of cases with ancestry-matched controls (n = 1,209).

Unbiased rare variant analysis highlights IFT140 and COL4A3 as important genes involved in CyKD. Rare variant analysis of the ancestry-matched cohort of 1,209 cases and 26,096 ancestry-matched controls under the “missense+” mask showed a significant enrichment of cases for PKD1 (P = 1.17 × 10–309, odds ratio [OR] 10.60, 95% CI 9.35–12.01), PKD2 (P = 1.96 × 10–150, OR 19.07, 95% CI 15.13–23.99), DNAJB11 (P = 3.52 × 10–7, OR 1.07, 95% CI 0.95–1.21), and COL4A3 (P = 1.26 × 10–6, OR 3.02, 95% CI 2.10–4.22) (Figure 3A). There was no evidence of genomic inflation (λ < 1; see Q-Q plot in Supplemental Figure 3B).

Unbiased rare variant analysis highlights IFT140 and COL4A3 as important geFigure 3

Unbiased rare variant analysis highlights IFT140 and COL4A3 as important genes involved in CyKD. Gene-based Miami plots of the SAIGE-GENE “missense+” analyses. Each data point is a gene, made up of variants that qualified under their respective mask. (A) Missense+ analysis of the total ancestry-matched cohort of 1,209 cases and 26,096 controls showing a significant enrichment of cases for PKD1, PKD2, DNAJB11, and COL4A3. PKD1 and PKD2 are highlighted, as the plot is capped at an association of 1 × 10–30. Actual associations: PKD1 (P = 1.17 × 10–309), PKD2 (P = 1.96 × 10–150). (B) Missense+ analysis of the depleted cohort of 308 cases versus 26,096 controls showing enrichment of cases for IFT140 and COL4A3. The horizontal line indicates the threshold for exome-wide significance. Generalized logistic regression modeling, as implemented using optimal sequence kernel association testing (SKAT-O) via SAIGE-GENE, was used to generate these data.

Removing cases solved by 100KGP and patients that had a bioinformatically ascertained potentially disease-causing variant in a known cystic gene left 308 cases (at the time of analysis IFT140 was not a known CyKD gene). Repeating the rare variant analysis under the “missense+” mask in this group showed a significant enrichment of cases with variants in IFT140 (P = 1.26 × 10–16, OR 5.57, 95% CI 3.63–8.21) and COL4A3 (P = 6.83 × 10–7, OR 4.93, 95% CI 2.77–8.11) compared with 26,096 controls (Figure 3B).

Using REVEL, a different annotation method for missense variants in the total and unsolved CyKD cohort, led to largely similar results but with smaller association values. In the total cohort, PKD1 (P = 1.63 × 10–35, OR 22.41, 95% CI 14.57–34.49), PKD2 (P = 3.84 × 10–14, OR 21.87, 95% CI 10.47–45.71), and COL4A3 (P = 8.84 × 10–6, OR 3.42, 95% CI 2.04–5.47) remained significant with the loss of the DNAJB11 signal. Of note, PKHD1 (P = 8.12 × 10–5, OR 3.56, 95% CI 1.62–7.02) and COL4A4 (P = 7.74 × 10–4, OR 3.52, 95% CI 1.80–6.42) increased their significance. In the unsolved CyKD cohort, COL4A3 (P = 1.09 × 10–6, OR 6.50, 95% CI 3.01–12.50) remained significant (Supplemental Tables 6 and 8). Of note, other known genes causing CyKD such as GANAB and ALG8 were not found to be enriched at a population level in this cohort. Analysis of a combined unsolved CyKD and polycystic liver disease cohort (n = 359) did not reveal any additional associated genes (Supplemental Tables 10 and 14).

IFT140 and COL4A3 variants in unsolved individuals most likely represent their primary diagnosis. Out of the 308 unsolved cases, 27 (8.8%) had a qualifying variant in IFT140 under the “missense+” mask. Of the 27 cases, all were heterozygous for the qualifying variants. None of the variants individually reached genome-wide significance.

Analysis of SVs intersecting with IFT140 revealed 2 additional cases (0.65%) with heterozygous exon-crossing SVs from the 308 unsolved cystic disease cases: one patient had a 3.6 kb deletion spanning exon 16 and 17 and another patient had 2 different inversions (12.6 kb and 8.1 kb). Four exon-crossing SVs (3 deletions and 1 tandem duplication) were seen in 4 different controls (0.015%). There was enrichment of IFT140 SVs in cases versus controls (P = 0.0032), although this was not significant at the genome-wide level. None of the 27 initial cases with IFT140 SNVs had detectable CNVs affecting IFT140, compared with 3 CNVs seen in 26,096 controls (P = 1).

There were no plausible second variants within IFT140 or other known cystic kidney genes in any of these individuals. A full variant and phenotypic breakdown of the IFT140 cases can be found in Supplemental Table 4A.

Among the 15 unsolved CyKD patients with qualifying variants in COL4A3 under the “missense+” mask, all were heterozygous for their respective variants and did not overlap with the unsolved IFT140 cohort listed above. None of the variants individually reached genome-wide significance. There was no plausible second variant in known cystic kidney genes that could explain the phenotype. Of note, 4 of the COL4A3 patients had liver cysts. We reanalyzed these 4 patients, searching for known genetic causes of polycystic liver disease, but none were found. A full variant and phenotypic breakdown of the COL4A3 cases can be found in Supplemental Table 4B.

Analysis of loss-of-function (protein truncating) variants identifies monoallelic defects of PKHD1 in unsolved CyKD. Collapsing rare variants that had a high confidence call for loss-of-function under the “LoF” mask (i.e., analysis restricted to protein length-altering variants, excluding all missense variants) revealed significant enrichment of cases for PKD2 (P = 3.05 × 10–147, OR 130.85, 95% CI 83.66–215.37), PKD1 (P = 1.29 × 10–117, OR 36.01, 95% CI 30.52–42.23), IFT140 (P = 3.00 × 10–25, OR 14.03, 95% CI 7.91–24.45), DNAJB11 (P = 1.84 × 10–12, OR 1.07, 95% CI 0.95–1.21), and PKHD1 (P = 2.98 × 10–08, OR 4.07, 95%CI 2.24–6.88) (Figure 4).

Analysis of loss-of-function (protein truncating) variants identify monoallFigure 4

Analysis of loss-of-function (protein truncating) variants identify monoallelic defects of PKHD1 in unsolved CyKD. Gene-based Manhattan plot of the SAIGE-GENE “loss-of-function” analysis of the total ancestry-matched cohort of 1,209 cases and 26,096 controls showing a significant enrichment of cases for PKD1, PKD2, DNAJB11, IFT140, and PKHD1. Each data point is a gene, made up of variants that qualified under their respective mask. The red line indicates an exome-wide significance level of P = 2.5 × 10–6. Generalized logistic regression modeling, as implemented using optimal sequence kernel association testing (SKAT-O) via SAIGE-GENE, was used to generate these data.

Removing cases with qualifying variants in IFT140 and COL4A3 left 266 unsolved cases in whom rare variant testing did not reveal any additional significant associations (Supplemental Figure 3).

In all presented analyses, the patients were heterozygous for their qualifying variants, except in DNAJB11 where 59 of the 369 cases that had qualifying variants within the “missense+” mask were homozygous.

There were 61 predicted LoF variants in PKHD1 that made up the association signal in the LoF mask analysis of the whole CyKD cohort. These were seen in 50 cases, of which 22 were solved, 2 were partially solved, 24 were unsolved, and 2 were unascertainable. All 50 cases were heterozygous for the variant that made up the signal.

Of the 22 solved cases, 3 patients had a diagnosis of ARPKD secondary to biallelic PKHD1 variants, and 19 had a diagnosis of ADPKD due to variants in PKD1 or PKD2. In the 2 partially solved cases, both patients had a second PKHD1 variant deemed to be a variant of unknown significance (VUS).

Of the 24 unsolved cases with a single LoF PKHD1 variant, 4 also had a computationally predicted high-impact nontruncating variant in PKD1, and 1 (in addition to the PKHD1 variant) had a predicted high-impact nontruncating PKD2 variant.

In the remaining 18 cases with a single heterozygous PKHD1 LoF variant, there were no SNVs or SVs that would imply compound heterozygosity (and a diagnosis of ARPKD), or potentially pathogenic variants in any other gene associated with CyKD. Two patients had a second PKHD1 variant, with a combined annotation–dependent depletion (CADD) score of greater than 20 in PKHD1, but both had been deemed “likely benign” by Clinvar (Clinvar ID: 1187104 and 102305).

In total, 634 (2.4%) of the 26,096 controls carried qualifying monoallelic PKHD1 LoF variants. When compared with the 18 (6.7%) out of 266 unsolved cases with no clear molecular diagnosis, there is a significant enrichment of PKHD1 variants in the unexplained CyKD cohort (P = 5.85 × 10–6, OR 2.92, 95% CI 1.69–4.76). Three of the 18 monoallelic PKHD1 cases had reached KF at a median age of 42 years. There was no statistically significant difference between the rates of liver cysts between the monoallelic PKHD1 cohort and the general CyKD cohort (P = 0.31). The full demographic details of the PKHD1 cohort can be found in Supplemental Table 4C.

Noncoding collapsing analysis of the no-variant-detected cohort revealed enrichment of splice site variants in PKD1 and PKD2. Removing the cases with qualifying IFT140, COL4A3, and monoallelic PKHD1 variants led to no further enrichment in the no-variant-detected (NVD) cohort under the “missense+” or “LoF” gene-collapsing tests. However, in the remaining 248 cases versus 26,096 controls there was significant enrichment in acceptor gain (AG), acceptor loss (AL), and donor loss (DL) splice variants for PKD1 (AG P = 6.70 × 10–11, OR 150.57, 95% CI 35.39–730.24; AL P = 4.22 × 10–8, OR 398.51, 95% CI 39.10–16384; DL P = 6.32 × 10–6, OR = no variants in controls) and for DL in PKD2 (P = 5.97 × 10–10, OR = no variants in controls) (Figure 5).

Noncoding collapsing analysis of the no-variant-detected (NVD) cohort reveaFigure 5

Noncoding collapsing analysis of the no-variant-detected (NVD) cohort revealed enrichment of splice-site variants in PKD1 and PKD2. Gene-based Manhattan plot of SAIGE-GENE splicing analysis. Each data point is a gene representing the significance of the association with CyKD in 248 cases versus 26,096 ancestry-matched controls, made up of variants that are highly likely (SpliceAI score >0.8) to impact splicing. The horizontal line indicates the threshold for genome-wide significance. There was significant enrichment in acceptor gain (AG), acceptor loss (AL), and donor loss (DL) variants for PKD1 (AG P = 6.70 × 10–11, AL P = 4.22 × 10–8, DL P = 6.32 × 10–6), and for DL in PKD2 (P = 5.97 × 10–10). Generalized logistic regression modeling, as implemented using optimal sequence kernel association testing (SKAT-O) via SAIGE-GENE, was used to generate these data.

There was no enrichment in the 3′- or 5′-UTR, intronic regions with a CADD score of greater than 20, or donor gain splice sites on a genome-wide basis.

Rare variant analysis by primary variant does not reveal contribution of variants in other genes. Using the primary variant, the CyKD cohort was divided into cases with PKD1- and PKD2-truncating and -nontruncating variants, respectively. Excluding the primary gene in each cohort did not identify significant enrichment of any additional genes.

A full list of the summary statistics that includes the variants that make up each association can be found in Supplemental Tables 5–26.

SVs in PKD1, PKD2, and the 17q12 loci play an important role in CyKD. Exome-wide gene-based SV analysis was performed in all CyKD cases and ancestry-matched controls. Across all combined types of SV (CNVs, deletions, duplications, inversions) there was significant enrichment in PKD1 (P = 2.02 × 10–14, OR 2.52, 95% CI 1.69–3.63), PKD2 (P = 7.48 × 10–12, OR 3.51, 95% CI 1.74–6.37), and genes within the 17q12 locus, including HNF1B (P = 8.81 × 10–9, OR 7.11, 95% CI 3.41–13.66). Of note, 2 genes within proximity of PKD2 also reached genome-wide significance: SPARCL1 (P = 5.76 × 10–7) and HSD17B11 (P = 8.69 × 10–6), but these were made up of large CNVs that also encompassed PKD2 (Figure 6).

SVs in PKD1, PKD2, and the 17q12 loci play an important role in CyKD.Figure 6

SVs in PKD1, PKD2, and the 17q12 loci play an important role in CyKD. Gene-based Manhattan plot. Each data point is a gene representing the significance of the association with CyKD in 1,209 cases and 26,096 ancestry-matched controls, made up of rare (in-house MAF < 0.01), exon-crossing SV/CNVs that have been called by MANTA/CANVAS. Common SV/CNVs (MAF > 0.1%) seen in gnomAD or the 100KGP cancer cohort were excluded. The horizontal line indicates the threshold for genome-wide significance. The significant associations were PKD1 (P = 2.02 × 10–14), PKD2 (P = 7.48 × 10–12), and the 17q12 locus (P = 8.81 × 10–9). A 2-sided Fisher’s exact test was utilized to compare cases and controls under a dominant inheritance model.

The PKD1 signal was driven by small deletions of less than 10 kb (median size 1.14 kb, IQR 2.60) (P = 2.17 × 10–22, OR 8.11, 95% CI 4.58–13.83). For PKD2 (P = 7.48 × 10–12, OR 13.03, 95% CI 5.02–31.87) and the 17q12 locus (P = 4.12 × 10–8, OR 8.70, 95% CI 3.72–18.80), the signal was driven by deletions of greater than 10 kb (median size in PKD2 405 kb, IQR 1273 kb; 17q12 1550 kb, IQR 94 kb), with no other loci reaching genome-wide significance. No genes reached genome-wide significance for duplications.

Of the 46 patients with rare exon-crossing SVs in PKD1 or PKD2, 13 also harbored predicted LoF variants in PKD1 or PKD2, thus leaving 33 patients with CyKD attributable to SVs in PKD1 or PKD2.

Of the 11 patients with 17q12 loci CNVs in the CyKD cohort, 1 patient had a PKD1-nontruncating SNV and 2 had PKD1-truncating SNVs that met the criteria for being likely disease causing. One patient had a known HNF1B CNV detected by a separate diagnostic lab prior to the return of 100KGP results.

Analyzing the subgroup of patients without an identified molecular diagnosis (n = 248), there was significant enrichment for large (>10 kb) deletions at the 17q12 loci (P = 9.21 × 10–9, OR 24.04, 95% CI 8.00–60.71) that were detected in 7 probands.

Of the seven 17q12 patients, the median age was 13.5 years, significantly lower than the total cystic disease cohort (P < 0.05). None of the patients had reached KF or had HPO or HES codes pertaining to diabetes; a full breakdown of phenotypic profile can be found in Supplemental Table 27 and all summary statistics from the SV analysis can be found in Supplemental Tables 28 and 29.

More recently described genes in CyKD are less penetrant than PKD1 and PKD2. There was marked variation in the proportion of individuals with each gene/variant type that were documented to have CyKD, with the figures broadly comparable between 100KGP and the UK Biobank (UKBB) (Table 2).

Table 2

OR of developing CyKD in the 100KGP (n = 741) and the UKBB (n = 825) in each different gene

SeqGWAS of CyKD reveals no robust common variant associations. A sequencing-based GWAS (seqGWAS) of 1,209 CyKD cases and 26,096 ancestry-matched controls using 10,377,275 variants with a minor allele frequency (MAF) of greater than 0.1% (Supplemental Figure 4) revealed only a single variant reaching genome-wide statistical significance on chromosome 8, chr8:92259567:A:C (P = 1.38 × 10–8, OR 0.72, MAF 0.23). There was no evidence of genomic inflation (λ = 0.99). To confirm/refute this surprising finding, we meta-analyzed this data set with those from the UKBB, Japanese Biobank (JBB), and FinnGen biobank. In the non-100KGP data sets there was evidence of association at several loci, most notably a stop gain in PKHD1 in the FinnGen cohort (see Supplemental Figures 5 and 6 for individual study Manhattan plots), but the chr8:92259567:A:C signal was not replicated and likely to be a false positive. Overall, in the combined analysis of 2,923 CyKD cases and 900,824 controls across 6,641,351 variants there were no genome-wide significant associations (Supplemental Figure 7).

Subgroup analysis by primary disease–causing variant type did not reveal any genome-wide significant loci (see Supplemental Figures 8 and 9).

The proportion of heritability attributable to common variants was between 3% and 9%. Within the 100KGP CyKD cohort, the proportion of phenotypic variance (h2) explained by additive common and low-frequency variation among individuals of European ancestry was 9.0% (SEM 7.6%). Using the summary statistics from the combined FinnGen/UKBB CyKD GWAS, the estimated heritability was 3.0% (SEM 9.7%). The large SEMs reflect low power to detect heritability within this cohort.

TTE analysis did not reveal any trans-acting genetic modifiers of severity. Within the pre-ancestry–matched CyKD cohort, 398 of the 1,288 probands had reached KF (30.9%), with a median age of 52 years (IQR 16). TTE genetic association analysis did not reveal any genome-wide significant associations — either in the total cohort or stratified by primary gene or variant type (see Supplemental Figure 10).

留言 (0)

沒有登入
gif