Whole genome sequencing–based copy number variations reveal novel pathways and targets in Alzheimer's disease

1 BACKGROUND

Alzheimer's disease (AD) is a neurodegenerative disease affecting more than 50 million people worldwide.1 By 2050, the worldwide frequency of AD is predicted to be 1 in 85 individuals older than 65.2 The heritability of early-onset familial AD (EOAD) is estimated to be as high as 79%, based on a large twin study.3 However, EOAD is rare compared with late-onset "sporadic" AD, accounting for ≤5% of cases.4, 5 Three genes (APP, PSEN1, and PSEN2) carry autosomal dominant mutations that are fully penetrant for EOAD. The genetic mechanisms underlying AD, especially late-onset AD (LOAD), remain largely unclear, although previous genome-wide association studies (GWAS) have identified 42 risk loci for LOAD.6-8

Compared to single nucleotide polymorphisms (SNPs; ≈1%),9 copy number variations (CNVs) affect a much larger fraction of the genome (≈10%).9-11 In normal diploid human cells, genomic regions are present in two homologous copies, one inherited from each parent. However, some regions carry large deletions or duplications that deviate from this typical copy number of two. Such deletions and tandem duplications are collectively known as CNVs,12 and they range in length from 50 bp to several Mb.12, 13 CNVs play a significant role in many neurological diseases such as Parkinson's disease (PD),14, 15 schizophrenia,16 intellectual disability,2 and AD.2, 17-20 However, these studies have been limited to array comparative genomic hybridization (a-CGH), targeted polymerase chain reaction (PCR), or SNP array approaches.2 PCR-based methods, such as quantitative PCR (qPCR), cover only a limited number of targeted regions. Although a-CGH analysis can cover the entire genome, its resolution is low (≈5–10 kb).2 SNP array-based analysis can also cover the entire genome, but it is underpowered to detect rare CNVs10 and has low resolution for pinpointing breakpoints (≈2–10 kb).2
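As a concrete illustration of the definition above, the sketch below labels a genomic segment as a deletion or a duplication by comparing its estimated copy number with the diploid baseline of two. It assumes a read-depth signal (the kind of signal used by depth-based callers such as CNVnator) and a known genome-wide diploid depth; the class and function names, the rounding rule, and the example depths are hypothetical, and only the 50 bp minimum length comes from the text.

```python
from dataclasses import dataclass

# Per the text, CNVs range from 50 bp to several Mb; shorter events are not counted here.
MIN_CNV_LENGTH_BP = 50


@dataclass
class Segment:
    chrom: str
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    mean_depth: float  # mean read depth observed over the segment (assumed input)


def estimate_copy_number(segment_depth: float, diploid_depth: float) -> int:
    """Scale depth so the genome-wide diploid depth maps to copy number 2, then round."""
    return round(2 * segment_depth / diploid_depth)


def classify_segment(seg: Segment, diploid_depth: float) -> str:
    """Return 'deletion', 'duplication', or 'normal' relative to the diploid baseline."""
    if seg.end - seg.start < MIN_CNV_LENGTH_BP:
        return "normal"  # below the conventional minimum CNV length
    copy_number = estimate_copy_number(seg.mean_depth, diploid_depth)
    if copy_number < 2:
        return "deletion"
    if copy_number > 2:
        return "duplication"
    return "normal"


# Hypothetical segments; 30x is an assumed diploid depth, not a value from the study.
print(classify_segment(Segment("chr21", 1_000, 6_000, 14.8), diploid_depth=30.0))  # deletion
print(classify_segment(Segment("chr21", 1_000, 6_000, 46.0), diploid_depth=30.0))  # duplication
```

Real callers segment the genome and model depth noise and GC bias before calling; this sketch only shows the copy-number-versus-two logic.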

In this study, we first comprehensively identified CNVs from paired-end short-read (2 × 150 bp) whole-genome sequencing (WGS) data generated from postmortem brain tissues of 1411 North American White individuals in two cohorts from the Accelerating Medicines Partnership–Alzheimer's Disease (AMP-AD) consortium,21 the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) AD cohort22 and the Religious Orders Study/Memory and Aging Project (ROSMAP) cohort,23 using four complementary CNV calling approaches (CNVnator,24 Pindel,25 MetaSV,26 and Delly227). Within each cohort, individual-level calls from the four approaches were integrated into a set of population-level CNVs, and only consensus CNVs detected by three or more approaches were retained for subsequent analyses to minimize software-specific bias. Comparing 701 LOAD cases with 710 non-AD cases, we identified 3012 rare AD-specific CNVs genome-wide, that is, CNVs observed only in AD cases. Sixty-four AD-specific CNVs were conserved across the two cohorts. The AD-specific CNVs overlap the transcribed regions of genes enriched for biological processes such as cellular glucuronidation, neuron projection, and multicellular organismal signaling, a novel finding not reported by AD GWAS. By further integrating clinical, pathophysiological, and transcriptomic data, we found that common CNVs affect the transcription levels of genes involved in major histocompatibility complex (MHC) class II receptor activity across different brain regions, supporting previous reports of an increased immune response in AD.28 Three CNVs (mCNV233, mCNV236, and mCNV11665) are significantly negatively correlated with the Braak score in the dorsolateral prefrontal cortex (DLPFC) region.
CNV-Gene-Trait correlation networks integrating matched multi-omics and clinicopathological data pinpoint a novel CNV (DEL6619.MSBB/mCNV21544.ROSMAP) as a key regulator of the immune response and further provide many novel gene targets that connect CNVs with clinical and pathological traits of AD. All consensus CNVs of the two cohorts have been uploaded to the AD Knowledge Portal (http://doi.org/10.7303/syn26254632) and to a University of California Santa Cruz (UCSC) genome browser track (http://genome.ucsc.edu/s/c6ming2/AMPAD.CNVs).

Identification of AD-specific CNVs provides a new perspective on the genetic risk factors of AD. Moreover, the association of CNVs with matched clinical, pathological, and transcriptomic data sheds light on disease mechanisms. To our knowledge, this is the first genomic CNV study of LOAD that integrates WGS data with clinical, pathological, and transcriptomic data. The AD-associated CNVs and the underlying gene targets deepen our understanding of the genetic mechanisms underlying AD.

2 RESULTS

2.1 Identification of consensus CNVs in the AMP-AD cohorts

After excluding duplicate, contaminated, and outlier samples, the MSBB22 and ROSMAP23 cohorts contain 341 and 1129 samples, respectively. To minimize bias from differences in population history, we restricted the analysis to North American White samples, leaving 1411 samples in total (MSBB: 284 samples, ROSMAP: 1127 samples; Methods 3.1, Tables S1-S3 in supporting information). By integrating results from four different and complementary CNV calling approaches (CNVnator,24 Pindel,25 Delly2,27 and MetaSV26), we generated a set of CNVs for each cohort (Figure 1 and Figure 2, Table 1, Figure S1 in supporting information, Tables S4-S6 in supporting information, and Methods 3.4). The robustness of these CNVs was further evaluated by the consensus among the four CNV calling approaches (Table S4). Consensus Class I includes the CNVs identified by only one calling method, Consensus Class II consists of the CNVs identified by exactly two methods, and Consensus Class III contains the CNVs identified by three or more methods. We focused on the CNVs in Consensus Class III in the subsequent analyses to minimize method-specific bias. Consensus Class III includes 7150 and 9902 CNVs in the MSBB and ROSMAP cohorts, respectively (Table 1, Figure 2A, and Tables S4-S6). Two CNVs with a reciprocal overlap (RO) of 50% or greater in their genomic locations are considered to overlap significantly and are treated as the same CNV. The median numbers of CNVs per individual are similar in the two cohorts (987 in MSBB and 1052 in ROSMAP). The two cohorts share 3687 CNVs based on the RO threshold of 50% (Figure 2B and Table S7 in supporting information). To estimate the replication rate of our CNV calling pipeline, we randomly picked four samples (three AD cases and one normal (NL) control) from the MSBB cohort, sequenced the corresponding genomes twice, and compared the CNV calling results from the two batches.
Our CNV calling pipeline's replication rate ranged from 97.30% to 98.63% (Table S8 in supporting information). We further compared our consensus CNV sets to four public CNV datasets based on large populations (DECIPHER,29 DGV,30 the 1000 Genomes Project (1KGP),31 and GnomAD32). More than half of our CNVs were validated by these four public CNV datasets (Table S9 in supporting information). The overlaps between our consensus CNV sets and these public CNV datasets were generally greater than the overlaps among the public datasets themselves. For example, the MSBB and ROSMAP CNV sets overlapped the GnomAD CNV set by approximately 74% and 59%, respectively, whereas the overlaps between GnomAD and DECIPHER, 1KGP, and DGV were approximately 39%, 51%, and 31%, respectively. The consensus CNV sets of the two cohorts can be viewed as a UCSC genome browser track at http://genome.ucsc.edu/s/c6ming2/AMPAD.CNVs. The full CNV matrices of the two cohorts and the scripts used to generate the CNVs can be downloaded from the AD Knowledge Portal (http://doi.org/10.7303/syn26254632).
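The reciprocal-overlap (RO) rule, the consensus classes, and overlap-based comparisons between call sets can be sketched as follows. This is a minimal illustration, not the study's pipeline (its scripts are on the AD Knowledge Portal): it assumes half-open (chrom, start, end) intervals and one call list per caller, the function names are hypothetical, and it reads the replication rate as the fraction of CNVs from one sequencing batch that have an RO ≥ 50% match in the other, which is an assumption because the exact definition is not given here.

```python
Interval = tuple[str, int, int]  # (chrom, start, end) with half-open coordinates


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap divided by the longer interval (the smaller of the two overlap fractions)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def same_cnv(a: Interval, b: Interval, threshold: float = 0.5) -> bool:
    """Two calls are treated as one CNV when their RO is at least 50%."""
    return reciprocal_overlap(a, b) >= threshold


def consensus_class(cnv: Interval, calls_by_method: dict[str, list[Interval]]) -> str:
    """Class I, II, or III from how many callers report a matching call."""
    support = sum(
        any(same_cnv(cnv, call) for call in calls)
        for calls in calls_by_method.values()
    )
    if support >= 3:
        return "III"
    return {2: "II", 1: "I"}.get(support, "none")


def fraction_matched(query: list[Interval], reference: list[Interval]) -> float:
    """Share of `query` CNVs with an RO >= 50% match in `reference` (quadratic scan)."""
    if not query:
        return 0.0
    return sum(any(same_cnv(q, r) for r in reference) for q in query) / len(query)


# Hypothetical calls from the four callers for one sample.
calls = {
    "CNVnator": [("chr1", 100, 200)],
    "Pindel": [("chr1", 110, 205)],
    "Delly2": [("chr1", 90, 210)],
    "MetaSV": [("chr2", 100, 200)],
}
print(consensus_class(("chr1", 100, 200), calls))  # III
```

The quadratic scan is adequate for a handful of calls; genome-scale call sets would need an interval index or a sort-and-sweep over each chromosome.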

HIGHLIGHTS

We systematically identified 3012 rare Alzheimer's disease (AD)-specific copy number variations (CNVs) based on the whole genome sequencing data from 1411 individuals in two cohorts.

AD-specific CNVs have distinct molecular functions compared to the normal control-specific CNVs.

CNV-correlated gene expression is involved in major histocompatibility complex class II receptor activity and interferon-gamma-mediated signaling.

CNV-correlated gene networks pinpoint a novel CNV as a key regulator of the immune response pathway in AD.

RESEARCH IN CONTEXT

Systematic review: A few rare copy number variations (CNVs) have been implicated in Alzheimer's disease (AD), but there is no systematic study of CNVs in AD based on whole genome sequencing (WGS) data.

Interpretation: We analyzed the WGS data of 1411 North American White individuals from two AD cohorts and identified 3012 rare AD-specific CNVs. Rare AD-specific CNVs overlap genes involved in cellular glucuronidation and neuron projection. We further revealed the functional contexts of the identified CNVs by integrating them with matched transcriptomic, clinical, and pathological data.

Future directions: The functional impact of the AD-specific CNVs identified here and of the common CNV-correlated RNAs needs to be experimentally validated in future studies. Another important direction is to determine whether somatic CNV mutation rates differ across brain regions or between disease states.

Genomic copy number variation (CNV) distribution in the two cohorts (MSBB and ROSMAP). Track 0: human genome cytoband. Track 1: deletions in ROSMAP. Track 2: duplications in ROSMAP. Track 3: multi-allelic CNVs in ROSMAP. Track 4: Alzheimer's disease (AD)-specific CNVs in ROSMAP. Track 5: deletions in MSBB. Track 6: duplications in MSBB. Track 7: multi-allelic CNVs in MSBB. Track 8: AD-specific CNVs in MSBB. Orange and blue lines represent deletions and duplications, respectively; green lines represent multi-allelic CNVs.

Overall features of the copy number variations (CNVs) identified in MSBB and ROSMAP, including the composition of CNV types and the site frequency spectrum (SFS). (A) Pie chart of the CNV composition in each cohort. The exact numbers can be found in Table 1. (B) CNV sharing pattern across the two cohorts. The exact numbers can be found in Table S7. The CNV proportion in each category is computed separately within each cohort. Two CNVs are considered overlapping if their reciprocal overlap ratio is greater than 0.5. (C) SFS of deletions and duplications in the MSBB and ROSMAP cohorts.

TABLE 1. Summary of detected consensus autosomal CNVs from MSBB and ROSMAP

CNV type                  Calling quality       MSBB    ROSMAP
Bi-allelic deletions      Consensus class III   4627    3915
Bi-allelic duplications   Consensus class III    724     949
Multi-allelic CNVs        Consensus class III   1799    5038
Total CNVs                                      7150    9902

Note: The consensus class is defined by the number of calling programs that support a CNV; Consensus Class III represents CNVs detected by three or more programs. The numbers of CNVs in Consensus Classes I and II are reported in Table S4.
Abbreviations: CNV, copy number variation; MSBB, Mount Sinai/JJ Peters VA Medical Center Brain Bank; ROSMAP, Religious Orders Study/Memory and Aging Project.

2.2 Distinct molecular functions of AD- and mild cognitive impairment–specific CNVs

We further categorized all samples of the MSBB and ROSMAP cohorts into three clinical diagnostic groups (the AD group, the mild cognitive impairment (MCI) group, and the normal control (NL) group) based on the Clinical Dementia Rating (CDR), a measure of disease severity. In the MSBB cohort,22 there are 224 AD samples with CDR > 0.5, 27 MCI samples with CDR = 0.5, and 33 NL samples without cognitive impairment (CDR = 0). The ROSMAP cohort23 includes 477 AD samples, 285 MCI samples, and 365 NL samples. In total, there are 701 LOAD, 312 MCI, and 398 NL samples (Tables S1-S2). In the subsequent analyses, we focused on the effect of CNVs in the clinical diagnostic AD group.

Each CNV was assigned to the clinical diagnostic group(s) of the samples carrying it (Figure 3A). Group-specific CNVs are defined as CNVs that are observed in one specific group but not in any other group (Figure 3B). For example, AD-specific CNVs are observed only in the AD cases of the two cohorts under study and not in the NL or MCI cases: a CNV is called AD-specific if its frequency in the AD group is greater than 0 and its frequency in the non-AD groups (the MCI and NL groups) is zero. Similarly, MCI-specific CNVs are observed only in MCI cases, and NL-specific CNVs only in NL cases (Figure 3B). By excluding the CNVs detected in any of the 710 non-AD cases (312 MCI and 398 NL cases), we identified 3012 unique AD-specific CNVs in the 701 AD cases from the MSBB and ROSMAP cohorts (MSBB: 2185, ROSMAP: 891; the 64 CNVs shared by both cohorts are counted once; Figure 3C, Table S10 in supporting information). Among these AD-specific CNVs, 64 were conserved in the two cohorts (Figure 3C, Figure 4E, and Table S10). The AD-specific CNVs were observed at low population frequencies (≤6.25% in MSBB, ≤1.26% in ROSMAP; Figure 3D). There was no significant difference in the total CNV length or the total CNV count per individual among the AD, MCI, and NL groups in either MSBB or ROSMAP, based on a quasi-Poisson regression model (QPRM; Methods 3.5, and Table S11 in supporting information). In MSBB, the mean number of AD-specific CNVs per AD case (17.19) is significantly higher than the mean number of MCI-specific CNVs per MCI case (6.7; QPRM Padj = 5.0E-02) and that of NL-specific CNVs per NL case (6.64; QPRM Padj = 2.67E-02; Table S12 and Figure S2 in supporting information). A similar trend was observed in ROSMAP.
In the QPRM, the clinical diagnostic group is the main predictor variable; the response variable is the total CNV count, the total CNV length, or the group-specific CNV count per individual; and sex and age at death are covariates (Methods 3.5).
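Two pieces of this section lend themselves to a short sketch: the set logic behind group-specific CNVs, and a quasi-Poisson regression of per-individual CNV counts on diagnostic group with covariates. The pure-Python code below is an illustrative assumption, not the study's implementation. It fits a Poisson log-linear model by iteratively reweighted least squares (IRLS), estimates the quasi-Poisson dispersion from the Pearson statistic, and uses a normal approximation for Wald p-values (R's glm with family = quasipoisson reports t-based tests instead). The function names and the design-matrix layout (for example, columns for intercept, AD indicator, MCI indicator, sex, and age at death) are hypothetical.

```python
import math


def group_specific(cnvs_by_group: dict[str, set[str]]) -> dict[str, set[str]]:
    """CNV IDs seen in exactly one group: frequency > 0 there and 0 in every other group."""
    specific = {}
    for group, ids in cnvs_by_group.items():
        elsewhere = set().union(*(s for g, s in cnvs_by_group.items() if g != group))
        specific[group] = ids - elsewhere
    return specific


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _weighted_gram(x: list[list[float]], w: list[float]) -> list[list[float]]:
    """Return X' diag(w) X."""
    p = len(x[0])
    return [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(p)] for j in range(p)]


def quasi_poisson(x: list[list[float]], y: list[float], max_iter: int = 100):
    """Fit log E[y] = x @ beta by IRLS; return (beta, standard errors, Wald p-values, dispersion)."""
    n, p = len(y), len(x[0])
    eta = [math.log(yi + 0.1) for yi in y]  # start from mu = y + 0.1, as R's poisson() does
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwz = [sum(m * xi[j] * zi for m, xi, zi in zip(mu, x, z)) for j in range(p)]
        new_beta = _solve(_weighted_gram(x, mu), xtwz)  # IRLS weights are mu for the log link
        eta = [sum(b * v for b, v in zip(new_beta, xi)) for xi in x]
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < 1e-10
        beta = new_beta
        if converged:
            break
    mu = [math.exp(e) for e in eta]
    # Quasi-Poisson dispersion: Pearson chi-square over residual degrees of freedom.
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (n - p)
    gram = _weighted_gram(x, mu)
    se = [math.sqrt(phi * _solve(gram, [float(k == j) for k in range(p)])[j]) for j in range(p)]
    # Two-sided Wald p-values under a normal approximation.
    pvals = [math.erfc(abs(b / s) / math.sqrt(2)) if s > 0 else float("nan")
             for b, s in zip(beta, se)]
    return beta, se, pvals, phi
```

As a sanity check, with an intercept-only design the fitted coefficient equals the log of the mean count, and with a single group indicator the coefficient equals the log ratio of the two group means.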

Comparison of the copy number variation (CNV) sets in three clinical diagnostic groups (normal (NL), mild cognitive impairment (MCI), and AD) in MSBB and ROSMAP. (A) Intersection of the CNV sets in three different diagnostic groups in each cohort. The numbers are defined by comparing different diagnostic groups in the same cohort. (B) Illustration of the concept of group-specific CNVs. The pink, orange, and green shaded regions represent the AD-specific, MCI-specific, and NL-specific CNV sets, respectively. All the samples in the two cohorts are considered here. (C) Intersection of the diagnostic group-specific CNV sets in MSBB and ROSMAP. The numbers are based on the cross-cohort comparison. (D) Site frequency spectrum of AD-specific deletions and duplications. DEL and DUP represent deletion and duplication, respectively.

Functional analysis of Alzheimer's disease (AD)-, mild cognitive impairment (MCI)-, and normal (NL)-specific copy number variation (CNV) genes. CNV genes are the genes whose genomic locations overlap with a given CNV. (A) AD-specific CNV genes are enriched for cellular glucuronidation, neuron projection, uronic acid metabolic process, extrinsic component of plasma membrane, synapse, catenin complex, and multicellular organismal signaling. (B) Genes whose genomic locations overlap with multiple AD-specific CNVs are enriched for neuron development, neuron recognition, neuron differentiation, cell projection organization, neurogenesis, axon, and neuron projection. (C) MCI-specific CNV genes are enriched for ligase activity forming carbon-sulfur bonds. (D) NL-specific CNV genes are enriched for immunoglobulin complex. (E) Circos plot of the 64 conserved AD-specific CNVs in the Mount Sinai/JJ Peters VA Medical Center Brain Bank and the Religious Orders Study/Memory and Aging Project. The outer track 1 represents the genomic locations of the 64 conserved AD-specific CNVs, and the outer track 2 represents the genes whose genomic locations overlap these 64 CNVs. The inner track 1 represents the genomic location of the APP duplication region. (F) The 29 AD-specific CNVs located within the APP duplication region, shown in the University of California Santa Cruz genome browser track. The light blue shading marks the location of the APP gene.

One of the 64 AD-specific CNVs conserved across the two cohorts resides within the duplication region encompassing the APP gene (chr21:14,714,507-29,216,662: nsv1398044; Figure 4E). The other 63 conserved AD-specific CNVs have not previously been associated with AD and are thus novel (Figure 4E and Table S13 in supporting information). Interestingly, most of these conserved AD-specific CNVs (61 of 64) are reported in other published CNV datasets based on large populations without records of mental or neuropathological traits (DECIPHER,29 DGV,30 the 1000 Genomes Project,31 and GnomAD32; Table S13). Their frequency in the AD group is much higher than in the general population of European ancestry in the GnomAD database (Table S13).

Genes whose transcriptional regions reside in the genomic regions of AD-specific CNVs are defined as AD-CNV genes in the subsequent analyses (Figure S3 in supporting information). The AD-CNV genes are significantly enriched for important biological processes such as cellular glucuronidation, neuron projection, uronic acid metabolic process, extrinsic component of plasma membrane, synapse, catenin complex, and multicellular organismal signaling (Figure 4A, Table 2, Figure S3, and Table S14 in supporting information). Furthermore, the genes overlapping with multiple AD-specific CNVs are enriched in many neuron-related pathways such as neuron development, neuron recognition, neuron differentiation, cell projection organization, neurogenesis, axon, and neuron projection (Figure 4B, Table 2, Figure S3, and Table S14). The genes residing in the genomic regions of the MCI-specific CNVs (termed MCI-CNV genes) are associated with ligase activity forming carbon-sulfur bonds (Figure 4C, Table 2, Table S14). In contrast, the genes residing in the genomic regions of the NL-specific CNVs (termed NL-CNV genes) are enriched for immunoglobulin complex (Figure 4D, Table 2, Table S14). These results reveal distinct molecular functions of AD- and MCI-specific CNVs compared to the NL-specific ones.
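A minimal sketch of the gene-overlap and enrichment steps follows. It assumes half-open gene coordinates, counts any overlap between a gene's transcribed region and a CNV (the text says "reside in", so full containment may be what was actually used), and scores a GO term with a one-sided Fisher's exact test, computed as the hypergeometric upper tail, together with the fold enrichment. The gene universe, the function names, and the one-sided choice are assumptions; the Padj values in Table 2 come from a multiple-testing correction that this sketch does not show.

```python
import math


def overlapping_genes(cnv: tuple[str, int, int],
                      genes: dict[str, tuple[str, int, int]]) -> set[str]:
    """Names of genes whose half-open (chrom, start, end) span overlaps the CNV."""
    chrom, start, end = cnv
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start < end and start < g_end
    }


def go_enrichment(hits: int, query_size: int, term_size: int, universe_size: int):
    """One-sided Fisher's exact test (hypergeometric upper tail) and fold enrichment.

    hits:          CNV genes annotated with the GO term
    query_size:    CNV genes tested
    term_size:     universe genes annotated with the GO term
    universe_size: all genes in the background universe
    """
    if query_size == 0 or term_size == 0:
        raise ValueError("query and GO term must both be non-empty")
    total = math.comb(universe_size, query_size)
    # P(X >= hits) where X ~ Hypergeometric(universe_size, term_size, query_size)
    upper_tail = sum(
        math.comb(term_size, i) * math.comb(universe_size - term_size, query_size - i)
        for i in range(hits, min(term_size, query_size) + 1)
    )
    p_value = upper_tail / total
    fold = (hits / query_size) / (term_size / universe_size)
    return p_value, fold
```

Fold enrichment here is the share of CNV genes carrying the term divided by the share of universe genes carrying it, which matches how the fold-enrichment column in Table 2 is usually read.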

TABLE 2. Pathways enriched in the group-specific CNV genes

Group / GO term                                 FET Pᵃ     Padj       Fold enrichmentᵇ
AD-CNV genes
  Plasma Membrane Region                        1.85E-09   5.80E-05   1.67
  Flavonoid Glucuronidation                     3.13E-09   9.80E-05   13.39
  Cellular Glucuronidation                      1.02E-08   3.20E-04   8.29
  Xenobiotic Glucuronidation                    5.08E-08   1.60E-03   10.96
  Neuron Projection                             8.00E-08   2.50E-03   1.56
  Uronic Acid Metabolic Process                 1.98E-07   6.20E-03   6.63
  Extrinsic Component of Plasma Membrane        2.12E-07   6.60E-03   2.76
  Synapse                                       3.78E-07   1.20E-02   1.54
  Catenin Complex                               4.51E-07   1.40E-02   5.65
  Multicellular Organismal Signaling            5.27E-07   1.60E-02   2.52
  Glucuronosyltransferase Activity              9.67E-07   3.00E-02   5.32
  Synaptic Membrane                             1.09E-06   3.40E-02   2.05
  Flavonoid Metabolic Process                   1.56E-06   4.90E-02   8.04
Genes overlapping with multiple AD-specific CNVs
  Neuron Development                            5.06E-08   5.30E-04   3.17
  Neuron Recognition                            7.36E-08   7.70E-04   19.09
  Neuron Differentiation                        8.38E-08   8.70E-04   2.87
  Cell Projection Organization                  1.98E-07   2.10E-03   2.64
  Neurogenesis                                  3.85E-07   4.00E-03   2.57
  Axon                                          5.18E-07   5.40E-03   3.85
  Neuron Projection                             1.75E-06   1.80E-02   2.66
  Regulation Of Neuron Projection Development   1.77E-06   1.80E-02   4.40
  Cell Part Morphogenesis                       2.05E-06   2.10E-02   3.51
  Cell Morphogenesis                            4.76E-06   5.00E-02   2.87
MCI-CNV genes
  Ligase Activity Forming Carbon Sulfur Bonds   1.31E-06   4.10E-02   12.40
NL-CNV genes
  Immunoglobulin Complex                        4.14E-15   1.30E-10   8.45

Abbreviations: AD, Alzheimer's disease; CNV, copy number variation; FET, Fisher's exact test; GO, Gene Ontology; MCI, mild cognitive impairment.

2.3 Replication of previously identified AD-associated CNVs

Two of our AD-specific CNVs had been reported in previous studies33 (Table S15 in supporting information), and 29 AD-specific CNVs were found within the duplication region encompassing the APP gene (chr21:14,714,507-29,216,662: nsv1398044)17, 18, 34 (Figure 4F, Table S16 in supporting information).

Previous studies2, 29, 33, 35-43 have identified 31 CNVs possibly associated with AD (Table S15). Among these 31 CNVs, 20 were reported only in AD cases and 2 were shared by AD and MCI cases, while the remaining 9 showed frequency differences between the AD and NL groups in GWAS studies (Table S15). Two of the 20 known AD-specific CNVs significantly overlap our AD-specific CNVs at an RO threshold of ≥50% (Table S15). Of these two, EVC2/EVC/CRMP1-DUP is replicated in our study (DUP14974.ROSMAP), whereas KANK1/DMRT1-DEL, previously identified as a deletion, is detected as a duplication in our study (DUP28866.ROSMAP). A previously identified AD-specific CNV, HAS1/FPR1/FPR2/FPR3-DUP, was detected not only in the AD cases but also in the MCI and NL cases in our study. DOPEY2-DUP, one of the two previously identified CNVs shared by AD and MCI, was likewise observed in AD, MCI, and NL cases in our study. Moreover, HAS1/FPR1/FPR2/FPR3-DUP and DOPEY2-DUP were observed in healthy controls curated in the DGV database,30 suggesting that they are not AD/MCI-specific (Table S15). In summary, several previously reported AD-associated CNVs were replicated in our study.
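Matching previously reported CNVs against a new call set, while also checking whether the CNV type agrees (as in the KANK1/DMRT1 case, where a reported deletion matched a duplication), can be sketched as below. This is a hypothetical helper, not the study's code: it assumes half-open intervals, keeps the best-overlapping call for each known CNV, and uses the ≥50% RO threshold from the text; the example coordinates and IDs are invented for illustration.

```python
from typing import NamedTuple, Optional


class CNV(NamedTuple):
    cnv_id: str
    chrom: str
    start: int  # half-open coordinates
    end: int
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap divided by the longer of the two CNVs."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def match_known(known: list[CNV], ours: list[CNV],
                threshold: float = 0.5) -> dict[str, tuple[str, Optional[str]]]:
    """Map each known CNV ID to (status, ID of the best-matching CNV in `ours`)."""
    results: dict[str, tuple[str, Optional[str]]] = {}
    for k in known:
        hits = [o for o in ours if reciprocal_overlap(k, o) >= threshold]
        if not hits:
            results[k.cnv_id] = ("not found", None)
            continue
        best = max(hits, key=lambda o: reciprocal_overlap(k, o))
        status = "replicated" if best.kind == k.kind else "type differs"
        results[k.cnv_id] = (status, best.cnv_id)
    return results


# Hypothetical coordinates, for illustration only.
known = [CNV("known_dup", "chr4", 1000, 5000, "DUP"), CNV("known_del", "chr9", 2000, 4000, "DEL")]
ours = [CNV("DUP_a", "chr4", 1200, 5000, "DUP"), CNV("DUP_b", "chr9", 2100, 4000, "DUP")]
print(match_known(known, ours))
```

Separating "replicated" from "type differs" keeps location agreement distinct from copy-number direction, which is the distinction the KANK1/DMRT1 result turns on.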

Duplication of APP has been identified as a causal factor for early-onset familial Alzheimer's disease (FAD).17, 18 We found 29 AD-specific CNVs within the APP duplication region (chr21:14,714,507-29,216,662: nsv1398044; Figure 4F and Table S16). Among them, one is conserved in the two cohorts (chr21:25,258,373-25,263,454, DEL18858.MSBB/DEL47421.ROSMAP; Figure 4E; Table S13).
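Checking which CNVs fall inside the APP duplication region reduces to an interval-containment test. The sketch below parses region strings in the chr:start-end form used in the text and requires full containment, treating both coordinates as inclusive; whether CNVs that only partially overlap the region boundary were also counted is not stated, so that rule is an assumption, as are the function names.

```python
import re

APP_DUP_REGION = "chr21:14,714,507-29,216,662"  # nsv1398044, as given in the text


def parse_region(region: str) -> tuple[str, int, int]:
    """Parse 'chr21:14,714,507-29,216,662' into ('chr21', 14714507, 29216662)."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def within(inner: str, outer: str) -> bool:
    """True if `inner` lies entirely inside `outer` (both ends treated as inclusive)."""
    c1, s1, e1 = parse_region(inner)
    c2, s2, e2 = parse_region(outer)
    return c1 == c2 and s2 <= s1 and e1 <= e2


# The conserved AD-specific deletion reported in the text.
print(within("chr21:25,258,373-25,263,454", APP_DUP_REGION))  # True
```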

2.4 Distinct impact of CNVs on gene transcription in AD and MCI compared to NL

To further interrogate the effect of CNVs on transcription in AD brains, we performed Kendall's τb correlation analysis44 between all CNVs and the transcriptomic data of the AD group from five brain regions in the MSBB and ROSMAP cohorts (MSBB: the frontal pole [BM-10], the superior temporal gyrus [BM-22], the parahippocampal gyrus [BM-36], and the inferior frontal gyrus [BM-44]; ROSMAP: the DLPFC; Methods 3.6; the demographic information of the samples can be found in Table S2). At a false discovery rate (FDR) of 5%, CNVs were significantly correlated with 95, 80, 66, and 79 genes in the BM-10 (|τ| ∈ [0.332, 0.711]), BM-22 (|τ| ∈ [0.338, 0.697]), BM-36 (|τ| ∈ [0.367, 0.735]), and BM-44 (|τ| ∈ [0.363, 0.698]) regions, respectively (Table 3 and Table S17 in supporting information). In the ROSMAP cohort, 104 genes in the DLPFC were significantly correlated with 136 CNVs (|τ| ∈ [0.234, 0.670]) (Table 3 and Table S17). All of these gene-correlated CNVs are common CNVs, with population frequencies higher than 3%.
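The correlation and multiple-testing steps can be sketched in a few lines. This is a minimal pure-Python illustration, not the study's code: it computes Kendall's τb with the standard tie correction (copy numbers are heavily tied, which is why τb rather than τa suits this analysis), applies the Benjamini-Hochberg procedure (an assumption, since the FDR method is specified in Methods 3.6 rather than here), and flags common CNVs with a carrier-frequency definition (copy number ≠ 2) that is also assumed. In practice, scipy.stats.kendalltau, whose default variant is τb, returns both the statistic and a p-value.

```python
import math
from itertools import combinations


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) / sqrt((n0 - n1) * (n0 - n2))."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in n1 and n2, never in the numerator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # n0 - n1 = pairs not tied in x; n0 - n2 = pairs not tied in y
    not_tied_x = concordant + discordant + tied_y_only
    not_tied_y = concordant + discordant + tied_x_only
    denom = math.sqrt(not_tied_x * not_tied_y)
    return (concordant - discordant) / denom if denom else float("nan")


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """BH-adjusted p-values (step-up, with monotonicity enforced), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def carrier_frequency(copy_numbers: list[int]) -> float:
    """Fraction of samples whose copy number differs from 2 (assumed definition of frequency)."""
    return sum(cn != 2 for cn in copy_numbers) / len(copy_numbers)


# Hypothetical CNV dosages and expression values for five AD samples.
print(kendall_tau_b([0, 0, 1, 1, 2], [1.0, 2.0, 2.5, 3.0, 5.0]))
```

A CNV-gene pair would then be kept when its adjusted p-value is below 0.05 and the CNV's carrier frequency exceeds 0.03.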

TABLE 3. Summary of the CNV-gene pairs with significant correlation in AD cases in five different brain regions

Brain region    Sample size of the AD group
