Genomic analysis of 116 autism families strengthens known risk genes and highlights promising candidates

Overview of the cohort

Here we report the genomic characterisation of 435 individuals from 116 ASD families, comprising 144 individuals with ASD, 6 siblings with specific learning disabilities (SLD), 55 unaffected siblings, and 230 parents. Among the 144 affected individuals, 89 were from simplex families (SPX), 51 belonged to 25 multiplex families (MPX) and 4 to two monozygotic twin pairs. DNA samples were available for both parents for 114/116 families. Among the multiplex families, 22 included two affected siblings, one family included three affected siblings, while in two families the affected individuals were a child and a paternal uncle. The ASD individuals consisted of 110 males and 34 females, with a 3.2:1 male-to-female ratio.

Phenotypic data of ASD individuals, stratified by sex and family type (SPX/MPX), are reported in Supplementary Table 1. The mean age of symptom onset was 15-16 months, with the majority of individuals (79%) presenting an early onset, and the mean age at diagnosis was 40 months (41.2 for males and 36.6 for females, two-sided t-test p value = 0.38) (Supplementary Fig. 1).

The mean ADOS-2 comparison score was 7.8 and the mean CARS2-ST was 39, with a significant difference between simplex and multiplex probands (two-sided t-test p value: ADOS-2 = 0.005, CARS2-ST = 0.001) (Supplementary Table 1 and Supplementary Fig. 2). The same trend was observed in the ICD-10 clinical diagnosis, where milder categories (F84.5 and F84.9) were more frequent among MPX probands (Fisher's exact test p value = 4.0 × 10⁻⁴).

Mild to severe ID was present in 56% of probands, including 16/144 with severe ID, with no significant difference in cognitive level between male and female probands or between SPX and MPX probands. The vast majority of probands had language problems (98.6%), and the rate of absent speech was significantly higher in females than in males (16/34 vs 30/110, chi2 p value = 0.03). Only 12 probands (8%) presented with epilepsy. EEG and MRI anomalies were identified in 30/126 and 36/124 individuals, respectively, with no significant differences among groups (Supplementary Table 1). Analysis of MRI data of this cohort has been previously described [12].
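The sex comparison for absent speech can be reproduced from the reported counts. The following is a minimal sketch, assuming a Pearson chi-square test on a 2×2 table without continuity correction (the text does not say whether a correction was applied); the function name `chi2_2x2` is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p_value).
    Assumption: no Yates correction; the source does not specify this."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for 2x2 tables: n * (ad - bc)^2 / product of margins
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Absent speech: 16 of 34 females vs 30 of 110 males (counts from the text)
stat, p_value = chi2_2x2(16, 34 - 16, 30, 110 - 30)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Under this assumption the p value rounds to the reported 0.03; applying a continuity correction would give a larger value.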

The score distributions of the Social and Communication Disorders Checklist (SCDC) [13] in the entire cohort and of the Broad Autism Phenotype Questionnaire (BAPQ) [14] in parents are shown in Supplementary Fig. 3. There are no significant differences in BAPQ scores between SPX and MPX or between parents of male-only or female-containing families (Supplementary Table 2a); 7 mothers (6 SPX and 1 MPX) and 4 fathers (6 SPX and 1 MPX) were above the threshold. Similarly, the mean SCDC values of ASD individuals, parents and unaffected siblings do not differ by sex or family type (Supplementary Table 2b).

Genetic data consisted of Illumina Infinium PsychArray genotyping for all families, WGS of 105 families and WES of 29 families (Supplementary Fig. 4).

MDS analysis was performed for ancestry determination, anchoring our cohort data to the 1000 Genomes Project. We visually inspected the first two MDS coordinates and found no discrepancy between genotype-computed and self-reported ancestry. Individuals of non-European ancestry comprise ∼15% of our sample (66 individuals from 19 families, for a total of 26 cases), including one African, one South Asian and 17 Admixed families (Supplementary Fig. 5).

Rare coding sequence variant analysis

We analysed WES and WGS data from all 435 individuals of our cohort, focussing on rare variants affecting coding exons and canonical splice sites, as these provide the most direct links between gene function and disease pathogenesis. We did not use WGS data to investigate mitochondrial DNA, as deep sequencing of the entire mitogenome and quantification of mtDNA cellular content in this cohort have been previously described [15].

We identified a total of 243 rare DNVs in protein-coding exons (MAF ≤ 0.1% in reference databases): 178 in 144 ASD individuals and 65 in 55 unaffected siblings (Fig. 1a).

Fig. 1: Rare de novo coding variants in cases and unaffected siblings.

a Rare coding de novo variants per individual in our cohort (ASD cases = 144, unaffected siblings = 55). b Distribution of rare de novo coding variants in cases and unaffected siblings: the pie charts represent rare de novo coding variants split by predicted functional consequences, represented by different colours. PTVs and missense variants are divided into two and three tiers of predicted functional severity, represented by different shades, based on the LOEUF (<0.6, ≥0.6) and MPC metrics (MPC ≥ 2 (DmisB), 1 ≤ MPC < 2 (DmisA), 0 ≤ MPC < 1), respectively.

The number of DNVs per child was consistent with the rate reported in other studies [4] and similar between individuals with autism and their siblings (mean rate of 1.24 and 1.18, respectively). The percentage of cases and unaffected siblings carrying at least one rare de novo SNV (cases: 103/144, 71.5%; unaffected siblings: 35/55, 63%; chi2 p value = 0.28) was also comparable to previous studies.

We catalogued rare DNVs in six bins of predicted functional severity: two bins for PTVs according to the LOEUF score [16], three bins for missense variants based on the MPC score [17], and a single bin for synonymous variants. The variants predicted to be more deleterious account for 24% of the DNVs found in cases: 5.1% were PTVs in constrained genes (LOEUF < 0.6), hereafter referred to as “PTVLOEUF”, and 18.2% were damaging missense variants with MPC ≥ 1 (Dmis), comprising 6.9% with MPC ≥ 2 (DmisB) and 11.3% with 1 ≤ MPC < 2 (DmisA). The remaining DNVs were missense with MPC < 1 (43%), synonymous (26%), and PTVs in unconstrained genes (7%), consistent with the previously reported bin distribution in a family sample of 6,430 ASD cases [4]. Interestingly, no PTVLOEUF were identified in unaffected siblings, supporting a larger effect of this class of variants on liability (Fig. 1b, Supplementary Table 3).
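The six-bin scheme described above can be written as a small classifier. This is only an illustrative sketch: the function name, the consequence labels and the bin labels are hypothetical, and the handling of missing LOEUF or MPC scores (assigned to the least severe bin of their class) is an assumption rather than something stated in the text.

```python
def classify_dnv(consequence, loeuf=None, mpc=None):
    """Assign a de novo variant to one of six severity bins.

    consequence: "PTV", "missense" or "synonymous" (hypothetical labels).
    loeuf: LOEUF score of the gene (used for PTVs only).
    mpc: MPC score of the variant (used for missense variants only).
    Assumption: a missing score falls into the least severe bin of its class.
    """
    if consequence == "PTV":
        # Constrained genes are those with LOEUF < 0.6
        if loeuf is not None and loeuf < 0.6:
            return "PTV_LOEUF"
        return "PTV_unconstrained"
    if consequence == "missense":
        if mpc is not None and mpc >= 2:
            return "DmisB"
        if mpc is not None and mpc >= 1:
            return "DmisA"
        return "missense_MPC<1"
    if consequence == "synonymous":
        return "synonymous"
    raise ValueError(f"unexpected consequence: {consequence!r}")

def is_pdsnv(bin_label):
    """pdSNVs are PTVs in constrained genes plus Dmis (MPC >= 1) variants."""
    return bin_label in {"PTV_LOEUF", "DmisA", "DmisB"}

# Toy records, not taken from the study data
for record in [("PTV", 0.25, None), ("missense", None, 1.4), ("synonymous", None, None)]:
    label = classify_dnv(*record)
    print(record, "->", label, "| pdSNV:", is_pdsnv(label))
```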

Among the 37 rare de novo PTVLOEUF and Dmis identified in our cases, hereafter defined as potentially damaging SNVs (pdSNVs), two (one PTV and one DmisB) occurred in BRSK2 in two different families (Table 1). Beyond these de novo pdSNVs, we also identified a stop-gain variant of unknown origin in SHANK3 in the female proband of simplex family 123 (maternal DNA was unavailable). However, since LoF variants in SHANK3 usually arise de novo [8], this variant was deemed likely de novo (Table 1). To further assess the potential relevance of these 38 pdSNVs, we used the LOFTEE [16], pext [18] and AlphaMissense [19] annotations (Supplementary Table 4). All 9 PTVs were high-confidence LoF variants according to LOFTEE and occurred in brain-expressed exons. Specifically, 7 involved bases constitutively expressed in the brain (pext score > 0.9), while two (the frameshift variants in URB5 and BRSK2) fell in exons with intermediate expression in the brain (0.46 and 0.69, respectively). However, the exon containing the BRSK2 variant showed much higher expression in the brain than the mean aggregate expression across all tissues (0.69 vs 0.1). Among the 11 DmisB, all but one (MGAT3 p.T468M) were predicted “likely_pathogenic” by AlphaMissense (LPαM) and all involved brain-expressed exons. Among the 18 DmisA, 9 were LPαM and all occurred in brain-expressed exons.

Table 1 List of de novo pdSNVs identified in affected individuals

STRING enrichment analysis of the 36 genes hosting the 38 de novo pdSNVs in probands detected a significant enrichment in gene interactions (12 vs 5 expected edges, 2.4-fold enrichment, p value = 0.00318, one-tailed hypergeometric test), whereas no significant interaction enrichment was identified for 40 genes hosting 42 synonymous de novo variants in probands (4 vs 3 expected edges, p-value = 0.456), nor for 12 genes hosting the 13 de novo pdSNVs (4 DmisB and 9 DmisA) identified in unaffected siblings (0 vs 0 expected edges, p value = 1).

When we restricted the STRING analysis to the 17 genes hosting the 19 most severe de novo pdSNVs (9 high-confidence PTVLOEUF and 10 DmisB LPαM), there was still a significant interaction enrichment (4 vs 1 expected edges, p value = 0.0389). Gene Ontology (GO) enrichment analysis of these 17 genes identified 10 genes in the “regulation of transport” category (GO:0051049, 6.48-fold enrichment, FDR = 3.88 × 10⁻³) and in the “regulation of localization” category (GO:0032879, 5.7-fold enrichment, FDR = 1.07 × 10⁻²), while no biological process was enriched for the 40 genes with de novo synonymous variants (Supplementary Table 5).

We next assessed the rate of de novo and inherited pdSNVs in cases and unaffected siblings and found no overall excess of such variants in cases (Supplementary Fig. 6).

Then, we tested if there was a difference in paternal and maternal origin of inherited pdSNVs in ASD individuals, but we did not identify any bias in transmission considering all pdSNVs (1300 paternal vs 1259 maternal), novel pdSNVs only (401 paternal vs 411 maternal), or only novel pdSNVs not shared with unaffected sibs (315 paternal vs 309 maternal).
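One way to check for a parent-of-origin bias in these counts is an exact binomial test against an equal paternal and maternal transmission rate. The text does not state which test was used, so the sketch below is an assumption, and the function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Because the null distribution is symmetric, the two-sided p value is
    twice the upper-tail probability of the larger of the two counts,
    capped at 1.
    """
    larger = max(k, n - k)
    # Exact integer arithmetic; Python converts the final ratio to a float
    tail = sum(comb(n, i) for i in range(larger, n + 1))
    return min(1.0, 2 * tail / 2 ** n)

# Paternal vs maternal counts of inherited pdSNVs, as reported in the text
comparisons = {
    "all pdSNVs": (1300, 1259),
    "novel pdSNVs": (401, 411),
    "novel pdSNVs not shared with unaffected sibs": (315, 309),
}
for label, (paternal, maternal) in comparisons.items():
    p_value = binomial_two_sided_p(paternal, paternal + maternal)
    print(f"{label}: p = {p_value:.2f}")
```

None of the three comparisons approaches significance under this test, in line with the absence of a transmission bias reported above.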

Given the well-known role of synaptic genes in ASD pathogenesis, we used the SynGO platform [20] (dataset version: 20210225) to investigate whether the affected individuals showed an enrichment of rare pdSNVs in genes involved in synaptic components or functions. Among the 2,156 genes harbouring pdSNVs in cases (Supplementary Table 6), 254 were SynGO annotated genes. When compared with the “brain expressed” background set (18,035 unique genes including 1,225 SynGO annotated genes), our list showed a significant enrichment at 1% FDR for 13 Cellular Component (CC) terms and 5 Biological Process (BP) terms (Fig. 2, Supplementary Table 7a). A similar pattern was obtained when we restricted the enrichment analysis to the category of novel pdSNVs or novel pdSNVs not shared with unaffected siblings: a significant enrichment at 1% FDR was retained for about the same number of GO terms (Supplementary Fig. 7, Supplementary Table 7b-c), maintaining the same four most significant CC and BP terms (synapse, process in the synapse, post-synapse, synapse organization). In contrast, the same analysis performed on the 6,666 genes carrying rare synonymous variants in ASD individuals highlighted 517 SynGO annotated genes, without any significant enrichment for CC or BP terms (Fig. 2).

Fig. 2: Enrichment for de novo and inherited pdSNVs in SynGO Genes.

Visualisation of gene set enrichment analyses (GSEA) of genes harbouring pdSNVs (left) and synonymous variants (right) in affected individuals, each compared to a background set of brain-expressed genes. All Cell Components (CC) or Biological Process (BP) related terms with gene annotations in SynGO are plotted in a circular fashion, with the highest hierarchical term (“synapse” for CC or “process in synapse” for BP) in the centre and each layer of subclasses in outward concentric rings. Over-represented synaptic terms are indicated with different colours, according to the Q-value, and are reported in detail in Supplementary Table 7. The CC and BP plots of genes affected by rare pdSNVs (left) show an enrichment of synaptic terms, while no enrichment emerged from the genes hosting rare synonymous SNVs (right).

To assess the contribution of deleterious variants in high-confidence ASD and/or NDD genes (n = 684, Supplementary Table 8) [5,6], we selected all the de novo/inherited pdSNVs located in such genes in ASD individuals and unaffected siblings. Our study identified rare pdSNVs in 97/232 high-confidence ASD genes (Fig. 3a) and in 139/452 high-confidence NDD genes (Fig. 3b). When we restricted the selection to novel pdSNVs only, we found that these pdSNVs affected 46 ASD genes (Supplementary Fig. 8a) and 64 NDD genes (Supplementary Fig. 8b).

Fig. 3: Contribution of de novo and inherited pdSNVs to high confidence ASD/NDD genes.

De novo and inherited pdSNVs include PTVs in genes with LOEUF score < 0.6 (PTVLOEUF), missense variants with MPC ≥ 2 (DmisB) and missense variants with 1 ≤ MPC < 2 (DmisA). Contribution of each variant type identified in ASD individuals and unaffected siblings for a list of genes previously associated with ASD (a) and NDD (b). The list of ASD genes comprised 185 genes associated at FDR ≤ 0.05 [5] and 135 genes with FDR < 0.1 [6] (88 of which were common between the two lists). In our cohort, pdSNVs were identified in 97 ASD genes (a). The list of NDD genes included 452 genes from a list of 664 genes associated at FDR ≤ 0.05, after the exclusion of the genes already included among the 232 ASD genes [5]. In our cohort, pdSNVs were identified in 139 NDD genes (b). **, genes with FDR ≤ 0.001 [5]; *, genes with FDR ≤ 0.05 [5]; §, genes with FDR < 0.1 [6]; dotted line indicates a putative de novo PTVLOEUF.

While the rate of inherited rare variants in the 684 ASD/NDD genes was similar between cases and unaffected siblings, we observed an increased rate of de novo variants in ASD/NDD genes in affected individuals (16/144 cases (11.1%) vs 1/55 unaffected sibs (1.8%), Fisher’s exact test p value = 0.04). Interestingly, probands carrying de novo pdSNVs in these genes showed a significant positive association with severe ID (nonverbal IQ < 35) compared with probands without such variants (Fisher’s exact test p value = 0.018, OR = 4.83) (Supplementary Table 9a). When considering only the most severe de novo/inherited pdSNVs (16 high-confidence PTVLOEUF and 37 DmisB LPαM), 49 cases (34%) had at least one variant in these genes (3 probands had 2 severe pdSNVs). Comparing the probands with and without severe pdSNVs, we observed a significant association with severe ID (Fisher’s exact test p value = 0.022, OR = 3.8) (Supplementary Table 9a). Significant associations were retained when restricting the analysis to novel de novo pdSNVs, and to novel severe pdSNVs (Fisher’s exact test p value = 0.014 and 0.021, respectively) (Supplementary Table 9b).
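The case versus sibling comparison of de novo pdSNV carriers (16/144 vs 1/55) can be checked with a two-sided Fisher's exact test. The sketch below uses only the standard library and the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the authors' software used the same two-sided convention is an assumption, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability, kept as an exact integer
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(n, col1)

# Carriers vs non-carriers of de novo pdSNVs in ASD/NDD genes (counts from the text)
p_value = fisher_exact_two_sided(16, 144 - 16, 1, 55 - 1)
print(f"Fisher's exact test p = {p_value:.3f}")
```

Working with integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.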

Rare copy number variant analysis

Discovery of rare CNVs was performed by integrating CNV calls from SNP-array data on the entire collection of families with those from WGS of 105 families. After filtering, we defined a high-confidence set of 192 rare (frequency < 1% in our dataset) genic CNVs in cases and SLD siblings (Supplementary Table 10). These included 93 CNVs identified by both SNP-array and WGS, 79 detected only by WGS and 20 identified only by SNP-array in families not analysed by WGS. Among variants detected only by WGS, 32 (40.5%) were deletions (median size=20.2 kb) and 47 (59.5%) duplications (median size=31.7 kb).

We prioritised four categories of potentially damaging CNVs (pdCNVs) (Table 2):

Table 2 Potentially damaging CNVs (pdCNVs) identified in our cohort

a) Large CNVs (≥ 3 Mb). No probands in our cohort carried large CNVs, because most of them had been previously screened by array-CGH in a clinical setting. However, we identified a 3.2 Mb de novo tandem duplication of chr18p11 in one SLD sibling diagnosed with language and learning delay (Supplementary Fig. 9).

b) Recurrent CNVs. This category included 4 deletions and 2 duplications consistent with known recurrent genomic disorders (RGD).

c) De novo CNVs. In addition to the two de novo CNVs included in the previous categories, we identified a 5q21.3 tandem duplication including the entire FER gene and a 2p16.2 deletion affecting the brain-expressed gene ACYP2.

d) Rare CNVs affecting dosage-sensitive NDD genes reported in GeneTrek [21]. This category included 19 deletions and 7 duplications, selected among deletions or intragenic duplications potentially disrupting the CDS of genes with pHaplo ≥ 0.55 [22], duplications involving the whole CDS of genes with pTriplo ≥ 0.68 [22] and CNVs potentially leading to in-frame fusion transcripts. Among these, the inherited deletions involving PHF3, NEGR1, HOMER1 and TIAM1 are of particular interest, as these neurodevelopmental genes have been previously implicated in ASD/NDD.

Multiple hits in families with CNVs in genomic disorders loci

Since CNVs in RGD loci are often inherited and require secondary hits to reach the liability threshold for disease, we checked whether the probands heterozygous for these CNVs also had de novo pdSNVs, PTVLOEUF in NDD genes or pdCNVs inherited from the parent not transmitting the recurrent CNV. Four families carried additional variants of interest (Supplementary Fig. 10). In Fam81, the proband had a likely causative de novo DmisB variant in NFIX, which allowed us to redefine his phenotype as Malan syndrome [23], while supporting the ACMG classification of 15q13.3 duplications as VUS. In Fam117, both affected children inherited a paternal 15q11.2 deletion and a maternal exonic deletion of PHF3. Moreover, each of them had a de novo Dmis, one in YME1L1 and the other in RCCD1 (Table 1, Supplementary Fig. 10). Interestingly, their father exhibited autistic traits: according to the SCDC test [13,24], he had social and communication difficulties (SCDC score = 15, an outlier in the SCDC parents’ score distribution) (Supplementary Fig. 3b), while in the BAPQ [14] he exhibited impairments in the pragmatic language domain.

Autosomal and X-linked recessive events

To identify variants potentially acting with a recessive inheritance, we looked for homozygous and compound heterozygous pdSNVs/pdCNVs.

Biallelic pdSNV events were identified in six genes (Supplementary Table 11): five harboured biallelic inherited DmisA, while DYNC1H1 harboured a maternal DmisA and a de novo DmisB. However, DYNC1H1 has been reported to act through a dominant mode of inheritance; therefore, the de novo DmisB is likely to be the main causative variant for the ASD phenotype in this individual [25].

A compound pdCNV-pdSNV event was identified in Fam91, where proband 91.3 carries a 412 kb deletion of unknown origin (paternal DNA was unavailable) and a maternal DmisA in the remaining allele of PAX7, a gene labelled as having a biallelic mode of inheritance in the Genomics England neurology and NDD panel.

Considering only variants with no homozygotes reported in gnomAD and the mode of inheritance previously associated with these genes, only biallelic events in PAX7 and DSCAM met the selection criteria (Supplementary Table 11).

To identify potentially causative X-linked events, we searched for hemizygous pdSNVs and pdCNVs present in male probands and absent in unaffected brother(s) (Supplementary Table 12). We identified 4 DmisB and 28 DmisA: 16 of these are absent in males in gnomAD (v2.1.1/v3.1.2), 12 of which map to GeneTrek NDD genes.

Polygenic risk scores

To analyse the contribution of common genetic variants to ASD risk, we calculated PRS for the individuals of European ancestry in our sample using summary statistics from a recent ASD GWAS [2]. To test technical reproducibility, we compared PRS in two MZ twin pairs, and no between-twin difference was detected. Although the small difference in mean PRS between cases and unaffected sibs was not significant (Supplementary Fig. 11a), we observed a significant PRS over-transmission in cases (n = 103, pTDT mean = 0.20, p value = 0.04), but not in unaffected siblings (n = 44, mean = 0.11, p value = 0.39) (Supplementary Fig. 11b). Considering SPX and MPX separately, we did not observe a significant PRS over-transmission in either group, likely due to the limited sample size, especially for the MPX group (SPX: 78 probands, pTDT mean = 0.14, p value = 0.21; MPX: 25 probands, pTDT mean = 0.43, p value = 0.06).
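The polygenic transmission disequilibrium test (pTDT) compares each child's PRS with the average of the two parental scores. The sketch below follows the usual definition of the statistic (deviation from the mid-parent PRS, scaled by the standard deviation of the mid-parent PRS, then a one-sample t-statistic against zero). The exact pipeline used for this cohort is not described in this section, so the code is an illustration on made-up trios; the function name and input layout are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def ptdt(trios):
    """Compute the pTDT mean deviation and t-statistic.

    trios: list of (father_prs, mother_prs, child_prs) tuples.
    Each deviation is (child PRS - mid-parent PRS) divided by the standard
    deviation of the mid-parent PRS across families; the t-statistic tests
    whether the mean deviation differs from zero (n - 1 degrees of freedom).
    """
    mid_parents = [(father + mother) / 2 for father, mother, _ in trios]
    sd_mid_parent = stdev(mid_parents)
    deviations = [
        (child - mid) / sd_mid_parent
        for (_, _, child), mid in zip(trios, mid_parents)
    ]
    mean_deviation = mean(deviations)
    standard_error = stdev(deviations) / sqrt(len(deviations))
    return mean_deviation, mean_deviation / standard_error

# Toy trios (hypothetical values, not study data)
toy_trios = [(0.0, 2.0, 1.5), (1.0, 3.0, 2.5), (2.0, 4.0, 3.0)]
mean_deviation, t_stat = ptdt(toy_trios)
print(f"pTDT mean = {mean_deviation:.3f}, t = {t_stat:.2f}")
```

A positive mean deviation indicates that children carry, on average, more polygenic risk than expected from their parents; the p value is then obtained from the t-distribution with n - 1 degrees of freedom.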
