Novel clinical, molecular and bioinformatics insights into the genetic background of autism

Table 3 presents the performance of the two groups on the experimental tasks, as assessed by our clinical measures, together with the between-group comparisons. For all tasks except picture comprehension, non-parametric Mann–Whitney tests were used to compare the performance of the ASD_MH and ASD_L groups.

Phenotypic results

Cognitive measures

As expected from the group inclusion criteria, the ASD_L group performed worse in non-verbal IQ (Mann–Whitney U = 0, p < 0.001). The same held for the attention total score, auditory attention, visual attention and visual range attention (Mann–Whitney U = 13, 24, 7 and 20.50, respectively, p < 0.001). The two groups also differed significantly in VSTM sentence and word recall (Mann–Whitney U = 11.50 and 4, respectively, p < 0.001), as well as in immediate and delayed visual memory, visual information recall, information retention factor and recognition (Mann–Whitney U = 13.50, 24, 32, 32 and 10, respectively, p < 0.001).
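To help interpret these statistics, the sketch below computes the Mann–Whitney U statistic by pairwise counting. A value of U = 0, as reported for non-verbal IQ, means that every score in one group lies below every score in the other. The function name, the normal-approximation p-value (with no tie or continuity correction) and the example scores are illustrative assumptions, not the analysis pipeline used in this study; for groups this small an exact p-value is preferable.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (U, two-sided p) for two independent samples.

    U is obtained by pairwise counting: each (x_i, y_j) pair with x_i > y_j
    adds 1 to U_x and each tie adds 0.5. The smaller of U_x and U_y is
    reported. The p-value uses the large-sample normal approximation with
    no tie or continuity correction, so it is only a rough guide for small groups.
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return u, p


# Invented scores for illustration only (not study data).
asd_l = [70, 72, 75, 68, 74]
asd_mh = [90, 95, 88, 101, 93]
u, p = mann_whitney_u(asd_l, asd_mh)
print(f"U = {u}, p ~ {p:.3f}")  # U = 0.0: complete separation of the groups
```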

Language measures

In expressive vocabulary, the difference between the two clinical groups was significant as expected (Mann–Whitney U = 0, p < 0.001). In narration, the difference in groups' performance was also significant in both total elements and total sections (Mann–Whitney U = 25.50 and 32, respectively, p < 0.001).

In sum, in all measures, both cognitive and language, the ASD_MH group outperformed the ASD_L group.

Machine learning data analysis

Machine learning was performed using the severe autism (n = 15) and non-severe autism (n = 18) samples to train the linear regression classifier described in the Material and Methods. As detailed in Fig. 1, a leave-one-out cross-validation (LOOCV) procedure was used to assess classifier performance. Feature (variant) selection was coupled to the LOOCV procedure to obtain an optimal set of classifier variants. The optimal set was the top 26 variants in every LOOCV iteration, and these variants were therefore selected for downstream functional analysis. The performance of each feature selection run was assessed by accuracy, specificity, sensitivity and the Matthews correlation coefficient (MCC); the results are summarized in Fig. 3. At the optimum, the molecular subtype classification defined by our risk model agreed with the prior clinical grading for 27 of 33 samples (81.82% prediction accuracy), with a sensitivity of 73.33%, a specificity of 88.89% and an MCC of 0.634. The receiver operating characteristic (ROC) curve gave an area under the curve (AUC) of 0.83. Fig. 4 shows the risk model outputs for all samples, visualized by hierarchical clustering and annotated with clinical metadata.

The top 26 variants obtained in every LOOCV iteration were pooled, giving a total of 84 unique variants. Table 4 lists these variants together with their genes and full annotation, including their frequency of occurrence in the 1000 Genomes Project (aaf_1kg_all). The full annotation also covers levels of heterozygosity and homozygosity and annotation in clinical databases such as ClinVar.
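The reported figures are mutually consistent. Treating severe autism as the positive class, 73.33% sensitivity over 15 severe samples and 88.89% specificity over 18 non-severe samples imply 11 true positives, 4 false negatives, 16 true negatives and 2 false positives. This confusion matrix is our inference from the percentages, not a table from the study. The sketch below recomputes the metrics from those counts and also shows the rank-based AUC, which equals the Mann–Whitney U of the positive class divided by n_pos * n_neg. The function names are hypothetical.

```python
import math
from typing import Sequence


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # severe cases correctly called severe
        "specificity": tn / (tn + fp),  # non-severe cases correctly called non-severe
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Counts inferred from the reported percentages (see lead-in).
m = classification_metrics(tp=11, fn=4, tn=16, fp=2)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Toy scores: a perfect ranking of positives above negatives gives 1.0.
print(rank_auc([0.9, 0.8, 0.3, 0.4], [1, 1, 0, 0]))
```

These counts reproduce 81.82% accuracy, 73.33% sensitivity, 88.89% specificity and an MCC of about 0.634.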
Several notable significant variants in this list are known to be implicated in the genetic predisposition to certain diseases and disorders, including certain cardiomyopathies (rs12063382), hypertension (rs1061157), afibrinogenemia (rs2070018), ciliary dyskinesia (rs3752042), congenital cataract (rs4682801), prostate cancer (rs1328285, rs9890913), infantile epilepsy and Parkinson’s disease (rs56260729), mental disability and Schinzel–Giedion syndrome (rs12922670, rs11082414), cerebellar hypoplasia (rs77247739), Kabuki syndrome (rs5952285, rs5952682) and autism (rs7049300), as well as response to drugs such as ezetimibe (rs10264715).

Fig. 3

Results of the classification process described above, showing statistics for each LOOCV run across different numbers of top significant variants selected for validation. Bars show LOOCV prediction accuracy (blue), sensitivity (orange) and specificity (gray); the yellow line shows the Matthews correlation coefficient.
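The key design point in this procedure is that variant selection is repeated inside each LOOCV iteration using only the training samples, so the held-out sample never influences which variants are chosen; pooling the per-iteration selections then yields a union of variants (84 in this study). The sketch below illustrates the idea on toy data. It deliberately swaps in simple stand-ins: variants are ranked by the absolute difference in class means and classified with a nearest-centroid rule, rather than the linear regression classifier and selection criterion described in the Material and Methods. All function names are hypothetical.

```python
from statistics import mean

Matrix = list[list[float]]


def top_k_features(X: Matrix, y: list[int], k: int) -> list[int]:
    """Stand-in ranking: absolute difference of class means for each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best first, ties broken by index
    return [j for _, j in scores[:k]]


def nearest_centroid(X: Matrix, y: list[int], x: list[float], features: list[int]) -> int:
    """Stand-in classifier: assign x to the class whose centroid is closer."""
    def sq_dist(label: int) -> float:
        rows = [row for row, lab in zip(X, y) if lab == label]
        return sum((x[j] - mean(row[j] for row in rows)) ** 2 for j in features)
    return 1 if sq_dist(1) < sq_dist(0) else 0


def loocv_with_selection(X: Matrix, y: list[int], k: int) -> tuple[list[int], set[int]]:
    """Leave-one-out CV with feature selection repeated inside every fold."""
    predictions: list[int] = []
    pooled: set[int] = set()
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = top_k_features(X_train, y_train, k)  # held-out sample excluded
        pooled.update(features)  # union of per-fold selections
        predictions.append(nearest_centroid(X_train, y_train, X[i], features))
    return predictions, pooled


# Toy data (invented): feature 0 separates the classes, features 1-2 are noise.
X = [[1.0, 0.3, 0.5], [1.1, 0.7, 0.2], [0.9, 0.1, 0.9],
     [0.0, 0.4, 0.6], [0.1, 0.8, 0.1], [-0.1, 0.2, 0.4]]
y = [1, 1, 1, 0, 0, 0]
preds, pooled = loocv_with_selection(X, y, k=1)
print(preds, pooled)
```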

Fig. 4

Visualization of risk model results for 33 ASD patients (18 non-severe and 15 severe) using the 26 variants selected during LOOCV. The dendrogram was obtained by hierarchical clustering (Euclidean distance, average linkage) of the model prediction outputs, and the clustering represents the molecular subtypes assigned by the trained model to all ASD patients. The two molecular subtypes predicted by the risk model are color-coded pink for the most severe cases (high-risk individuals) and light green for the least severe cases (low-risk individuals). The continuous spectrum of risk prediction scores is shown by the red-green gradient traversing the dendrogram, and patients are sorted by severity in descending order. Clinical experimental data are shown alongside the machine learning results as columns of dark and light gray boxes, which denote the level of severity for each of the six clinical measures available for this study. The molecular classification of samples 8574_9, 8574_14, 8574_7 and 8574_23 appears to differ from their clinical classification: these samples cluster separately from the other samples with similarly severe clinical phenotypes. Similarly, based on their molecular classification, samples 8574_13 and 8574_18 appear to cluster away from samples with a similar non-severe clinical classification.
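The dendrogram described above is built with average linkage over Euclidean distances. As a minimal pure-Python sketch of that technique (in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used), the hypothetical function below repeatedly merges the two clusters with the smallest mean pairwise distance and stops at a requested number of clusters. The risk scores are invented.

```python
import math
from itertools import combinations

Point = list[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def average_linkage(points: list[Point], n_clusters: int):
    """Agglomerative clustering with average linkage (UPGMA).

    Returns the final clusters (lists of point indices) and the merge history
    as (cluster_a, cluster_b, distance) tuples, which is what a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(euclidean(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, ia, ib)
        d, ia, ib = best
        merges.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return clusters, merges


# Invented one-dimensional risk scores for six patients.
scores = [[0.10], [0.15], [0.20], [0.80], [0.85], [0.90]]
clusters, merges = average_linkage(scores, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```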

Beyond these variant-level insights, the same reasoning can be applied to the genes themselves. In total, this method highlighted 60 genes, 12 related to mild and 48 to severe autism. Of these, 3 mild-autism genes and 12 severe-autism genes are already known and appear in the validation dataset compiled from the 5 databases described in our methodology (Table 1).

Table 1 Validation of the IGs highlighted by our machine learning approach with the help of the 5 autism-related databases (AutismKB, SFARI, HuVarBase, DisGeNET and OpenTargets)

Literature-based approach—IGs

Using the approach described in our methodology, 1005 unique genes were identified as IGs in all our samples: genes carrying variants homozygous for the alternate allele and marked in the SIFT and PolyPhen databases. Before focusing on sub-phenotypes, we pooled the IGs from children with mild autism and, separately, from children with severe autism, and validated each pooled set against the 5 databases (see Methodology). In total, 96 IGs from mild autism and 98 IGs from severe autism were found in the validation dataset, 70 of which were common to both groups.
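The pooling and validation steps amount to set operations: IGs are pooled (union) within each severity group, intersected with the union of the five database gene lists, and the two validated sets are then compared. The sketch below shows this with placeholder gene names; the function name and all gene sets are hypothetical and do not reproduce the study's lists.

```python
from functools import reduce


def validate_pooled(per_sample_igs: list[set[str]], databases: list[set[str]]) -> set[str]:
    """Pool IGs across samples (union) and keep those present in any database."""
    pooled = reduce(set.union, per_sample_igs, set())
    validation = reduce(set.union, databases, set())
    return pooled & validation


# Placeholder data (hypothetical gene names, not study results).
db_lists = [{"GENE_A", "GENE_B"}, {"GENE_C"}, {"GENE_D", "GENE_E"}]
mild = validate_pooled([{"GENE_A", "GENE_X"}, {"GENE_C"}], db_lists)
severe = validate_pooled([{"GENE_A", "GENE_D"}, {"GENE_Y"}], db_lists)
print(sorted(mild), sorted(severe), sorted(mild & severe))
```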

Investigating the sub-phenotypical sample clusters of IQ, Memory, Attention and Verbal, defined from the clinical observations, identified 14 IGs that were present in all sub-phenotypes regardless of severity (Fig. 5, yellow background).

Fig. 5

Common and distinct selected genes harboring variants (candidate mutations) with a possible contribution to ASD risk, grouped by severity across distinct ASD subgroups (phenotypes)

In samples from children with mild autism, the “IQ mild” sub-phenotype had 18 IGs common to all samples of the group, “Memory mild” 18, “Verbal mild” 17 and “Attention mild” 38. Three IGs (CDCP2, CELSR1, OR2L8) were common to all mild sub-phenotypes. The “IQ mild”, “Memory mild” and “Verbal mild” sub-phenotypes were almost identical, except that, among these three, OR11L1 was highlighted as an IG only for “IQ mild” and ZNF534 only for “Memory mild”. OR11L1 was not an IG for “Memory mild” or “Verbal mild” but was highlighted in “Attention mild”. A further group of 7 genes (C11orf35, DCLRE1A, IGSF10, LRRC56, MS4A14, OR56B1, OR5AU1) was highlighted as IGs only in “Attention mild”.

When studied per sub-phenotype, samples from children with severe autism highlighted 22 IGs for “IQ severe”, 24 for “Memory severe”, 24 for “Verbal severe” and 14 for “Attention severe”. The IG PTPRQ was found in all our severe autism samples. The “Memory severe” and “Verbal severe” sub-phenotype IGs were identical, and the IGs OR6C65 and SALL3 were found only in these 2 sub-phenotypes. The IGs KRTAP10-11, ZNF534, DNAH14, FBXW8, NIPA1 and TPD52L3 were common to all severe sub-phenotypes except “Attention severe”, and C3orf18 was likewise found as an IG only in “IQ severe”, “Memory severe” and “Verbal severe”. Finally, “Attention severe” was the only group lacking the severity-independent IG HPS4.
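The per-sub-phenotype statements above (IGs common to all samples of a group, IGs shared by all sub-phenotypes, IGs unique to one sub-phenotype) and the fraction of samples carrying a given IG can likewise be expressed as set operations. The sketch uses hypothetical function names and placeholder gene sets, not the study's data.

```python
from functools import reduce


def core_igs(samples: list[set[str]]) -> set[str]:
    """IGs present in every sample of a sub-phenotype (intersection)."""
    return reduce(set.intersection, samples)


def prevalence(samples: list[set[str]], gene: str) -> float:
    """Fraction of samples in which `gene` is an IG."""
    return sum(gene in s for s in samples) / len(samples)


# Placeholder sub-phenotype groups (hypothetical genes and samples).
groups = {
    "IQ": [{"G1", "G2", "G3"}, {"G1", "G2"}],
    "Memory": [{"G1", "G2", "G4"}, {"G1", "G2", "G4"}],
    "Attention": [{"G1", "G5"}, {"G1", "G5", "G2"}],
}
core = {name: core_igs(samples) for name, samples in groups.items()}
shared_by_all = reduce(set.intersection, core.values())
# IGs in the "Attention" core that appear in no other sub-phenotype core.
only_attention = core["Attention"] - set().union(
    *(genes for name, genes in core.items() if name != "Attention")
)
print(core, shared_by_all, only_attention, prevalence(groups["Attention"], "G2"))
```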

A further round of validation against the 5-database dataset was performed for these sub-phenotype IGs (Table 2). In total, 10 IGs appear in both our data and the validation dataset. Per sub-phenotype, they break down as follows. NEMF was highlighted for all sub-phenotypes regardless of severity. NIPA1 was highlighted in all severe sub-phenotypes except “Attention severe”, in which it was found in 87% of samples, and in all samples of “Attention mild”. CELSR1 was validated for all mild sub-phenotypes, and MS4A14 only for “Attention mild”. Finally, “Attention mild” was the only sub-phenotype with GALNTL5.

Table 2 Validation of the IGs highlighted by our literature-based approach with the help of the 5 autism-related databases (AutismKB, SFARI, HuVarBase, DisGeNET and OpenTargets). Results highlighted in bold were found in SFARI

Functional analysis

As described in our methodology section, the genes highlighted by our two approaches were investigated for their functional roles and their participation in biological processes. The results were grouped by function into 15 distinct categories: Developmental Biology, Nervous System Development, Synapses—Neurotransmission, Morphogenesis And Structure, Trafficking And Transport, Sensory, Cell Signaling, Cell Migration/Motility, Differentiation, Cell Cycle, Programmed Cell Death, Epigenetics, Metabolism, Post-Translational Modifications and Immunosystem.

For the genes highlighted by our machine learning approach, Fig. 6 shows the Autism Mechanisms (AMs) implicated in severe and mild autism. Developmental Biology involves 7 genes in severe autism and only 1 in mild autism; Nervous System Development, 7 in severe and 3 in mild; Synapses—Neurotransmission, 4 in severe and 1 in mild; Morphogenesis And Structure, 10 in severe and 6 in mild; Trafficking And Transport, 3 in severe and 2 in mild; and Sensory, 5 in severe and 3 in mild. There are also 20 genes for severe and 5 for mild autism in Cell Signaling, Cell Migration/Motility, Differentiation, Cell Cycle, Programmed Cell Death, Epigenetics, Metabolism, Post-Translational Modifications and Immunosystem.
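Per-category counts like these can be produced by tallying, for each functional category, the distinct genes annotated to it. Because a gene can participate in several categories, category totals may exceed the number of genes. The sketch assumes a hypothetical gene-to-category mapping; the gene names and assignments are placeholders, not the annotations behind Fig. 6.

```python
from collections import Counter


def genes_per_category(gene_categories: dict[str, set[str]]) -> Counter:
    """Count genes per category; a gene in several categories counts once in each."""
    counts = Counter()
    for categories in gene_categories.values():
        counts.update(categories)
    return counts


# Hypothetical annotations for a severe-autism gene list.
severe = {
    "GENE_A": {"Developmental Biology", "Cell Signaling"},
    "GENE_B": {"Cell Signaling"},
    "GENE_C": {"Morphogenesis And Structure", "Sensory"},
}
for category, n in sorted(genes_per_category(severe).items()):
    print(f"{category}: {n}")
```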

Fig. 6

Functional analysis of the genes discovered by our Machine Learning approach for Severe and Mild autism. Figure shows individual gene participation in specific Autism Mechanisms (AMs)

Breaking down the AMs brought to the foreground by our literature-based method (Fig. 7) reveals 69 severity-independent AMs, spanning all our categories except Epigenetics and Metabolism, which appear to be severity-associated. In the severe autism group, the AMs associated with the IGs of “Language severe” and “IQ severe” are identical except for the “Gene silencing” AM, which is found only in “IQ severe” owing to the IG C3orf18; in total, the two share 36 AMs. Children in these groups also do not appear to have AMs related to neurotransmission or cell cycle events. Likewise, the “Verbal severe” and “Memory severe” sample groupings have identical sets of 51 AMs, as expected since they share the same IGs, and neither has AMs associated with neurotransmission or cell cycle processes. The “Attention severe” AMs are all related to PTPRQ, the only severity-dependent IG in this group, which is linked with developmental, morphogenic, sensory and signaling processes.

Fig. 7

Functional analysis of the genes discovered by our literature-based approach for Severe and Mild autism broken down by specific clinical sub-phenotypes. Figure shows individual gene participation in specific Autism Mechanisms (AMs)

Among our mild autism sub-phenotypical groupings, only “Memory mild” and “Verbal mild” are completely alike. They include 23 AMs in total, associated with general and nervous system development, neurotransmission and morphogenesis. The “IQ mild” IGs are involved in 25 AMs, none of which relate to trafficking and transport, cell cycle, epigenetic, metabolic or immunological processes. The “Attention mild” IGs are involved in 41 AMs spanning our categories but not including any epigenetic modifications. Finally, the “Language mild” AMs form the most complex set, spanning 77 AMs across all our categories, including the epigenetic AM histone phosphorylation.

Overall, for both approaches, many individual genes carrying variants, such as AGRN (which is involved in 9 functional modules in severe ASD), ARHGEF11, NR0B1, NGEF, FOXN3, ITGF2 and MCF2, appear to be connected to a multitude of biological processes. Perturbations in any of these crucial genes, with their multiple functional involvements, may therefore trigger disorders related to the structure and function of the CNS. Our results also show that some genes, such as HEATR5A, ITGF2, KRTAP10-1 and MCF2, are involved in a single process, such as the morphogenesis and structure of synapses. Furthermore, we found mutational events in various proteins involved in sensory pathways, which could explain the broad range of sensory abnormalities regularly observed in individuals across the autistic spectrum. Several of our findings highlighted perturbations in sensory and perceptual pathways that may explain impairments of attention, IQ, verbal ability and memory. For example, PTPRQ, a gene linked in the literature to auditory impairment [55], is affected in all our severe autism samples across all four clinical sub-phenotypes, and in none of the mild autism samples. This gene could therefore serve as a potential biomarker of autism severity.

Deleterious or damaging variants in genes that encode signaling proteins can significantly alter the course of brain development, synaptic structure and function, and morphogenesis. For instance, the NLGN protein, found to be significant in our results, plays an important role at synapses and has been implicated in ASD by previous studies [56]. In general, signaling by the proteins these genes encode is fundamental to neurodevelopmental and post-developmental processes such as synapse organization (AGRN, TJP1), cell migration (ACTN2, TJP1), axon guidance (ACTN2, AGRN, TNK2, ARHGEF11, NGEF, FGA) and dendrite development (AGRN), and any perturbation of such processes may trigger disorders related to the structure and function of the CNS. In addition, bone morphogenetic proteins (BMPs), whose signaling has been shown to be dysregulated in ASD, constitute the largest subdivision of the TGF-β superfamily and are critical for the development of the CNS.
