Deciphering potential causative factors for undiagnosed Waardenburg syndrome through multi-data integration

New candidate pathogenic gene identification

To predict new pathogenic genes associated with WS, we conducted a network-based integrative analysis based on the principle that interacting proteins tend to be functionally associated [28]. First, we mapped the 7 known disease-causing genes (PAX3, MITF, SOX10, SNAI2, EDNRB, EDN3, KITLG) into the BioGRID, InWeb_InBioMap and HitPredict PPI databases and obtained 266, 167 and 152 genes, respectively, that directly interact with the causative genes (Fig. 2A). To improve the prediction accuracy, we extracted the 43 proteins that overlapped across the three databases as candidate gene set A and constructed a PPI network (Fig. 2B). Next, we downloaded a total of 4,707 genes and their corresponding phenotypes from the HPO database. By mapping the seven pathogenic genes into HPO, we obtained a unified phenotypic set Pws containing 70 phenotypes associated with WS. By comparing the phenotypic set Pg of each gene in HPO with Pws and filtering out genes with weight Wg < 0.2, we obtained a gene set B of 47 genes (Fig. 2C). Subsequently, we extracted the genes shared by gene set A and gene set B (SIN3A, EP300, CHD7, and KIT) as potential WS pathogenic genes. A Sankey diagram shows that the 4 candidate pathogenic genes have a total of 35 WS-related phenotypes. SIN3A, EP300, CHD7, and KIT share 17, 16, 15, and 15 phenotypes with the causative genes, respectively (Fig. 3A).
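The two-step selection described above can be sketched with plain set operations. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the toy gene and phenotype labels are placeholders, and `overlap_weight` is only an assumed stand-in for Wg (the fraction of the WS phenotype set annotated to a gene), since the definition of Wg is not given in this section. Excluding the known causative genes from the candidates is likewise an assumption.

```python
from typing import Dict, Set

# The 7 known causative genes named in the text.
KNOWN_GENES = {"PAX3", "MITF", "SOX10", "SNAI2", "EDNRB", "EDN3", "KITLG"}


def shared_interactors(interactors_by_db: Dict[str, Set[str]]) -> Set[str]:
    """Gene set A: interactors reported by every database, minus the known genes."""
    if not interactors_by_db:
        return set()
    shared = set.intersection(*interactors_by_db.values())
    return shared - KNOWN_GENES  # assumption: known genes are not candidates


def overlap_weight(p_gene: Set[str], p_ws: Set[str]) -> float:
    """Assumed stand-in for Wg: share of the WS phenotype set annotated to the gene."""
    return len(p_gene & p_ws) / len(p_ws) if p_ws else 0.0


def phenotype_similar_genes(
    phenotypes_by_gene: Dict[str, Set[str]], p_ws: Set[str], min_weight: float = 0.2
) -> Set[str]:
    """Gene set B: genes kept after filtering out those with weight < min_weight."""
    return {
        gene
        for gene, p_gene in phenotypes_by_gene.items()
        if gene not in KNOWN_GENES and overlap_weight(p_gene, p_ws) >= min_weight
    }


# Hypothetical toy inputs; labels are placeholders, not real annotations.
toy_ppi = {
    "BioGRID": {"GENE_1", "GENE_2", "GENE_3", "MITF"},
    "InWeb_InBioMap": {"GENE_1", "GENE_2", "GENE_4", "MITF"},
    "HitPredict": {"GENE_1", "GENE_2", "MITF"},
}
toy_p_ws = {"PHENO_1", "PHENO_2", "PHENO_3", "PHENO_4", "PHENO_5"}
toy_hpo = {
    "GENE_1": {"PHENO_1", "PHENO_2", "PHENO_9"},  # shares 2 of 5 WS phenotypes
    "GENE_2": {"PHENO_9"},                        # shares none
    "GENE_4": {"PHENO_1", "PHENO_2", "PHENO_3"},  # shares 3 of 5 WS phenotypes
}

gene_set_a = shared_interactors(toy_ppi)
gene_set_b = phenotype_similar_genes(toy_hpo, toy_p_ws)
candidates = gene_set_a & gene_set_b
print(sorted(candidates))
```

In the toy data only `GENE_1` is both a shared interactor and phenotypically similar, which mirrors how the overlap of gene set A and gene set B narrows the candidate list.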

Fig. 2

Prediction of new candidate pathogenic genes for WS by the PPI and phenotype-similarity approaches. (A) Venn diagram of candidate genes predicted through three PPI databases. The numbers in the purple, green and orange circles represent the numbers of interacting proteins predicted by the BioGRID, InWeb_InBioMap and HitPredict databases, respectively. (B) PPI network of WS causative genes and candidate genes, containing 141 interactions between 7 WS causative genes and 43 interacting genes. Nodes in yellow denote the pathogenic genes. Nodes in blue and red denote the interacting genes; nodes in red denote the new candidate genes. (C) Venn diagram of the PPI network and the phenotype-similarity network. The genes in the blue circle denote candidate genes predicted by three PPI databases. The genes in the orange circle represent candidate genes predicted by the phenotype-similarity method (number of phenotypes shared with pathogenic genes ≥ 15). 4 genes (SIN3A, EP300, CHD7, and KIT) are present in both the PPI network and the phenotype-similarity network and are defined as the preliminary candidate pathogenic genes

Fig. 3

The WS-related phenotypes shared by causative genes and candidate genes and the expression abundance of candidate genes in ear tissues. (A) A Sankey diagram displays the WS-related phenotypes shared by causative genes and candidate genes. SIN3A, EP300, CHD7, and KIT share 17, 16, 15, and 15 phenotypes with causative genes, respectively. (B) Tissue × stage expression matrix of KIT and CHD7 genes in ear tissues. Data is from the MGI database [26]. (C) Violin plot showing gene expression levels of KIT and CHD7 in DCs, IHCs, and OHCs of C3HeB/FeJ mice. Data is from the MORL scRNA-Seq database [25]

Among the abnormal clinical manifestations of WS, hearing loss is the primary symptom affecting WS patients’ health and quality of life [36]. To further evaluate the four potential WS pathogenic genes, we retrieved the mouse phenotypes of these four candidate genes from the MGI database and found that mutations in both KIT and CHD7 can cause hearing loss in mice (Additional file 2, Additional file 3). Next, we retrieved the expression abundance of KIT and CHD7 in different ear tissues from the MGI database. The results show that both KIT and CHD7 are highly expressed mainly in the organ of Corti in the cochlea (Fig. 3B). The organ of Corti acts as an auditory receptor, and its damage leads to varying degrees of sensorineural hearing loss. The sensory epithelium of the organ of Corti is made up of HCs and SCs. HCs are sensory cells and are key components in sound perception. We then retrieved the expression levels of KIT and CHD7 in single cochlear cells from the MORL scRNA-Seq database. The results show that KIT is highly expressed mainly in OHCs, whereas CHD7 is highly expressed in IHCs, OHCs, and DCs (Fig. 3C). Taken together, these results indicate that KIT and CHD7 could be high-priority candidate pathogenic genes contributing to WS.

The identification of disease-causing variations in WS-related genes

We obtained 88,869 possible variations of PAX3, 77,446 possible variations of MITF, and 1,544 possible variations of SOX10 by ANNOVAR annotation. After filtering the variations with the cutoff values (SIFT_score ≤ 0.001, Polyphen2_HVAR_score > 0.957, MutationTaster = 1, CADD_phred > 25, GERP++_RS > 4), 28 PAX3 variations, 20 MITF variations, and 9 SOX10 variations were retained as candidate disease-causing mutations (Additional file 4). Strikingly, 16 of the 57 variations have been reported as pathogenic for WS in ClinVar and DVD (Additional file 4). The remaining 20 candidate variations in PAX3, 16 candidate variations in MITF, and 5 candidate variations in SOX10 may be considered potential disease-causing variants.
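The filtering step above amounts to keeping a variant only if every score passes its cutoff. The sketch below illustrates that logic on hypothetical rows; the dictionary keys simply reuse the score names quoted in the text and may not match the exact column headers of a given ANNOVAR output, and treating a missing or non-numeric score (for example ".") as a failed filter is an assumption rather than a documented choice of the study.

```python
from typing import Callable, Dict, Optional

# Cutoffs as stated in the text; keys reuse the score names quoted there.
CUTOFFS: Dict[str, Callable[[float], bool]] = {
    "SIFT_score": lambda s: s <= 0.001,
    "Polyphen2_HVAR_score": lambda s: s > 0.957,
    "MutationTaster": lambda s: s == 1,
    "CADD_phred": lambda s: s > 25,
    "GERP++_RS": lambda s: s > 4,
}


def to_float(raw: object) -> Optional[float]:
    """Convert a raw annotation value to float; return None if missing or non-numeric."""
    try:
        return float(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def passes_all_cutoffs(variant: Dict[str, object]) -> bool:
    """True only if every score is present and satisfies its cutoff (assumed policy)."""
    for column, is_ok in CUTOFFS.items():
        score = to_float(variant.get(column))
        if score is None or not is_ok(score):
            return False
    return True


# Hypothetical annotated rows, not variants from the study.
toy_variants = [
    {"id": "toy_1", "SIFT_score": "0.0", "Polyphen2_HVAR_score": "0.999",
     "MutationTaster": "1", "CADD_phred": "32", "GERP++_RS": "5.6"},
    {"id": "toy_2", "SIFT_score": "0.0", "Polyphen2_HVAR_score": "0.999",
     "MutationTaster": "1", "CADD_phred": "24.9", "GERP++_RS": "5.6"},
    {"id": "toy_3", "SIFT_score": ".", "Polyphen2_HVAR_score": "0.999",
     "MutationTaster": "1", "CADD_phred": "32", "GERP++_RS": "5.6"},
]

kept_ids = [v["id"] for v in toy_variants if passes_all_cutoffs(v)]
print(kept_ids)
```

Requiring all five predictors to agree is a deliberately strict rule: `toy_2` is dropped for a single score just below its cutoff, and `toy_3` is dropped because one score is unavailable.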

Next, multiple sequence alignment of PAX3/MITF/SOX10 homologous genes among 7 species (Homo sapiens, Macaca mulatta, Pan troglodytes, Mus musculus, Equus caballus, Sus scrofa, Canis lupus familiaris) with COMBALT showed that 20 variations in PAX3, 10 variations in MITF and 5 variations in SOX10 are located in regions highly conserved among those species (Additional file 4). Subsequently, we checked the positions of the candidate disease-causing variations in the protein structures and found that 20 variations (p.Gln470Leu, p.Tyr458His, p.Gln431Glu, p.Val114Met, p.Tyr366Asn, p.Pro244Leu, p.Gly42Cys, p.Gly34Cys, p.Gly48Ala, p.Gly34Arg, p.Arg156Cys, p.Ser152Asn, p.Pro382His, p.Asn47Lys, p.Gly417Arg, p.Trp131Ser, p.Pro333Leu, p.Asp144Tyr, p.Arg220Cys, p.Arg156His) in PAX3, 7 variations (p.Arg263Gly, p.Arg279Leu, p.Pro232Arg, p.Arg279Trp, p.Arg263Cys, p.Arg263Leu, p.Glu287Lys) in MITF and 5 variations (p.Thr437Met, p.Arg465Leu, p.Thr461Met, p.Pro245Arg, p.Pro302Leu) in SOX10 are located in domain regions (Additional file 4, Fig. 4), where they may influence protein structure, interaction and function. Thus, we propose that these 32 candidate mutations may be closely associated with WS.

Fig. 4

Position of candidate disease-causing variants in protein domains. (A) Structure of human PAX3 protein with functional domains. PD, paired domain; O, octapeptide motif; HD, homeodomain; TA, transactivation domain. (B) Structure of human MITF protein with functional domains. TAD, transactivation domain; b-HLH-Zip, basic helix-loop-helix leucine zipper. (C) Structure of human SOX10 protein with functional domains. Dim, dimerization domain; HMG, high-mobility group; K2, K2 domain; TA, transactivation domain

The distribution of genes and variation types

Information on 443 WS cases was collected from 84 published articles. The most common genetic causes were MITF variants (36.34%), PAX3 variants (31.15%), and SOX10 variants (27.99%), which together accounted for 95.48% of molecularly diagnosed WS patients (Fig. 5A). PAX3 variants were the most frequent genetic cause in WS1 and WS3 patients, accounting for 86.40% and 100.00% of cases, respectively. The most common genetic causes of WS2 and WS4 were MITF variants (60.00%) and SOX10 variants (70.97%), respectively (Fig. 5A).
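As a quick arithmetic check, the per-gene shares can be reproduced from case counts. The sketch below assumes that the per-gene patient numbers reported in the genotype-phenotype analysis (PAX3 n = 138, MITF n = 161, SOX10 n = 124) are the counts underlying these percentages; that link is an inference from the matching values, not something stated explicitly in the text.

```python
TOTAL_CASES = 443
# Assumed to be the counts behind the reported shares (taken from the
# genotype-phenotype analysis section of the same paper).
cases_by_gene = {"MITF": 161, "PAX3": 138, "SOX10": 124}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


shares = {gene: percent(n, TOTAL_CASES) for gene, n in cases_by_gene.items()}
sum_of_rounded_shares = round(sum(shares.values()), 2)
share_from_counts = percent(sum(cases_by_gene.values()), TOTAL_CASES)

print(shares)
print(sum_of_rounded_shares, share_from_counts)
```

Under this assumption the three shares match the reported 36.34%, 31.15% and 27.99%, and the combined 95.48% equals the sum of the rounded shares; computing the combined share directly from 423 of 443 cases differs only in the last decimal because of rounding.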

Fig. 5

The distribution of genes and variant types. (A) The gene distribution of four WS subtypes. (B) The distribution of mutation types in WS (the left figure); the distribution of mutation types of 7 pathogenic genes in WS (the right figure). (C) A heatmap was used to show the correlation between any two phenotypes in WS. The circle shown in the figure suggests that the correlation it represents has passed the statistical test. The value on the circle is used to evaluate the degree of relationship. Red circles indicate that the two phenotypes are positively correlated. Blue circles indicate that the two phenotypes are negatively correlated

Nonsense (27.87%), frameshift (26.46%), missense (23.19%), and splicing (10.77%) mutations accounted for 88.29% of all causative variants. Gross deletions (6.09%) and in-frame deletions (5.15%) were also relatively frequent (Fig. 5B, the left figure). Nonsense mutations made up the majority of variants in the MITF gene. Missense and frameshift mutations were more common in PAX3, SOX10, EDNRB, EDN3, and KITLG (Fig. 5B, the right figure).

The position distribution of known disease-causing variants related to WS

The gene structure diagram displays the common disease-causing variants associated with WS in the 7 pathogenic genes (Additional file 5). The majority of variants (95.98%) occurred in exon regions of the seven pathogenic genes. Thus, screening of all exons can be prioritized in genetic testing. Currently, whole-exome sequencing is widely used in the diagnosis of genetic diseases [37, 38]. However, since a few variants in the seven genes occur in intron regions, the intron-exon boundaries should also be inspected during genetic testing. Moreover, as displayed in Fig. 5B (the right figure), gross deletions account for 1.90%, 6.98%, and 10.00% of MITF, PAX3, and SOX10 variants, respectively. Thus, when no variants are detected in the exons or introns of a WS case, structural variations should be considered. In addition, the pathogenic genes remain unknown in approximately 39% of WS, 14.8% of WS1, 26.3% of WS2, and 15–35% of WS4 patients [5,6,7]. Considering the rapid development of sequencing technology and its significantly decreased cost and time [39, 40], whole-genome sequencing can be a prioritized approach when conventional clinical genetic testing methods fail to detect the disease-causing variants for WS.

Genotype-phenotype correlation analysis

PAX3, MITF, and SOX10 are the main pathogenic genes in WS patients, together accounting for 95.48% of the 443 cases we collected from the published literature (Fig. 5A). To avoid analysis bias resulting from an insufficient number of reported cases, the cases for the other known WS-causing genes (EDNRB, EDN3, SNAI2, and KITLG) were excluded from subsequent analysis. Additional file 6 summarizes the frequencies of the 13 most common phenotypes among the collected patients with variants in PAX3 (n = 138), MITF (n = 161), and SOX10 (n = 124). We conducted an association analysis to investigate whether any two of these common phenotypes tend to coexist. The results revealed that the occurrence of telecanthus and of synophrys was linked to broad nasal root (corr = 0.4 and 0.56, respectively) (Fig. 5C). However, some phenotypic pairs passed the statistical test with a relatively low correlation coefficient, so more clinical cases are needed to confirm these associations.
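For two phenotypes recorded as present or absent per patient, one common measure of co-occurrence is the phi coefficient (the Pearson correlation of two binary variables). The text does not state which correlation measure was used, so the sketch below is only an illustration under that assumption, and the toy presence/absence vectors are hypothetical. In practice, patients for whom either phenotype was not documented would need to be dropped before counting.

```python
from math import sqrt
from typing import Sequence


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi coefficient for two equal-length 0/1 vectors (one entry per patient)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both present
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only first present
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only second present
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined when one phenotype never varies
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence/absence data for two phenotypes in 8 patients.
phenotype_1 = [1, 1, 1, 0, 0, 0, 1, 0]
phenotype_2 = [1, 1, 0, 0, 0, 0, 1, 1]

phi = phi_coefficient(phenotype_1, phenotype_2)
print(phi)
```

A positive value indicates that the two phenotypes tend to appear together and a negative value that they tend to exclude each other, which corresponds to the red and blue circles of a correlation heatmap.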

De novo variants were more common in patients with SOX10 variants (61.70%) than in patients with MITF or PAX3 variants (p = 5.2013E-11 for group comparison, significant pairwise comparisons: p [PAX3 vs. SOX10] = 7.864E-11, p [MITF vs. SOX10] = 0.000001). The gender ratio showed no significant difference among the genes. Among the 13 phenotypes, hearing impairment and iris pigmentary abnormality were the most frequent in WS patients, accounting for 84.56% (345/408) and 74.88% (300/402), respectively. Statistical analysis showed that hearing impairment was more frequent in WS probands with SOX10 variants (p = 1.6847E-07 for group comparison, significant pairwise comparisons: p [PAX3 vs. SOX10] = 1.0862E-08, p [MITF vs. SOX10] = 0.000007), which is similar to previous reports [7]. In addition, patients with SOX10 variants were more likely to have bilateral profound hearing impairment, vestibular deformity, cochlear hypoplasia, and hypoplasia of the semicircular canal (Fig. 6A). We retrieved the expression abundance of SOX10 in different ear tissues from the MGI database and found that, compared with PAX3 and MITF, SOX10 was widely and highly expressed in the organ of Corti, semicircular canal, inner ear vestibular component and otocyst (Fig. 6B). Moreover, SOX10 also had a higher expression level than PAX3 and MITF in single cochlear cells (IHCs, OHCs, DCs) (Fig. 6C). Interestingly, nervous system diseases such as intellectual disability, sensorimotor neuropathy and peripheral demyelinating neuropathy occurred frequently only in patients with SOX10 variants and were absent in those with MITF or PAX3 variants (Fig. 6A, Additional file 7). RNA-seq data revealed that SOX10 has a higher expression level than PAX3 and MITF in brain tissues (Fig. 6D). These results indicate that the phenotypic profile of SOX10 is closely related to its tissue expression pattern.
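A pairwise comparison of this kind reduces to a 2 × 2 table (gene group by phenotype present or absent). The statistical test used for these p-values is not named in this section; the sketch below shows a two-sided Fisher's exact test as one plausible choice, implemented with the standard library only, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two gene groups; columns are phenotype present / absent.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(k: int) -> float:
        # Hypergeometric probability of k "present" cases in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = table_probability(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        table_probability(k)
        for k in range(k_min, k_max + 1)
        if table_probability(k) <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical counts: phenotype present/absent in two gene groups.
p_toy = fisher_exact_two_sided(3, 1, 1, 3)
print(p_toy)
```

An exact test is a reasonable fit when some cells are small, as is likely for the rarer phenotypes; a group comparison across all three genes would require a test on the full 3 × 2 table instead.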

Fig. 6

The common phenotypes of WS and the expression abundance of genes in ear tissues and brain tissues. (A) Heatmap showing the percentage of 24 common phenotypes in WS. (B) Tissue × stage expression matrix of SOX10, MITF, and PAX3 genes in ear tissues. The MGI database only annotates the expression of MITF and PAX3 in the organ of Corti. Compared with PAX3 and MITF, SOX10 was widely and highly expressed in the organ of Corti, semicircular canal, inner ear vestibular component and otocyst. Data is from the MGI database [26]. (C) Violin plot showing gene expression levels of SOX10, MITF, and PAX3 in DCs, IHCs, and OHCs of C3HeB/FeJ mice. The expression level of SOX10 is higher than that of MITF and PAX3 in single cochlear cells. Data is from the MORL scRNA-Seq database [25]. (D) Line chart showing the expression level of SOX10, PAX3, and MITF in different brain tissues. SOX10 has a higher expression level than MITF and PAX3 in different brain tissues. The yellow, orange and blue lines denote the pathogenic genes SOX10, MITF, and PAX3, respectively. Data is from the Human Protein Atlas database [27]

Pigmentation abnormalities in WS patients mainly included iris pigmentary abnormality (heterochromia iridis, blue irides, and iris hypopigmentation), hypopigmented skin patches, skin freckles, white forelock, and premature graying of hair. Iris pigmentary abnormality occurred frequently in patients with PAX3 and SOX10 variants (p = 0.004280 for group comparison, significant pairwise comparisons: p [PAX3 vs. MITF] = 0.020436, p [SOX10 vs. MITF] = 0.002416). Skin freckles and premature graying of hair were frequently observed in patients with MITF variants (70.68% and 42.73% respectively). White forelock occurred significantly more often in patients with PAX3 variants (p = 0.004427 for group comparison, significant pairwise comparisons: p [PAX3 vs. MITF] = 0.016571, p [PAX3 vs. SOX10] = 0.002665). No difference was found in the prevalence of hypopigmented skin patches among different genes. Synophrys and broad nasal root occurred mainly in cases with PAX3 and MITF variants (p = 0.004425 and p = 0.000635 for group comparison respectively). Telecanthus was frequently present in WS patients with PAX3 variants (p = 2.1124E-32 for group comparison, significant pairwise comparisons: p [PAX3 vs. MITF] = 1.0045E-22, p [PAX3 vs. SOX10] = 2.5541E-24). It is important to note that aganglionic megacolon occurred frequently in patients with SOX10 variants (36.00%, 18/50) but was absent in those with MITF or PAX3 variants. Underdeveloped nasal alae was only observed in patients with PAX3 variants but no difference was found among different genes, mainly because this phenotype was poorly documented in cases with MITF and SOX10 variants. For other rare phenotypes (Additional file 7), the genotype-phenotype associations need to be further confirmed in more WS cases.

Subsequently, we conducted a statistical analysis to investigate whether there is a gender difference in the prevalence of phenotypes. Results revealed that no difference was found in the prevalence of those common phenotypes (hearing impairment, skin freckles, hypopigmented skin patches, premature graying of hair, iris pigmentary abnormality, synophrys) between different genders (Additional file 8). The other phenotypes were not included in the gender analysis due to the small sample size.
