Uncovering potential causal genes for undiagnosed congenital anomalies using an in-house pipeline for trio-based whole-genome sequencing

Clinical characteristics and phenotypic classification of the study participants

The clinical features of patients with CA were analyzed using clinical indices such as gestational age, birth weight, gender, preterm, in vitro fertilization (IVF), and maternal/paternal age. The average gestational age was 37.2 weeks (± 1.8) and the average birth weight and height were 2501.4 g (± 602.5) and 45.4 cm (± 3.1), respectively. The proportion of IVF was 33.3% (6/18) and the average maternal age was 35.3 years (± 3.3). The number of mothers over 35 years old was 13 (72.2%) and the proportion of infants that were small for gestational age was 61% (11/18). The majority of phenotypes (75.1%) were detected at birth or less than 1 month after birth. Half of the patients have 5–10 phenotypes (Table 1). Abnormality of the nervous system was the top phenotypic subgroup followed by several organs/phenotypes such as abdomen, growth, head or neck, skeletal, cardiovascular, and genitourinary abnormalities. All participants had multiple anomalies with four or more phenotypes (Fig. S1 and Table S1).

Table 1 Clinical characteristics of the study populationVariants discovery using the established in-house pipeline

We analyzed the variant-annotated data from the established in-house pipeline to identify potential CA-associated candidate variants (Fig. 1). To identify SNV/Indel variants, firstly, we focused on de novo variants because our participants were composed of trio samples. As shown in Fig. 2, after filtering variants for annotation of coding sequences and splicing sites, 26 DNVs were identified. Secondly, after filtering out non-private variants present in other family samples, we identified 64 CH variants within the coding region of 27 genes. Thirdly, after filtering out variants missing one parental side or those also discovered in other families, we collected 47 rare homozygous variants in the coding region of 43 genes. To identify CA-associated CNV/SV, we focused on de novo variants. From the CNV analysis using the DRAGEN platform, we identified 54 de novo CNVs (dnCNVs) following visualization using Samplot and IGV software. From the SV analysis, using Manta software embedded in DRAGEN system, 56 de novo SVs were identified (Fig. 2). From the 18 undiagnosed patients with CA, we identified 35 candidate SNV/Indel variants, including seven DNVs and 28 CH variants in 21 genes (Table 2). Some patients have more than one candidate gene because the most likely candidate genetic variants could not be identified. SNV/Indel variant types include three frameshifts and two stop-gains inducing premature termination of protein synthesis, one stop-loss, one in-frame deletion, one splicing, and 27 missense variants. Despite careful examination of all data, including CNV, SV, and SNV/Indel, we were unable to identify any candidate variants in one patient (patient-8). Furthermore, of the 29 families recruited, 11 CA patients had previously been diagnosed at the hospital with known disease-causing genes (Fig. 2). Using our in-house pipeline, we once again verified the genetic variants in these 11 pre-diagnosed CA patients and confirmed the capability of our pipeline [23]. As a result, we identified 11 diagnosed genes, including SNVs/Indels in KRT14, ABCC8, CREBBP, PURA, PKHD1, PTPN11, HNRNPR, SERPINC1, and NIBPL, as well as a CNV and an SV deletion in EYA1 and FOXC1 (Table S3).

Fig. 2figure 2

Summary of results from the trio-based WGS analysis of patients with congenital anomaly using the established in-house pipeline. The numbers in parentheses indicate the number of variants

Table 2 List of potential candidate or possible causal variants from 18 undiagnosed CA patientsIn silico prediction of pathogenicity of candidate variants

We have predicted the pathogenicity of 35 candidate genetic variants (Table S4). As expected in ACMG guideline-based variant classification, we found that stop-gain and frameshift-induced premature termination variants may have the strongest CA-development effects. These genes, including RYR3, MED24, YY1AP1, SLC27A3, and RARS1, were found to belong to this category (stop-gain and frameshift-induced premature termination variants). Additionally, variants in other genes such as COL18A1, UNC80, FREM2, ITGA8, AOX1, COL5A2, and NOTCH1, exhibited pathogenic scores according to CADD. Among the genes, the RARS1 gene in patient-21 has the second highest number of pathogenic variants, with one variant showing 11 pathogenic-points in ‘Meta-score’ and 17 pathogenic-points in ‘Individual-score’. The other variant induces frameshift-induced early termination, which is expected to be pathogenic. The third-most pathogenic variant is the DNV in the RYR3 gene found in patient-4, with 8 points in ‘Meta-score’ and 12 points in ‘Individual-score’, respectively. Additionally, several candidate genes have variants, including MED24, FREM2, CSMD1, and NOTCH1, with potential pathogenic properties.

We also assessed the stability and flexibility of proteins altered by missense variants using DynaMut2. Destabilization was indicated in at least one variant of the following genes: NRXN1, MED24, COL18A1, COL6A6, ITGA8, AOX1, SLC27A3, RARS1, COL5A2, CFTR, and NOTCH1 (Table S5). However, due to the unavailability of protein structures in PDB and AlphaFold, these results could not be obtained for genetic variants in the genes RANBP2, CPLANE1, YY1AP1, FREM2, CSMD1, and MACF1.

Further details on pathogenicity prediction reveal that the de novo variant, c.C298G, in the NRXN1 gene results in the substitution of proline with alanine at the 100th amino acid residue (Fig. S2A, B and C). This particular amino acid is situated within the laminin G-like 1 domain and is highly conserved across a variety of species, including humans, rhesus monkeys, mice, chickens, and zebrafish. Structural prediction of NRXN1 revealed that Pro100Ala variant is expected to destabilize the NRXN1 protein, resulting in a stability change of −0.58 kcal/mol (Table S5). Further details regarding other genetic variants show that the arginine at position 634 (Arg634), one of the CH variants in RARS1, is highly conserved across various species, from humans to zebrafish, in terms of evolutionary conservation (Fig. S2D, E and F).

Additionally, three-dimensional RARS1 modeling, using DynaMut2, indicated that in the wild-type (WT) form, Arg634, being a positively charged amino acid, forms hydrogen bonds (H-bonds) with five other amino acids (Cys615, Cys617, Asn631, Leu637, and Gln571). However, the change from Arg634 to Leu634 is predicted to result in the loss of H-bonds with Gln571, Cys615, and Asn631, while gaining additional H-bonds with Cys617, Leu637, and Cys638. Additionally, there are dramatic changes in the number of interactions, such as hydrophobic, polar, ionic, and van der Waals interactions, with neighboring amino acids. Therefore, the amino acid change, Arg634Leu, is expected to destabilize the RARS1 protein, with a stability change of −0.29 kcal/mol (Table S5).

Identification of possible causal variants for CA

Following the in-silico prediction of variant pathogenicity, genotype–phenotype relationships, and manual prioritization by MDT, we identified 10 possible causal variants of seven genes in seven of the patients. These variants comprise four DNVs and six CH variants, including one stop gain, one stop loss, one frameshift-induced early termination, and seven missense variants (Table 2). Detailed descriptions regarding variant pathogenicity, genotype–phenotype relationships, and the prioritization of possible causal variants for each patient are provided below.

In the case of patient-4, the variant (p.Glu4099Ter) with the stop-gain effect was predicted as likely pathogenic (LP). The variant inheritance pattern of RYR3 was autosomal recessive (AR) in OMIM; however, we observed AD (DNV). Patient-4 did not have another RYR3 variant to confirm the mode of inheritance. In patient-5, we identified one VUS DNV (p.Pro100Ala) in NRXN1. Patient-6 had CH variants in the RANBP2 gene; one was a maternal-inherited p.Ter3225Glnext*6 (classified as LP) and the other was a paternal-inherited p.Pro2640Thr (classified as VUS). In OMIM, RANBP2 gene is known for encephalopathy. In patient-13, a de novo missense variant (p.Tyr138His), classified as a VUS, was observed in FREM2, which is related to the ‘Cryptophthalmos’ and ‘Fraser syndrome 2’ in an AR inheritance mode. A DNV (p.Gly214Arg) of CSMD1 in patient-16 was predicted as VUS with PM2, PP3 and BP1, but upscaled to LP on applying PS2 de novo variant criteria. Given that the CSMD1 variants are not documented in OMIM and ClinVar, the variants are considered novel. In patient-21, the RARS1 CH variants caused the ‘Leukodystrophy, hypomyelinating, 9’ phenotype in an AR inheritance mode. The frameshift variant p.Leu592PhefsTer8 was predicted to be pathogenic, while the missense variant p.Arg634Leu was predicted to be a variant of uncertain significance (VUS). Patient-30 had CH variants in NOTCH1, known as the causal gene for ‘Adams-Oliver syndrome 3’ and ‘Aortic valve disease 1’ in OMIM. Both p.Gly2186Val and p.Asn685Ile are missense variants, classified as benign and VUS, respectively.

Functional validation of possible causal variants and identification of potential causal genes of CA patients

We validated the pathogenicity of genes containing seven possible causal genes using the zebrafish knockout (KO) model generated with CRISPR-Cas9 system and identified six genes as potential causal genes of CAs. As shown in Fig. 3 and Table S1, the KO zebrafish models for possible causal genes displayed phenotypic abnormalities corresponding to patient clinical characteristics.

Fig. 3figure 3

Phenotypic characteristics of zebrafish knockout (KO) models for six genes, including RYR3, NRXN1, FREM2, CSMD1, RARS1, and NOTCH1. A, B Atria and ventricle areas in RYR3 KO zebrafish are larger compared to those in control at 3dpf. The red arrow indicates the area of ventricle and the blue indicates the atria. C, D Compared to that in the control, the body length decreased in the NRXN1 KO model, and some individuals showed pericardial edema/congestion in KO (red arrow). E Compared to that in the control, ‘Total moved distance’ was decreased in the FREM2 KO model (left panel) and ‘the number of stimuli until escape’ increased (right panel). F, G Compared to those in the control, the atria and ventricle sizes in heart were increased (blue and red arrow), the body length was shortened, and ear area was decreased (orange arrow) in the CSMD1 KO model. H, I Body length in the RARS1 KO model is shorter and the swim bladder is not developed compared to that of the control at 5dpf (blue arrow). Some individuals showed mandible protrusion and a short mandible length compared to the control (red arrow). J, K Compared to that in the control, the body length was shortened, the size of head and eye (green circle) decreased in NOTCH1 KO. The KO model showed tail curvature (purple circle) and pericardial edema/congestion (red arrow). The data depict the area of atria and ventricle (A, F), the length of body (B, F, H, J), and the area of ear (F) for each zebrafish individual, with the data plot representing the mean ± SEM (n = 10). Statistical significance was determined using a two-tailed t-test. *p < 0.05; **p < 0.01; ***p < 0.001; dpf, day post-fertilization; SEM, standard error of the mean

The atria and ventricle areas of the RYR3 KO model were significantly enlarged compared to the control, which aligns with patient phenotypes, including ‘ventricular septal defect’, ‘secundum atrial septal defect’, and ‘tricuspid valve prolapse’ (Fig. 3A and B). The NRXN1 KO model showed a short body length and pericardial edema/congestion, related to patient phenotypes such as pulmonary artery hypoplasia (Fig. 3C and D). The FREM2 KO model showed the increase of ‘stimulus number to escape response’, related to patient phenotypes, including brain disorders such as seizures and global development delay (Fig. 3E). The CSMD1 phenotypes of the KO model were short body length, small ear, and enlarged atria and ventricular areas that were related to patient phenotypes, such as small for gestational age and secundum atrial septal defect (Fig. 3F and G). The RARS1 KO model exhibited a reduced body length and various deformities, including an uninflated swim bladder and mandible protrusion. These findings align with patient phenotypes such as ‘small for gestational age’, ‘short stature’, ‘high, narrow palate’, ‘failure to thrive’, and ‘oropharyngeal dysphagia’ (Fig. 3H and I). The NOTCH1 KO model showed pericardial edema/congestion, short body length, small head and eye, and curved tail, which are similar to patient phenotypes such as ventricular septal defect, patent ductus arteriosus, and short stature (Fig. 3J and H).

留言 (0)

沒有登入
gif