Noninvasive fetal genotyping of single nucleotide variants and linkage analysis for prenatal diagnosis of monogenic disorders

Rationale of SNP genotyping and detection of allelic imbalance

When the paternal genotype is unknown and the mother is homozygous (A/A) at a SNP locus, the fetus can be A/A or A/B, where “A” and “B” denote the two possible alleles (wild-type or mutant). If the mother is heterozygous (A/B), the fetus can be A/A, B/B or A/B. When the fetal and maternal genotypes differ, one allele (the paternal allele when the mother is homozygous) will be over-represented in plasma DNA, which contains fetal DNA, relative to maternal genomic DNA, which does not. This allelic imbalance can be detected by amplicon sequencing and comparison of the allele counts. Conversely, if the fetal and maternal genotypes are identical, no such imbalance will be present (Fig. 1A).
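The size of the expected imbalance can be illustrated with a minimal sketch. It assumes an idealized mixture in which plasma DNA consists of a fraction `fetal_fraction` of fetal DNA and the remainder of maternal DNA, with no amplification bias or sequencing error; the function names are hypothetical and are not part of the published method.

```python
def allele_dose(genotype: str, allele: str = "B") -> float:
    """Share of a diploid genotype such as 'A/B' that consists of `allele`."""
    return genotype.count(allele) / 2


def expected_plasma_fraction(maternal: str, fetal: str,
                             fetal_fraction: float, allele: str = "B") -> float:
    """Expected fraction of `allele` reads in plasma DNA (idealized mixture).

    Assumption: maternal DNA contributes (1 - fetal_fraction) of the reads
    and fetal DNA contributes fetal_fraction, with no bias or error.
    """
    return ((1 - fetal_fraction) * allele_dose(maternal, allele)
            + fetal_fraction * allele_dose(fetal, allele))


if __name__ == "__main__":
    f = 0.10
    for maternal, fetal in [("A/A", "A/B"), ("A/B", "A/A"),
                            ("A/B", "B/B"), ("A/B", "A/B")]:
        frac = expected_plasma_fraction(maternal, fetal, f)
        print(f"mother {maternal}, fetus {fetal}: expected B fraction {frac:.3f}")
```

Under these assumptions, a homozygous mother carrying a heterozygous fetus shows the paternal allele at half the fetal fraction, whereas a heterozygous mother carrying a homozygous fetus shows a shift of half the fetal fraction away from 0.5.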

Fig. 1

The rationale of NIFG in fetal genotyping. A Illustration of how the allelic imbalance between maternal genomic DNA and plasma DNA can be used to infer the genotype of the fetus. B Simulated SPRT and Chi-squared test results when the mother is heterozygous and the fetus is homozygous for a given SNP locus

The magnitude of the allelic imbalance depends on the fetal fraction, and the detection sensitivity is determined by sequencing depth and error rate. We evaluated two statistical methods for detecting allelic imbalance: the sequential probability ratio test (SPRT) and Pearson's Chi-squared test (χ²).

SPRT allows two hypotheses to be compared as data accumulate and has been successfully applied to detect allelic imbalance in the diagnosis of monogenic disorders by NIPD [7, 24]. The Chi-squared test, by contrast, examines whether an observed allele frequency distribution deviates from the expected distribution. For both tests, the null hypothesis is that the fetal and maternal genotypes are identical. When the mother is homozygous (AA or BB) and the fetus is heterozygous (AB), the paternal allele can be recognized in plasma DNA at low sequencing depth because it is absent from the maternal genome. In contrast, when the mother is heterozygous (AB) and the fetus is homozygous (AA or BB), both alleles are present in the maternal sample, which makes the over-represented maternal allele more difficult to detect.
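The SPRT decision at a maternal heterozygous SNP can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binomial read model, a null hypothesis of allelic balance (expected fraction 0.5), an alternative in which the tested allele is over-represented at (1 + fetal fraction)/2, and a symmetric likelihood-ratio threshold of 1000. The function name and return labels are hypothetical.

```python
import math


def sprt_call(allele_reads: int, total_reads: int,
              fetal_fraction: float, threshold: float = 1000.0) -> str:
    """Classify a maternal heterozygous SNP by a likelihood-ratio test.

    H0 (balance): the tested allele makes up 0.5 of plasma reads.
    H1 (imbalance): it makes up (1 + fetal_fraction) / 2 of plasma reads.
    Returns 'imbalance', 'balance' or 'no-call'.
    """
    p0 = 0.5
    p1 = (1 + fetal_fraction) / 2
    k, n = allele_reads, total_reads
    # Binomial log-likelihood ratio; the binomial coefficient cancels out.
    log_lr = (k * math.log(p1 / p0)
              + (n - k) * math.log((1 - p1) / (1 - p0)))
    if log_lr >= math.log(threshold):
        return "imbalance"
    if log_lr <= -math.log(threshold):
        return "balance"
    return "no-call"


if __name__ == "__main__":
    # 10,000 reads at an assumed 4% fetal fraction.
    for k in (5200, 5100, 5000):
        print(k, sprt_call(k, 10_000, 0.04))
```

Read counts that fall between the two boundaries leave the locus unclassified, which is how a no-call can arise when the depth is too low for the fetal fraction.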

In Fig. 1B, we simulated the results of the two statistical tests at maternal heterozygous, fetal homozygous SNPs. We assumed an equal distribution of read counts between the two alleles (no amplification bias). We used a likelihood threshold of 1000 for SPRT and set the alpha level at 0.001 for the Chi-squared test. As expected, the sequencing depth required to reach statistical significance was inversely correlated with fetal fraction. For instance, with SPRT, depths of 8650X and 1000X yielded correct fetal genotypes at 4% and 12% fetal fraction, respectively. With the Chi-squared test, the same outcome was achieved at 13400X and 1450X, respectively.
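The relationship between fetal fraction and required depth can be explored with a simplified calculation. The sketch below is not the simulation used for Fig. 1B and is not expected to reproduce its values exactly. It assumes that read counts equal their expected values, that the SPRT uses a binomial likelihood ratio with a threshold of 1000, and that the Chi-squared test is a Pearson test on a 2 × 2 table comparing maternal genomic DNA with plasma DNA sequenced to the same depth; all function names are hypothetical.

```python
import math


def sprt_log_lr(depth: int, fetal_fraction: float) -> float:
    """Log likelihood ratio (imbalance vs balance) at expected read counts."""
    p0, p1 = 0.5, (1 + fetal_fraction) / 2
    k = depth * p1  # expected reads of the over-represented allele
    return k * math.log(p1 / p0) + (depth - k) * math.log((1 - p1) / (1 - p0))


def chi2_p_value(depth: int, fetal_fraction: float) -> float:
    """P value of a 2 x 2 Pearson test: maternal gDNA vs plasma allele counts."""
    a, b = depth / 2, depth / 2                      # maternal gDNA: A, B
    c = depth * (1 + fetal_fraction) / 2             # plasma: over-represented
    d = depth * (1 - fetal_fraction) / 2             # plasma: other allele
    total = a + b + c + d
    stat = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-squared with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def min_depth(passes, max_depth: int = 100_000):
    """Smallest depth for which `passes(depth)` is true, or None."""
    for depth in range(1, max_depth + 1):
        if passes(depth):
            return depth
    return None


def min_depth_sprt(fetal_fraction: float, threshold: float = 1000.0):
    return min_depth(lambda n: sprt_log_lr(n, fetal_fraction) >= math.log(threshold))


def min_depth_chi2(fetal_fraction: float, alpha: float = 0.001):
    return min_depth(lambda n: chi2_p_value(n, fetal_fraction) < alpha)


if __name__ == "__main__":
    for f in (0.04, 0.08, 0.12):
        print(f"fetal fraction {f:.0%}: SPRT {min_depth_sprt(f)}X, "
              f"Chi-squared {min_depth_chi2(f)}X")
```

Under these assumptions the required depth falls steeply as the fetal fraction rises, and the Chi-squared test needs more reads than SPRT at the same fetal fraction, which agrees qualitatively with the trend described above.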

The accuracy of SNP genotyping

In a proof-of-concept study, we selected 36 SNPs (minor allele frequency, MAF > 0.3, 1000 Genomes) from all chromosomes in the human genome except 13, 17 and Y. These loci included single-nucleotide changes as well as indels of 2–5 nucleotides (Additional file 2: Table S2). We prepared artificial samples in which genomic DNA from a female volunteer and from her biological child was fragmented by ultrasonication and mixed at pre-determined proportions to represent plasma DNA with a fetal fraction of 4%, 6% and 8%. In addition, we collected genomic and plasma DNA from five pregnancies. Fetal fraction was estimated by calculating the percentage of paternal alleles at maternal homozygous, fetal heterozygous SNPs [28]. Amniotic fluid samples (~10 mL) were also obtained to verify the NIFG results.
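The fetal fraction estimate can be sketched as below. The text specifies only that the percentage of paternal alleles at maternal homozygous, fetal heterozygous SNPs is used; the doubling step (a heterozygous fetus carries the paternal allele on one of its two copies) and the pooling of reads across SNPs are assumptions of this sketch, and the function name is hypothetical.

```python
def estimate_fetal_fraction(snp_counts):
    """Estimate fetal fraction from informative SNPs.

    snp_counts: sequence of (paternal_allele_reads, total_reads) pairs taken
    at SNPs where the mother is homozygous and the fetus is heterozygous.
    Assumption: the paternal allele accounts for half of the fetal reads,
    so fetal fraction = 2 x pooled paternal allele fraction.
    """
    counts = list(snp_counts)
    paternal = sum(p for p, _ in counts)
    total = sum(t for _, t in counts)
    if total == 0:
        raise ValueError("no reads at informative SNPs")
    return 2 * paternal / total


if __name__ == "__main__":
    # Hypothetical read counts at two informative SNPs.
    print(estimate_fetal_fraction([(500, 10_000), (450, 10_000)]))
```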

Thirty-six SNPs were amplified from 5 ng of genomic and plasma DNA in a multiplex reaction. A subsequent index PCR was performed to add sequencing adaptors to both ends of the amplicons, which were subjected to massively parallel sequencing on the MiSeq platform (150 bp paired-end, average depth: 15500X). Each experiment was performed in duplicate, and only concordant genotyping results were considered; otherwise, a “no-call” was assigned. Overall, SPRT yielded 99.62% (95% CI 97.67–99.99%) accuracy in base calling (8.33% no-call), compared with 98.31% (95% CI 95.57–99.50%) for the Chi-squared test (18.06% no-call) (Fig. 2A, D). For maternal homozygous SNPs, SPRT was 100% (95% CI 97.00–100.00%) accurate (zero no-call), while the Chi-squared test was 99.29% (95% CI 95.66–99.99%) accurate with 6.67% no-call (Fig. 2B, D). For maternal heterozygous SNPs, SPRT results were 99.12% (95% CI 94.71–99.99%) correct (17.39% no-call), compared with 96.88% (95% CI 90.83–99.32%) for the Chi-squared test (30.43% no-call) (Fig. 2C, D).
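The concordance rule and the two summary metrics can be expressed compactly. The sketch below assumes that accuracy is computed over called loci only and that the no-call rate is computed over all loci; these denominators and the function names are assumptions for illustration.

```python
def consensus_call(replicate_1, replicate_2):
    """Return the genotype if both replicates agree, otherwise None (no-call)."""
    if replicate_1 is not None and replicate_1 == replicate_2:
        return replicate_1
    return None


def summarize(calls, truth):
    """Return (accuracy among called loci, no-call rate among all loci)."""
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    no_call_rate = 1 - len(called) / len(calls)
    accuracy = (sum(c == t for c, t in called) / len(called)
                if called else float("nan"))
    return accuracy, no_call_rate


if __name__ == "__main__":
    # Hypothetical genotypes from two replicates at four loci.
    rep1 = ["AA", "AB", "AB", None]
    rep2 = ["AA", "AB", "BB", "AB"]
    truth = ["AA", "AA", "AB", "AB"]
    calls = [consensus_call(a, b) for a, b in zip(rep1, rep2)]
    print(calls, summarize(calls, truth))
```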

Fig. 2

The performance of SPRT and Chi-squared test in detecting allelic imbalance. Bar graphs showing the percentage of correct, incorrect and no-call genotyping results in A 36 SNPs, B maternal homozygous SNPs, C maternal heterozygous SNPs across all samples tested. D Bar graph indicating the overall accuracy of SPRT and Chi-squared test in fetal genotyping. E Scatter plot displaying the accuracy and no-call rate of SPRT and Chi-squared test as fetal fraction increases. All error bars indicate 95% CI

Fetal fractions ranged from 7 to 18% in the five plasma samples. The accuracy of the two statistical tests in fetal genotyping remained consistent across fetal fractions. However, the no-call rate was inversely correlated with fetal fraction, especially for SPRT (Fig. 2E). The correlation coefficient was −0.72 for SPRT and 0.47 for the Chi-squared test. Notably, the Chi-squared test resulted in a significantly higher no-call rate than SPRT (Fig. 2E). We therefore elected to use SPRT in subsequent experiments.

Haplotype construction and linkage analysis

As described above, NIFG can detect mutations including single nucleotide changes and small indels. However, to improve diagnostic accuracy and to detect other, more complex types of mutation such as inversions, repeat expansions and large indels, haplotype construction and linkage analysis were also performed to infer the inheritance pattern. This was achieved by (1) selecting a group of SNPs that immediately flank the mutation (preferably within 200–500 kb) and (2) genotyping these SNPs in the proband, parents and fetus so that the affected haplotypes can be identified.

To detect maternally inherited variants, we analyzed SNPs that were heterozygous in the mother and homozygous in the father; to detect paternally inherited variants, we analyzed SNPs that were heterozygous in the father and homozygous in the mother, as described by Lo et al. [7]. Taking autosomal recessive disorders as an example, two scenarios are considered: first, a sibling of the parent is a patient (Fig. 3A, B, C); second, a previously born child is a patient (Fig. 3D, E, F). In the first scenario, to determine the affected maternal haplotype and whether it was inherited from the grandfather (GF) or grandmother (GM), one of the following conditions must be met: (1) the selected SNPs are homozygous in one grandparent and heterozygous in the other: (a) SNPs homozygous in GM and heterozygous in GF are used to determine whether the affected haplotype was inherited from GF (Fig. 3B); (b) SNPs heterozygous in GM and homozygous in GF are used to determine whether the affected haplotype was inherited from GM (Fig. 3C); or (2) both grandparents are heterozygous for the selected SNPs, and the genotype of the proband at the SNP differs from that of at least one of the parents (Fig. 3A). In the second scenario, to determine the affected haplotype in both parents, one of the following conditions must be met: (1) the selected SNPs are homozygous in one parent and heterozygous in the other: (a) SNPs heterozygous in the mother and homozygous in the father are used to determine which maternal haplotype is affected (Fig. 3E); (b) SNPs homozygous in the mother and heterozygous in the father are used to determine which paternal haplotype is affected (Fig. 3F); or (2) both parents are heterozygous for the selected SNPs, and the genotype of the proband at the SNP differs from that of at least one of the parents (Fig. 3D).
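The logic of condition (1a) in the second scenario (mother heterozygous, father homozygous, previously born affected child) can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: genotypes are written as two-character strings such as "AB", the proband is assumed to be affected by an autosomal recessive disorder so that the maternal allele the proband carries marks the affected maternal haplotype, and the function name is hypothetical.

```python
def affected_maternal_allele(mother: str, father: str, proband: str):
    """Allele carried on the affected maternal haplotype at one SNP.

    Returns None if the SNP is uninformative (mother not heterozygous or
    father not homozygous) or if the trio is Mendelian-inconsistent.
    """
    if len(set(mother)) != 2 or len(set(father)) != 1:
        return None
    paternal_allele = father[0]
    remaining = list(proband)
    if paternal_allele not in remaining:
        return None  # proband lacks the only allele the father can transmit
    remaining.remove(paternal_allele)  # what is left came from the mother
    maternal_allele = remaining[0]
    return maternal_allele if maternal_allele in mother else None


if __name__ == "__main__":
    # Hypothetical trio genotypes (mother, father, proband) at three SNPs.
    trio = [("AB", "AA", "AA"), ("AB", "AA", "AB"), ("AB", "BB", "AB")]
    print([affected_maternal_allele(m, f, p) for m, f, p in trio])
```

Repeating this at each informative SNP flanking the mutation gives the string of alleles that defines the affected maternal haplotype; the paternal haplotype can be resolved in the same way with the roles of the parents exchanged.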

Fig. 3

Haplotype and linkage analysis for families with autosomal recessive disorders. Hypothetical families in which A, B, C a sibling of the pregnant woman is a patient and D, E, F a previously born child is a patient. In both cases, the pregnant women are heterozygous carriers of the pathogenic mutation. Different conditions show how the haplotype and linkage analysis can be used to identify the affected haplotype linked to the pathogenic mutation. A Both grandparents are heterozygous for selected SNPs; B GM is homozygous and GF is heterozygous for selected SNPs; C GF is homozygous and GM is heterozygous for selected SNPs; D Both parents are heterozygous for selected SNPs; E the mother is heterozygous and the father is homozygous for selected SNPs; F the father is heterozygous and the mother is homozygous for selected SNPs

Diagnosis of inherited monogenic disorders

The overall workflow is shown in Fig. 4. NIFG is a haplotype-based method that also interrogates the mutation loci directly; it was performed by targeted amplification combined with deep sequencing of maternal genomic and plasma DNA. The allelic ratios in maternal genomic and plasma DNA were then compared by SPRT to detect allelic imbalance. A combined analysis of the mutation loci and the haplotype information was used to impute the fetal genotype and to make the clinical diagnosis of monogenic disorders. As an example, the diagnostic process for autosomal recessive disorders by NIFG is illustrated in Fig. 5. The selected SNPs fall into two categories, SNP I and SNP II. SNP I was defined as the group of SNPs at which the mother is heterozygous and the father is homozygous, and SNP II as the group of SNPs at which the mother is homozygous and the father is heterozygous. The fetal genotypes could be deduced from the genotyping results at the mutation loci and at the SNPs adjacent to them.
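The two SNP categories and their use can be illustrated with a short sketch. It assumes two-character genotype strings such as "AB" and an SPRT result already reduced to the labels "imbalance", "balance" or "no-call"; the function names and labels are hypothetical. At a SNP I locus the father can transmit only one allele, so an imbalance implies that the fetus is homozygous and the mother transmitted that same allele, while balance implies that the mother transmitted her other allele.

```python
def classify_snp(mother: str, father: str):
    """Return 'SNP I', 'SNP II' or None for an uninformative locus."""
    mother_het = len(set(mother)) == 2
    father_het = len(set(father)) == 2
    if mother_het and not father_het:
        return "SNP I"    # informative for maternal inheritance
    if father_het and not mother_het:
        return "SNP II"   # informative for paternal inheritance
    return None


def maternal_allele_transmitted(mother: str, father: str, sprt_result: str):
    """Infer which maternal allele the fetus inherited at a SNP I locus."""
    if classify_snp(mother, father) != "SNP I" or sprt_result == "no-call":
        return None
    paternal_allele = father[0]
    if paternal_allele not in mother:
        return None
    other_allele = next(a for a in mother if a != paternal_allele)
    # Imbalance: fetus homozygous for the paternal allele.
    # Balance: fetus heterozygous, so the mother passed her other allele.
    return paternal_allele if sprt_result == "imbalance" else other_allele


if __name__ == "__main__":
    print(classify_snp("AB", "AA"), classify_snp("AA", "AB"))
    print(maternal_allele_transmitted("AB", "AA", "imbalance"))
    print(maternal_allele_transmitted("AB", "AA", "balance"))
```

Comparing the transmitted alleles across linked SNP I loci with the phased maternal haplotypes indicates whether the fetus inherited the affected or the unaffected maternal haplotype.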

Fig. 4

Fig. 5

A schematic for the diagnosis of autosomal recessive disorders by NIFG. Parental haplotypes were first constructed according to the selected SNPs linked to the mutation. SPRT statistical analysis was then performed to detect allelic imbalance. Based on all the information above, the fetal genotypes could be deduced. SNP I: to detect maternally inherited variants, selected SNPs were heterozygous in the mother (A/B) and homozygous in the father (A/A); SNP II: to detect paternally inherited variants, selected SNPs were heterozygous in the father (A/B) and homozygous in the mother (A/A)

In this study, NIFG was used to analyze the inheritance of parental haplotypes by the fetus in 17 families with different monogenic disorders, including X-linked recessive disorders (hemophilia A, Duchenne muscular dystrophy, hyper-IgM type 1), autosomal recessive disorders (glutaric acidemia type I, Nagashima-type palmoplantar keratosis and Von Willebrand disease type 3) and an autosomal dominant disorder (familial exudative vitreoretinopathy).

Diagnosis of X-linked recessive disorders by NIFG

NIFG was performed for twelve families with X-linked recessive disorders, which made up 70.6% (12/17) of all families analyzed by NIFG for monogenic disorders: ten cases of hemophilia A, one case of Duchenne muscular dystrophy (DMD) and one case of hyper-IgM type 1 (HIGM1).

Hemophilia A is an X-linked recessive disorder caused by deficiency of factor VIII (encoded by the F8 gene), which is critical for blood clotting. Mutations in F8 that eliminate or reduce its expression result in severe, moderate or mild hemophilia [29]. The incidence of hemophilia A is 1/5000 in male live births, and ~70% of cases are inherited [29]. In this study, we performed NIFG on DNA samples from ten pregnant women who carried F8 mutations, including point mutations, deletion, duplication and intron 22 inversion. In addition to the mutation sites, we designed primers for 19 single nucleotide polymorphisms (SNPs) (MAF > 0.02, linkage disequilibrium r² < 0.8) selected from a region of ~1.7 Mb encompassing F8 on chromosome X (Additional file 3: Table S3). These loci were Sanger-sequenced in the mother, the proband and, if the proband was not available, the maternal grandparents. Amelogenin loci were also amplified from plasma DNA and sequenced to determine fetal sex. We performed two parallel experiments and considered only concordant results (otherwise no-call). Linkage analysis was then performed to infer the inheritance of parental haplotypes by analyzing a group of selected SNP loci (as described above). Based on all of this information, the fetal genotypes could be deduced. We successfully identified 3 male patients, 1 female carrier and 6 normal fetuses from 10 pregnancies (Table 1, Additional file 4: Table S4), and the results were confirmed by sequencing of amniotic fluid samples. The average sequencing depth per locus in the F8 gene was 15339X.
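The final interpretation step for an X-linked recessive disorder can be summarized in a small decision function. This sketch assumes that the mother is a heterozygous carrier, that the father does not carry the mutation, and that fetal sex and inheritance of the affected maternal haplotype have already been determined; the function name and the output labels (which mirror the result categories reported above) are hypothetical.

```python
def interpret_x_linked(fetal_sex: str, inherited_affected_maternal_haplotype: bool) -> str:
    """Classify a fetus for an X-linked recessive disorder.

    Assumptions: carrier mother, unaffected father.
    fetal_sex: 'male' or 'female'.
    """
    if fetal_sex not in ("male", "female"):
        raise ValueError("fetal_sex must be 'male' or 'female'")
    if not inherited_affected_maternal_haplotype:
        return "normal"
    # A male has a single X chromosome, so the affected haplotype causes disease.
    return "male patient" if fetal_sex == "male" else "female carrier"


if __name__ == "__main__":
    for sex in ("male", "female"):
        for inherited in (True, False):
            print(sex, inherited, interpret_x_linked(sex, inherited))
```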

Table 1 NIFG results for diagnosis of hemophilia A

The same NIFG workflow was applied to the other families with X-linked disorders. In the DMD family, we detected a 4-base pair (bp) TCTA deletion/insertion (delins) in exon 8 of the DMD gene in the proband, and his mother was a carrier; NIFG revealed that the fetus was a female carrier of the mutation (Additional file 4: Table S4). In the HIGM1 family, the mutation in the proband was inherited from the mother, who carried a missense mutation of CD40LG (c.676G > A, p.Gly226Arg); the fetus was found to be an affected male carrying the same point mutation (Additional file 4: Table S4). The selected SNPs for both genes are summarized in Additional file 3: Table S3. The average sequencing depth per locus was 18693X in the DMD gene and 42545X in the CD40LG gene. Notably, NIFG for X-linked recessive disorders provided a correct diagnosis at a fetal fraction as low as 2.30%.

Diagnosis of autosomal recessive disorders by NIFG

Four families with three autosomal recessive disorders including one family of glutaric acidemia type I (GA-1), one family of Nagashima-type palmoplantar keratosis (NPPK) and two families of Von Willebrand disease (VWD) type 3 were recruited and analyzed.

Von Willebrand factor, encoded by the VWF gene, is a plasma protein that contributes to the formation of a platelet thrombus and protects FVIII from degradation by binding to it. Deficiency or structural defects of VWF lead to Von Willebrand disease (VWD) [30]. Our study recruited two families with VWD type 3, the most severe form of this autosomal recessive disorder (complete absence of VWF protein) [31]. Genetic evaluation of the proband in the first family revealed a homozygous intronic mutation (c.2547-13T > A), indicative of consanguineous marriage. This mutation activated a cryptic splice acceptor and caused the inclusion of 37 bp of intron 19 in the mRNA, resulting in a premature stop codon (p.Cys849Trpfs*28). In the second family, we detected two heterozygous VWF mutations in the proband (c.7822C > T and c.7403G > C). In addition to the point mutations, we selected 25 SNPs (MAF > 0.2, linkage disequilibrium r² ≤ 0.61) from a 0.88 Mb region that contains the VWF gene (Additional file 3: Table S3). As shown in Table 2 and Additional file 5: Table S5, in both families the fetus was a carrier, and only the paternal mutation was detected in the fetus. These conclusions were drawn from the fetal genotypes at the mutation loci and were further corroborated by multiple SNPs in the linkage analysis.

Table 2 NIFG results for diagnosis of Von Willebrand disease type 3

Glutaric acidemia type I (GA-1, OMIM 231670), an autosomal recessive neurometabolic disorder caused by biallelic pathogenic variants in GCDH that result in deficiency of glutaryl-CoA dehydrogenase (GCDH), is one of the most common inherited metabolic disorders. Approximately 1 in 100,000 children worldwide is affected by this disease [32]. In the affected family, both members of the couple were carriers of pathogenic variants in the GCDH gene (c.416C > G and c.1244-2A > C) and had already given birth to a child diagnosed with GA-1. We diagnosed the fetus as a carrier of the paternal mutation (Additional file 5: Table S5). In the Nagashima-type palmoplantar keratosis (NPPK) family, the wife was affected by NPPK and carried a homozygous disease-causing variant in the SERPINB7 gene (c.522dupT), and we detected a heterozygous pathogenic SERPINB7 mutation in her husband (c.796C > T). The couple attended our center for prenatal diagnosis in the hope of having a healthy child. However, the fetus was diagnosed as a patient by NIFG (Additional file 5: Table S5). NIFG for autosomal recessive disorders provided correct diagnoses at fetal fractions in the range of 4.2–13.5%.

Diagnosis of autosomal dominant disorders by NIFG

Familial exudative vitreoretinopathy (FEVR, OMIM 133780) is a complex genetic disorder characterized by incomplete development of the retinal blood vessels [33] and is one of the main causes of retinal detachment and blindness in adolescents [34]. In this study, we recruited members of one affected family that had been clinically diagnosed with FEVR type 5 (FEVR5), and a heterozygous missense mutation of the TSPAN12 gene (c.566G > A, p.Cys189Tyr) was detected in the mother. The family pedigree indicated that the disorder was inherited in an autosomal dominant (AD) manner. We designed primers for the point mutation and for 29 SNPs (MAF > 0.2, linkage disequilibrium r² ≤ 0.8) from a 1.7 Mb region that contains the TSPAN12 gene, as shown in Additional file 3: Table S3. By analyzing the NIFG genotyping results at the mutation and the linked SNP loci, we identified the fetus as a patient with FEVR5 (Table 3). The fetal fraction detected in this case was 7.3%.

Table 3 NIFG results for diagnosis of FEVR type 5
