Long-read genome sequencing resolves complex genomic rearrangements in rare genetic syndromes

Clinical Presentation

Patient 1 (Family 1) is a three-year-old female who is one of four children born to a non-consanguineous family of European ancestry. She was enrolled into the Congenital Craniofacial Malformations Genetics research protocol at ~1.5 years of age due to her midline facial defects, hypertelorism, nasal bridge cyst, submucosal cleft palate, agenesis of corpus callosum, subependymal gray matter heterotopia, duplicated pituitary gland, staring spells, and conductive hearing loss. After enrollment into the protocol, speech and language assessment identified delayed receptive and expressive language skills, which prompted enrollment in speech therapy. She also started to demonstrate decreased visual motor skills, fine motor precision, and hand strength, which hindered her participation in age-appropriate play activities. She exhibited impaired upper body coordination and strength and lacked spontaneous and efficient bilateral hand use. A speech pathology report at ~2.5 years of age noted incomplete laryngeal closure that led to frequent instances of laryngeal penetration progressing to silent aspiration. The family history is unremarkable with no affected parents or siblings (Fig. 1a, Left panel).

Fig. 1: Overview of the patients and their genomic alterations.figure 1

a Family pedigree for the two families in this study (arrow indicates proband), family 1 (left), family 2 (right). b Copy number change and homozygosity plots from short-read genome sequencing data show de novo chromosome 8 CNVs in family 1 (indicated by red arrows). c PacBio coverage depth plots showing de novo chromosome 18q deletions in family 2 (indicated by red arrows).

Patient 1 had a karyotype performed at birth which reported an apparent deletion involving the terminal region of the short arm of chromosome 8 (Supplementary Table 1). This finding did not rule out the possibility of a translocation of the absent chromosome 8 material and the karyotype was designated as 46,XX,?del(8)(p23.1). A Cytogenomic SNP microarray (ARUP Laboratories) was performed to further characterize the chromosome 8 abnormality seen in the karyotype. Microarray findings reported a terminal chr8p deletion of ~6.9 Mb and a terminal chr8q duplication of ~3.5 Mb (Supplementary Table 1). A terminal chr8p deletion and a terminal chr8q duplication are typically observed in Rec8 syndrome patients16,18. However, the CNVs in the patient had breakpoints that are typically not associated with a Rec8 rearrangement. The microarray report therefore suggested additional testing to clearly determine any atypical rearrangement involving these alterations. These CNVs also lacked critical genes typically noted within Rec8 CNVs, and clinical correlation was suggested for two genes (CLN8 and MCPH1) within the chr8p deletion region that are associated with autosomal recessive conditions. The family was subsequently referred to the translational research protocol to provide diagnostic clarity around the primary genetic cause of the patient’s phenotypes.

Family 1 was enrolled in the Congenital Craniofacial Malformations Genetics research study (STUDY00001876) to identify small variants within the reported chr8p deletion region that may explain the patient features. At the time of enrollment, neurodevelopmental problems that are typically associated with Rec8 syndrome patients were not appreciated in the proband, likely due to no formal assessment of developmental delays conducted at the time. The microarray report had suggested clinical correlation with two genes (CLN8 and MCPH1) within the chr8p deletion region. CLN8 is associated with autosomal recessive neuronal ceroid lipofuscinosis-8 in OMIM (MIM#600143). MCPH1 is associated with autosomal recessive primary microcephaly 1 in OMIM (MIM#3251200). We performed genome sequencing for all six individuals in family 1 using Illumina’s short-read technology to an average mean depth between 28.23x - 44.42x (Supplementary Table 2). These data were aligned to the GRCh38 reference and variants were called as described in the methods section.

Small variants were called in all individuals in the family using Churchill32. Searching for rare damaging variants led to the detection of 986 single nucleotide variants (SNVs) and indels in total. Since the proband was the only affected individual in the family, we considered variants that were consistent with autosomal recessive inheritance as well as any that occurred de novo in the proband. We found no small variants in the proband that met these criteria and thus did not identify any compelling SNVs or indels in CLN8, MCPH1 or any other gene that may explain the patient’s features.

We next performed CNV detection using VarScan2, which uses a read-depth based approach where the ratio of normalized read depth in the proband sample compared to that in normal samples are used to assign copy number states to chromosome segments33,34. We identified 186 CNVs across all 6 individuals, of which 51 were in the proband. CNVs that affect non-coding regions, were present in a healthy individual in the family, or had borderline segment ratios (absolute seg mean <0.4) were filtered out. This led to the identification of 9 de novo CNVs (2 heterozygous deletions and 7 heterozygous duplications), all which of were segments in the terminal chr8p and chr8q CNVs that were detected in the microarray analysis (Fig. 1b; Supplementary Fig. 1; Supplementary Table 3).

Since these results were indicative of a novel Rec8-like rearrangement with atypical breakpoints, we inquired about clinical features associated with Rec8 syndrome that may have been diagnosed in the proband since enrollment in the research study. Here we found formal diagnosis of developmental delays in the proband that were not noted at the time of enrollment. These additional features made a compelling case for a Rec8-like rearrangement in the proband, which led us to investigate the exact nature of any rearrangement in this patient. We therefore performed structural variant (SV) analysis to identify any such rearrangement. SVs were detected using MANTAv1.6.0 using default parameters followed by removal of predefined blacklisted SVs35. SVs that were commonly observed in the proband and a healthy individual in the family were filtered out and the resulting SV calls were analyzed using SvAnna, an SV prioritization tool that ranks SVs using a phenotype driven approach36. SvAnna is known to report any causal SV within the top 10 ranked SVs in its output. We therefore assessed the top 20 SVs that were ranked by SvAnna (Supplementary Table 4). Here we found 18 SVs that passed visual manual review but were also observed in a healthy individual despite not being called in that individual by MANTA, and two SVs that were not observed upon visual review of the aligned reads. Furthermore, any SVs linking the terminal chr8 CNVs were not detected in the MANTA output.

Investigation of the aligned short-reads on IGV, however, detected discordant reads from the chr8q duplication breakpoint mapping to low-quality mates at the chr8p23.1 deletion site (Supplementary Fig. 2a). Although this finding was indicative of a rearrangement, there were not enough breakpoint spanning reads at the chr8q duplication breakpoint (6 soft-clipped reads out of 33 total reads) in the short-read data to support recombination of a single copy of the chr8q duplicated region to the chr8p deletion site. Additionally, we could not identify any high-quality breakpoint spanning reads at the chr8p breakpoint to confirm this rearrangement in the proband. These limitations motivated us to perform long-read genome sequencing using PacBio’s HiFi circular consensus sequencing of the patient’s sample to assess any complex rearrangement in the long-read data37. HiFi mapping of the long-read data against the GRCh38 reference genome resulted in an average coverage depth of 23× for the patient sample with a mean read length of 12,618p, and a maximum read length of 41,419 bp (Supplementary Table 5, Supplementary Fig. 2b). The coverage map of the long-read sequencing data also captured the chromosome 8 copy number changes (monosomy of terminal chr8p and trisomy of terminal chr8q) previously identified in the microarray and short-read sequencing analysis (Supplementary Fig. 2c).

SVs were called from the long-read data using PacBio’s structural variant calling and analysis tool (PBSV) on SMRT Link version 13.1.0. Variants that passed the PBSV filters were compared to an internal PBSV reference dataset generated from 14 unaffected individuals. Since 14 samples were not large enough to calculate population frequencies, we filtered out any SV present in both the proband and the reference data to generate a list of unique SVs in the patient. These SVs were filtered for rarity in large population cohorts and prioritized by patient’s clinical phenotypes using SvAnna. Unilke MANTA, PBSV assigns a BND label to breakpoints representing large copy number alterations. Due to limitations of the SvAnna software in accurately detecting the underlying alterations associated with BND calls in the PBSV output, we assessed the BND calls separately. Assessment of the top 20 non-BND SVs prioritized by SvAnna identified six SVs that affected a disease-associated gene, of which only three fit a valid mode of inheritance associated with these diseases, none of which were consistent with the clinical phenotypes observed in the patient (Supplementary Table 6). We next investigated all 50 BND calls from the unique SV list and compared these against our internal reference data, which ruled out 41 BND calls that were also noted in the reference set. Of the nine unique BND calls, five were additionally found to be present in a reference sample upon IGV review that were not detected by PBSV in the reference data. The remaining four SVs (Supplementary Table 7) included (1) a translocation involving chromosomes 7 and X that did not affect any coding region and was not found to be associated with disease, (2) an ~2.75 Mb deletion on chromosome 9 with no known disease association, and larger deletions encompassing this region were reported in ClinGen and DECIPHER databases as variants of unknown significance38,39, (3) a second deletion of ~420 kb on chromosome 9 that contained only one coding gene, FAM88B, which is not known to have any disease association in the literature, and (4) a rearrangement with breakpoints at chr8:7766094 and chr8:141711312, which supported a novel Rec8-like event involving the terminal chr8 CNVs in the proband.

Review of the Rec8-like rearrangement breakpoints using IGV showed breakpoint spanning reads with supplementary alignments spanning the two chromosome 8 CNVs in the long-read data, connecting the chr8q duplication breakpoint to the end of the chr8p deletion (Fig. 2a, b). BLAT-based alignment (to GRCh38 reference genome) of 12 representative breakpoint spanning reads (out of 31 total breakpoint spanning reads) from both the chr8p deletion and the chr8q duplication breakpoints mapped these reads to a single breakpoint at the chr8q duplication breakpoint (chr8:141,711,312) and to multiple breakpoints near the chr8p deletion breakpoint with high alignment scores (>95% identity), suggesting the presence of repeat elements near the chr8p breakpoint (Supplementary Figs. 2d, e)40. This region also exhibited a relatively high read coverage in both the short-read and the long-read sequencing data compared to the rest of the genome, which was consistent across all unaffected individuals across family 1, masking the precise breakpoint location (Supplementary Fig. 3a). To better identify the repeat sequences at the chr8p deletion region, we extracted all reads around the chr8q breakpoint and mapped them to the T2T-CHM13 reference genome, which is known to better represent repetitive regions in the genome15. These reads mapped to a single breakpoint on chr8q, and to 13 breakpoints near the chr8p deletion breakpoint region within an ~200 kb window (Fig. 2c, d). Interrogation of 1500 bases up- and downstream of each of the breakpoints at the chr8p region using RepeatMasker identified a 1033 bp LTR element (LTR5A) 109 bp upstream and a 28 bp simple repeat (AAAAC)n 1452 bp downstream of each of the 13 breakpoints (Supplementary Table 8). We also detected a 77 bp LINE element (L1MB3) 1311 bp downstream of nine of these breakpoints, supporting the repetitive nature of these elements in this region. Two LINE elements (L2c and L2b) were also detected 960 bp upstream and 361 bp downstream of the chr8q breakpoint. Unearthing any possible involvement of these repeat elements in the rearrangement will require additional work beyond the scope of this manuscript. The chr8p deletion was supported by the apparent change in sequencing coverage in the proband sample before and after this repetitive region (Fig. 2f). Coverage tracks of the long-read alignments exhibit a drop in coverage to the left of the repeat region in the proband in both the short and the long-read coverage data, representative of a heterozygous deletion, which reverts to diploid coverage as we move past this region. This change in coverage is absent from the unaffected individuals in the family (Supplementary Fig. 3a). The duplication breakpoint is also supported by a change in coverage, where the coverage increases at the duplication breakpoint, clearly defining the duplication breakpoint to a base pair resolution at the genomic location chr8:141,711,312 (GRCh38) in the proband alone (Fig. 2e; Supplementary Fig. 3b).

Fig. 2: PacBio circular consensus sequencing resolved a Rec8-like rearrangement in patient 1.figure 2

IGV view of total coverage and read alignment of PacBio circular consensus sequencing reads at chr8q duplication breakpoint, sorted by parental haplotypes (a) and the chr8p deletion breakpoint, sorted by breakpoint spanning reads (b). Colored portion of the soft-clipped (breakpoint spanning) reads shows base mismatches that spans the rearrangement. Only reads with mapping quality >30 were considered for analysis. IGV view of minimap2 alignment of chr8q breakpoint spanning reads to the T2T reference genome maps to chr8q duplication breakpoint (c) and the chr8p deletion breakpoint region (d). IGV view of long-read coverage at the chr8q duplication region (e) and the chr8p deletion region (f). g Schematic of the rec-8-like rearrangement in the proband in family 1.

The chr8p deletion region and the chr8q duplication region together encompass 206 genes in total. Twenty-two of these are associated with a disease in OMIM, none of which independently explained the proband’s condition (Supplementary Table 9). Literature reports of Rec8 syndrome patients showed that the typical Rec8 rearrangement has a history of occurring in individuals of Hispanic ancestry and are associated with a balanced inversion event within chromosome 8, inherited from an unaffected parent16,17. Analysis of the long-read data at the chr8q duplication breakpoint showed 17 out of 44 total reads as breakpoint spanning reads that map the chr8q duplication to the chr8p deletion region, showing about one-third of the reads connecting the two CNVs. In the case of an inversion associated with the terminal CNVs observed in this patient, we would expect about one-third of the reads to show an inversion event at this position as well, which was not present in our long-read sequencing data. We next sorted the reads near the chr8q breakpoint into parental haplotypes based on a heterozygous SNP at position (GRCh38) chr8:141711508 in the patient that was homozygous in the father (Fig. 2a). This approach traced the affected allele (with the soft-clipped reads) in the proband to the unaffected mother. Investigation of this breakpoint in the proband’s long read and the mother’s short read data did not provide any support for an inversion event. In fact, no breakpoint spanning reads supporting an inversion were observed in either of the parents or any of the sisters (Supplementary Fig. 4). Given the Hispanic ancestry associated with an inversion around the typical Rec8 breakpoints, it is likely that the atypical rearrangement in the patient occurred de novo without any prior inversion event in an unaffected parent, in part, due to her non-Hispanic (European) ancestry. While attempts at PCR amplification and Sanger confirmation of this rearrangement failed (Supplementary Fig. 5; Supplementary Table 10), likely due to the repetitive nature of the chr8p breakpoint, long-read data was successful in validating the terminal chromosome 8 CNVs and identifying this novel Rec8-like rearrangement in the patient. Our findings therefore provided diagnostic clarity for the clinicians and the patient’s family and identified a novel Rec8-like rearrangement in a patient of non-Hispanic ancestry (Fig. 2g).

Patient 2 (Family 2) is a two-year-old female who presented with global delays, hypotonia, and dysmorphic features including brachycephalic head shape, slightly low set ears, thickened helices, bilateral ear lobe pits, flat nasal bridge, broad and bulbous nose, deep-set eyes, and mild hypotelorism. She had a history of bradycardia and respiratory distress after delivery, and poor weight gain at six months of age. Her global delay included delayed motor and speech development. Her receptive language was more intact than her expressive language. At two years and six months of age, she spoke about 10 words but did not use them together. Her hearing and vision were normal. She started rolling at five months of age, sat at 11 months, crawled at 14 months, and started walking at two years and four months. She has been in physical therapy since she was nine months old, and occupational and speech therapy since she was two years old. Linear growth had been on the slower end of normal, tracking at the 8th percentile for height. Weight and head circumference were normal. Head MRI findings at one year and ten months of age showed patchy T2 hyperintensities in the subcortical white matter, which were nonspecific and could represent gliotic changes from old insult. There were no concerns about seizures. Chest X-rays performed in the setting of acute respiratory symptoms were normal with no congenital malformations or cardiopulmonary findings. With normal growth and no murmur on examination, cardiac malformation was unlikely, and echocardiogram was not performed. She is the only child of a non-consanguineous family of European ancestry with no similar features reported in her parents (Fig. 1a, Right panel).

SNP microarray was performed for patient 2 at 14 months of age to detect large chromosomal abnormalities. Microarray analysis reported three deletions at chr18q11.2-q12.2 (deletion 1), chr18q12.3 (deletion 2), and chr18q21.1 (deletion 3), respectively (Supplementary Table 1). Interstitial and terminal deletions of variable sizes and breakpoints on the long arm of chromosome 18 are associated with autosomal dominant chromosome 18q deletion syndrome21,22,23,24. This syndrome is characterized by highly variable clinical features, the most common of which are short stature, intellectual disability, hypotonia, hearing impairment, and foot deformities. The three interstitial deletions in the patient that are indicative of a complex rearrangement have not been previously reported in a patient with chromosome 18q deletion syndrome. The patient’s parents wanted to have another child and were interested in finding out about any risk of recurrence during future pregnancies. Screening the parents was therefore important to identify any structural variants that could explain the deletions observed in the proband. Extended peripheral blood chromosome analysis and fluorescent in situ hybridization analysis were inadequate in identifying any chromosomal rearrangement in the proband. These assays also did not identify any abnormalities in the parents, suggesting a de novo occurrence of these chr18q deletions. The family was enrolled in research to resolve any chromosomal rearrangement in the patient and to identify any alterations in the unaffected parents, which would provide diagnostic clarity and identify any possible carrier alterations in the unaffected parents that may speak toward a risk of recurrence in future pregnancies.

We consented the proband and her parents under the Rare Diseases/Genome Sequencing research protocol (IRB:11-00215) for genomic testing. We performed PacBio circular consensus sequencing on genomic DNA from all three individuals to resolve the potential rearrangement indicated by the three proximal deletions on chromosome 18. An average coverage depth of 20–27× was achieved in the long-read experiment with mean read lengths between 12,335 bp and 13,613 bp, and maximum read lengths ranging from 32,282 bp to 50,563 bp (Supplementary Table 5, Supplementary Fig. 6). HiFi mapping and PBSV analysis were run on SMRT Link using default parameters against the GRCh38 reference genome. Coverage maps generated from HiFi mapping of the long-read data successfully identified the three chr18q deletions in the patient but not in her parents (Fig. 1c).

A systematic approach at filtering and prioritization of non-BND SV calls from the PBSV output (as described for patient 1) identified three non-BND SVs within the top 20 SvAnna prioritized SVs that affected a coding transcript, of which only two were found to be associated with disease, neither of which explained the proband’s clinical phenotypes and were ruled out from further assessment (Supplementary Table 11). The BND calls from the patient sample were separated from the PBSV output and compared against the BND calls in the internal reference dataset. Out of 49 BND calls, 41 were commonly found in the reference dataset, another 4 were identified in the reference data upon IGV review that were not called by the PBSV algorithm in the reference samples. The remaining 4 BND calls represented a single structural variant that involved the three interstitial deletions identified in the microarray as well as two additional breakpoints at positions chr18:49306872 and chr18: 49306881 (Supplementary Table 12). Assessment of these breakpoints on IGV identified breakpoint spanning reads at all six breakpoints of the three reported deletions, which were supported by an apparent drop in coverage at the deletion sites and allowed precise mapping of the deletion breakpoints to a single base pair resolution (Fig. 3a, Supplementary Fig. 7). Investigation of the two additional breakpoints revealed a fourth deletion of 9 bp around an inversion site located 1.17 Mb downstream of deletion 3 (Fig. 3b).

Fig. 3: PacBio circular consensus sequencing generated high-quality breakpoint spanning reads used to infer the chr18q rearrangement in family 2.figure 3

a IGV view of the left breakpoint (left panel) and the right breakpoint (right panel) of deletion 1 at chr18q11.2-q12.2 in family 2. b IGV view of aligned PacBio circular consensus reads at the inversion breakpoint 1.174 Mb downstream of deletion 3 (left panel) and the zoomed-in view of the 9 bp deletion at the inversion breakpoint (right panel). Breakpoints from deletions 2 and 3 are shown in Supplementary Fig. 7. Long reads in the patient were grouped by SNPs inherited from a parent (indicated by arrows) in order to differentiate the parental haplotypes in the proband.

The BND calls from the PBSV output, and BLAT based alignment of a representative breakpoint spanning read from each breakpoint was in concordance about the nature of the rearrangement that was identified as shown in Fig. 4a. The inversion breakpoint is precisely mapped between chr18:49,306,873 −49,306,881 with the 9 bp deletion spanned by two types of breakpoint-spanning reads. The breakpoint spanning read on the left side of the inversion mapped to the reverse strand at the right breakpoint of deletion 2. The breakpoint spanning read on the right side of the inversion mapped to the forward strand at the right breakpoint of deletion 1. Breakpoint spanning read from deletion 1 mapped the left side of deletion 1 to the right side of deletion 3, and the right side of deletion 1 to the inversion breakpoint. The left side of deletion 2 mapped to the reverse strand on the left side of deletion 3, and the right side of deletion 2 mapped to the inversion breakpoint. The left side of deletion 3 mapped to the reverse strand on the left side of deletion 2, and the right side of deletion 3 mapped to the left side of deletion 1. A schematic of the resultant derivative chromosome 18 is depicted in Fig. 4b. Deletion 1 was precisely mapped to chr18:23,614,482–36,003,481 resulting in a 12,388,999 bp heterozygous deletion. Deletion 2 was precisely mapped to chr18:39,652,549–41,450,560, resulting in a 1,798,011 bp heterozygous deletion. Deletion 3 was precisely mapped to chr18:46,795,9421–48,132,510, resulting in a 1,336,569 bp deletion (Supplementary Table 13). RepeatMasker analysis of 1500 bases up- and downstream of these breakpoints identified several LINE, LTR, and Alu elements that may play a role in the driving this rearrangement (Supplementary Table 14). Further investigation is therefore necessary to clearly define any role of these repeat elements in the chromosomal rearrangement in the patient, which is beyond the scope of this study. Grouping the long-reads in the patient by SNPs inherited from one of the parents allowed sorting of these reads into parental haplotypes. These data demonstrated that the affected allele in the proband was inherited from the unaffected father (Fig. 3a, b; Supplementary Fig. 7). Investigation of the rearrangement breakpoints in the unaffected father (and the mother) did not show any breakpoint spanning reads indicative of any carrier alterations (Fig. 3 and Supplementary Fig. 7). Moreover, PCR amplification and Sanger sequencing across these breakpoints confirmed the rearrangement in the patient, but not in her parents (Supplementary Figs. 8, 9; Supplementary Table 10). Together, these data support a de novo occurrence of this previously unreported rearrangement in the patient that provided diagnostic clarity and suggested a low risk of recurrence in future pregnancies, which was a significant concern for this family.

Fig. 4: Long breakpoint-spanning reads unraveled a complex chromosome 18q rearrangement in family 2.figure 4

a USCS Genome Browser view of BLAT alignments of the representative breakpoint spanning reads. Track 1 shows the breakpoints of three deletions and one inversion with a 9 bp loss on chromosome 18q. Track 2 shows the chr18 segments retained, with colored arrows (in color spectrum order) to indicate how they were assembled into a derivative chromosome. Solid lines indicate preserved strands; dashed lines indicate strand inversion. Tracks 3 through 10 show the alignments of representative breakpoint-spanning reads. Each read has two high-scoring alignments: one to the region it came from, and one to a different breakpoint. Deletion 1 breakpoints (left = BP1, right = BP2); Deletion 2 breakpoints (left = BP3, right = BP4); deletion 3 breakpoints (left = BP5, right = BP6); Inversion breakpoint = BP7. BP1 read maps to BP6, BP2 read maps to BP7, BP3 read maps to BP5 on the reverse strand, BP4 read maps to BP7, BP5 read maps to BP3 on the reverse strand, and BP6 read maps to BP1. b A schematic of the genomic algebra behind the chr18q rearrangement in derivative chromosome 18 [der(18)]. Retained segments are labeled A through E.

留言 (0)

沒有登入
gif