Chromatin conformation capture in the clinic: 4C-seq/HiC distinguishes pathogenic from neutral duplications at the GPR101 locus

Study population

The study population consisted of the following individuals: in family 1, the propositus (F1A) and her mother (F1B) and father (F1C); in family 2, the propositus F2A; in family 3, the propositus F3A; and the sporadic case, SC1. All presented clinically as described below. All underwent 4C-seq, while F2A and SC1 also underwent HiC analysis. For 4Cseq, the normal controls were F1B and F1C and three unaffected individuals from the GEO database under the accession code GSE193114 [16, 29, 30]. The population of X-LAG subjects consisted of six affected females (S2, S6, S7, S9, S13, and S17; age at disease onset: 3–48 months) that were recruited under clinical protocols 97-CH-0076 (NIH; ClinicalTrials.gov: NCT00001595) and B707201420418 (University of Liège), as outlined previously [16, 29]. The 4C-seq and HiC datasets included in the current study are available at [29, 30].

Family 1

Individual F1A was identified prenatally following a routine amniocentesis that was performed in her 36-year-old mother (G2P1) following prenatal counseling for advanced maternal age. Neither her parents nor their families had a history of overgrowth, endocrine, or other abnormalities. Her 3-year-old sister was of normal size and had met her developmental milestones. In F1A, early intrauterine growth had been restricted due to placenta previa and there was a persistent right umbilical vein. Chromosome microarray analysis (CMA) on a SNP Cytoscan HD platform showed the presence of a small microduplication of ~ 64 kb on chromosome Xq26.3 involving the region 136,068,349–136,131,892 (hg19) that included GPR101. Neither of the parents shared this duplication on follow-up examination by qPCR. The pregnancy progressed uneventfully. At birth (39 week; C-section), she weighed 2.89 kg (10th percentile; − 1.29 SD), was 48.26 cm in length (17th percentile; − 0.95 SD), and she had a head circumference of 33.3 cm (15th percentile; − 1.02 SD). She had no dysmorphic features. Due to the CMA finding of a duplication involving GPR101, a presumptive potential diagnosis of X-LAG was made. Initial hormonal testing in the early neonatal period revealed elevated GH, insulin-like growth factor-1 (IGF-1), and prolactin levels. Over the course of her first 6 months of life, regular hormonal testing and growth measurements were performed. By 3 months of age, GH, IGF-1, and prolactin normalized, while her height (− 0.17 SD) and weight (+ 0.08 SD) were unremarkable. She underwent regular follow-up with community pediatricians and pediatric endocrinologists over the next 2 years. Her hormonal levels remained normal and she met all her developmental milestones appropriately. These clinical findings were inconsistent with an X-LAG diagnosis.

Family 2

Individual F2A, a healthy adult female, came to light during her first pregnancy. During routine ultrasound screening, multiple fetal abnormalities were noted, including a lymphatic malformation in the neck, dilated cardiac ventricles, and growth restriction. Following genetic counseling, an amniocentesis was performed. The SNP-CMA (Cytoscan HD platform) revealed a 277-kb duplication on chromosome Xq26.3 involving the region 136,111,060–136,388,326 (hg19); GPR101 was the only coding gene included in the duplication. Due to fetal malformations, the pregnancy was terminated. There was no family history of endocrine, growth, or other disorders. Individual F2A was found to have the same duplication carried by the fetus on both SNP and exon arrays (Cytoscan HD and XON Arrays). This was separately confirmed in the DNA samples from F2A derived from blood and buccal swabs using a droplet digital PCR (ddPCR) assay (Additional file 1: Fig. S1). The duplication breakpoints were mapped using high-density aCGH (HD-aCGH) and breakpoint-junction PCR. This showed a complex rearrangement at the telomeric end of the duplication within the 3′UTR of GPR101 junction, which consisted of a duplication-normal sequence-duplication pattern (DUP-NML-DUP) (Additional file 1: Fig. S2). F2A later had a normal pregnancy, and a healthy girl was found not to have the Xq26.3 duplication.

Family 3

Individual F3A was a 42-year-old man with a lifelong history of developmental delay, mild intellectual disability, short stature (< 3rd centile), brachydactyly, EEG abnormalities, bilateral cranial nerve VI dysfunction, horizontal nystagmus, and hyperintensity of the globus pallidus on brain MRI. He also had episodes of psychomotor arrest on electroencephalogram. During workup for a genetic cause, he was found to be a phenotypic male with a 46, XY karyotype, and aCGH showed a 508-kb duplication on chromosome Xq26.3 involving the region 136,028,815–136,536,734 (hg19); GPR101 was the only coding gene included in the duplication. This copy number gain was independently confirmed using a ddPCR assay (Additional file 1: Fig. S1). His hormonal profile, including pituitary, adrenal, thyroid, and gonadal axes, was normal. His mother had a similar phenotype with mild intellectual disability, EEG abnormalities, short stature (< 3rd centile), brachydactyly, and globus pallidus hyperintensity and also had the same Xq26.3 duplication. A younger sister who was affected by short stature (< 3rd centile), brachydactyly, intellectual disability, and EEG abnormalities, but with normal brain imaging, did not carry the Xq26.3 duplication. Therefore, the duplication did not segregate with the phenotype in the family. Another younger sister was of normal phenotype, had two normal children, and did not carry the Xq26.3 duplication.

To explore the functional effects of the duplications seen in F1-F3 and their roles as potential contributors to clinical phenotypes or as neutral effects, we studied chromatin contacts at the X-LAG locus using 4C-seq and HiC. We then compared these profiles to normal controls.

Sporadic case 1 (SC1)

Individual SC1 developed overgrowth beginning within the first 6 months of life. There was no family history of endocrine or growth disorders and her mean parental height was 158.5 cm. From the age of two, her growth curve rapidly diverged, and she developed worsening gigantism, measuring 130 cm in height by 3 years of age (height z-score > + 8 SD). She was diagnosed with GH and IGF-1 excess due to a pituitary macroadenoma at the age of four. She was resistant to treatment with the first-generation somatostatin analog, octreotide, and then underwent neurosurgical resection of her pituitary adenoma. Disease control was achieved by her mid-teens, but she developed permanent gigantism and her final adult height was 184 cm (+ 2.92 SD; + 25.5 cm above mean parental height; Additional file 1: Fig. S3). Screening for known germline genetic causes of pituitary gigantism including the genes AIP, MEN1, and CDKN1B was negative. As part of the discovery cohort studies for X-LAG, on clinical-grade aCGH she did not have any copy number change in the region of interest at the X-LAG locus [17]. No copy number changes involving GPR101 or other genes in the X-LAG locus were found on HD-aCGH. aCGH of her pituitary tumor DNA did not show any evidence of GPR101 duplication (data not shown). In SC1, even following extensive investigations, uncertainty remained whether the infant-onset pituitary gigantism was due to X-LAG or if she was a phenocopy. To assist in making/excluding a definitive clinical diagnosis, we examined GPR101 and the X-LAG locus by performing HiC and we compared the results to those obtained in individuals with X-LAG.

Clinical grade aCGH and HD-aCGH

Subjects underwent genetic diagnosis by testing peripheral blood DNA using clinical-grade aCGH and research-based HD-aCGH with high-density probes tiling the critical region within Xq26.3 (chrX:135,001,882–136,499,429, hg19). Breakpoint junction analysis was performed using long-range PCR followed by Sanger sequencing of the PCR products. Both methods were previously described in detail [17].

Copy number variant (CNV) analysis

CNV assays were performed by employing four TaqMan hydrolysis probes consisting of a pair of unlabeled primers and a FAM-labeled Minor Groove Binder (MGB) probe. The assays employed were specific for the following genes: RBMX (Hs02426405_cn), GPR101 (Hs01730605_cn and Hs01818174_cn), and ZIC3 (Hs02692150_cn). The VIC-labeled RPP30 (Rnase P) assay (4403326, Thermo Fisher Scientific) was used as an internal control. All assays were supplied from Thermo Fisher Scientific (Waltham, MA, USA) and were used in conjunction with a ddPCR instrument (QX200 Droplet Digital PCR System, Bio-Rad, CA, USA). The ddPCR assays were performed according to the manufacturer’s protocol. Briefly, experiments were prepared in 96 microwell plates and consisted of 22 µl reactions containing 20 ng of blood-derived genomic DNA, 11 µl 2X ddPCR Supermix for Probes (no dUTP, #1863024, Bio-Rad), and 1.1 µl each of one target gene and reference CNV assay mixes. Data were analyzed using the QuantaSoft analysis software (version 1.7.4.0917). Results are displayed as copy number data and Poisson-based 95% confidence intervals (CI). The copy number was determined by the relative relationship between the quantity of the target gene and the reference gene.

4C-seq/HiC sample preparation

Cells from cultured peripheral blood mononuclear cells (PBMCs) were prepared for chromatin fixation and nuclei extraction following the protocol we described in [16]. Approximately 2.5 × 106 fixed and isolated nuclei were snap-frozen in liquid nitrogen and used as input for HiC and 4C-seq. Experiments were conducted as singletons.

4C-seq library preparation

3C and 4C library preparation was performed as previously described [16], including detailed information about 4C-seq primer sequences for the GPR101 viewpoint, viewpoint fragment coordinates, and corresponding digestion strategies for primary and secondary restriction enzymes. 4C-seq libraries were multiplexed, employing Illumina TrueSeq adapters, and sequenced utilizing DNBseq technology to generate 100-bp single-end reads, yielding approximately 10 million raw sequencing reads per sample.

4C-seq data analysis and visualization

4C-seq read mapping to the human genome (GRCh37/hg19) and read filtering were performed as previously described [31]. Mapped reads were converted to read counts per restriction fragment of the first restriction enzyme. For visualization, all 4C-seq profiles were normalized for reads per million within the specified genomic region (chrX:135,000,000–137,000,000, hg19) and subjected to signal smoothing using a 10-fragment running window. To compare 4C interaction profiles between subjects and family controls, the normalized read counts of controls were subtracted from the subject signal. Matched controls were used for signal subtraction to control batch effects between experiments. F1A and F2A were compared to the family controls, F1B and F1C, while F3A was compared to unaffected controls retrieved from the GEO database under the accession code GSE193114 [16, 29, 30]. The resulting bedgraph files were visualized using the UCSC Genome Browser, focusing on the genomic target region (chrX:135,336,766–136,561,684, hg19) in Fig. 1 and Additional file 1: Fig. S5.

Fig. 1

Impact of inter- and intra-TAD duplications on chromatin organization at the X-LAG locus. A Schematic representation of the extended X-LAG locus (hg19, chrX:135,336,766–136,561,684), delineating the genomic position of an invariant TAD border (red hexagon). Below, the position of partially overlapping tandem duplications involving GPR101 from subjects with X-LAG (highlighted by blue boxes) [16] that traverse the TAD border (inter-TAD duplications), alongside duplications from subjects of the current study (yellow boxes) that remain within TAD boundaries (intra-TAD duplications). B and C HiC at the X-LAG locus, showing normalized contact matrices at 10-kb resolution from the X-LAG subject S7 and non-X-LAG subject F2A in a side-by-side comparison to controls. Normal TAD configuration at the locus is highlighted by red arrows. Additional chromatin interactions, induced by inter- and intra-TAD duplications, are denoted by black arrows. HiC difference maps relative to controls (D and E) depict the increase in chromatin interactions (black arrows) in subject S7 and F2A. Below, corresponding 4C-seq profiles originating from the GPR101 viewpoint (black triangle) are displayed, alongside the genomic position of the duplication and the subtraction profiles relative to control samples. Inter-TAD duplications in X-LAG (B and D) result in increased chromatin interaction of GPR101 with regions centromeric of the TAD border (neo-TAD formation). Intra-TAD duplications, confined to GPR101 and excluding the invariant TAD border (C and E), exhibit increased telomeric chromatin interactions

For the comparative analysis shown in the heatmap in Fig. 2, normalized and subtracted 4C-seq signals (subjects vs. controls) were visualized using ggplot2 [32] with genomic positions binned in 50-kb increments on the x-axis along the target region. The average score was represented using the Viridis color palette. For comparisons with 4C-seq data from X-LAG subjects, we used data from affected subjects S6, S9, S13, S7, S2, and S17, and three unaffected control subjects, as described in previous studies [16]. Data on these X-LAG subjects and controls were retrieved from the GEO database under accession code GSE193114 [29].

Fig. 2

Modulation of GPR101 chromatin interactions by inter- and intra-TAD duplications. Schematic representation of the extended X-LAG locus (hg19, chrX:135,336,766–136,561,684), delineating the genomic positions of putative pituitary enhancers (green boxes) and an invariant TAD border (red hexagon), which separates GPR101 from pituitary activity in normal conditions. Below, heatmap showing differential chromatin contacts at a 50-kb genomic bin size in X-LAG and non-X-LAG subject samples compared to controls, as inferred from 4C-seq experiments with a viewpoint located at the GPR101 promoter. The genomic bin containing the viewpoint is indicated by a black arrowhead. All X-LAG subjects exhibit similar patterns of ectopic chromatin interactions from GPR101 with the centromeric region containing putative pituitary enhancers. This pattern is absent in non-X-LAG subjects. The 4C-seq data from X-LAG subjects S6, S9, S13, S7, S2, and S17 and their respective controls (described in Franke et al. [16]) were retrieved from the GEO database under accession code GSE193114 [29]

Quantification of GPR101 chromatin contacts with enhancer regions

4C-seq normalized read counts for restriction fragments overlapping with candidate enhancer regions that we previously identified (eVGLL1-intronic, chrX:135,625,877–135,641,072; eVGLL1-distal, chrX:135,656,769–135,660,247; eARHGEF6-intronic, chrX:135,846,959–135,851,769; eRBMX, chrX:135,959,959–135,963,959; eAK055694, chrX:135,990,759–135,994,160) were extracted from the GPR101 viewpoint for three groups: subjects with duplications causing X-LAG (S6, S9, S13, S7, S17, S2) retrieved from the GEO database under accession code GSE193114 [29]; subjects with duplications but without X-LAG described in the current study (F1A, F2A, F3A); and control samples (F1B, F1C from the current study, and Control_S13, Control_S2_S7, Control_S9) [16, 29, 30].

To systematically identify PCR bias for spurious and high read counts in single fragments, a Z-score was calculated for each fragment count to determine how many standard deviations the value was from the mean (a) within the candidate region of each sample and (b) across the same restriction fragment between all samples. A Z-score threshold of ± 3 was established to identify extreme variations. Values with Z-scores beyond this threshold were flagged as outliers and omitted from further analysis. Total read count in each candidate region was averaged and displayed as a box plot showing the distribution of read counts across the X-LAG, the duplication but non-X-LAG, and control sample groups.

For statistical analysis, comparisons between groups (X-LAG dup vs. control; non-X-LAG dup vs. control; X-LAG dup vs. non-X-LAG dup) were performed using a non-parametric Wilcoxon rank-sum exact test (two-sided, 95% confidence interval). The Wilcoxon rank-sum test was chosen due to its ability to handle non-normally distributed data and small sample sizes. All statistical analyses were conducted using R version 4.0.3 (2020–10-10).

HiC library preparation

Nuclei isolated from subjects and controls were processed for 3C and HiC library construction following a previously described protocol [33] with minor modifications detailed in [34]. Final HiC libraries were multiplexed and sequenced using DNBseq technology, resulting in 100-bp paired-end reads, with approximately 400 million raw sequencing read pairs generated for each sample.

HiC data analysis and visualization

HiC data analysis was performed as previously described with minor modifications [34]. Briefly, HiC paired-end reads were mapped to the human genome (GRCh37/hg19) using BWA [35]. Hi-C pairs representing valid ligation events were filtered, and PCR duplicates were removed using the pairtools package (https://github.com/mirnylab/pairtools). Unligated fragments and self-ligation events (dangling and extra-dangling ends, respectively) were filtered out by excluding paired-reads mapping to the same or adjacent restriction fragments. The resulting filtered pairs file was converted to a TSV (tab-separated values) file, serving as input for Juicer Tools 1.13.02 Pre to generate multiresolution HiC files [36]. Specifically, the analysis utilized custom scripts (https://gitlab.com/rdacemel/hic_ctcf-null): the hic_pipe.py script first generated TSV files with the filtered pairs, followed using the filt2hic.sh script to produce Juicer HiC files.

All HiC visualization was performed using the FAN-C 0.9.23 toolkit within the genomic target region [37]. HiC contact matrices of subjects were visualized in a side-by-side comparison with controls at 10-kb resolution, employing the VC-SQRT (square root vanilla coverage) normalization method. Differences in HiC contact matrices between subject and control samples were visualized at 10-kb resolution using the “-c difference” argument.

View original article

GENOME MEDICINE

分享书签

0 0 0 0 0 0 0

More from this channel

Chromatin conformation capture in the clinic: 4C-seq/HiC distinguishes pathogenic from neutral duplications at the GPR101 locus

留言 (0)