Hospital-wide access to genomic data advanced pediatric rare disease research and clinical outcomes

Establishing a genomic sequencing ecosystem

As previously described, the CRDC was created in alignment with our institutional goals as part of the BCH Research Strategic plan and as the outcome of a Blue Ribbon committee commissioned in 201811. The goal was to establish a scalable, clinical-grade genomic sequencing platform that advances rare disease research and improves clinical care. To this end, the collaborative developed a variety of features and resources for research and clinical communities across the institution (Fig. 1). These features include integrating language for broad-use research and data sharing into consents, extensive financial support for research ES and GS, centralized data access to both research- and clinically-generated sequencing data, a comprehensive and standardized analysis platform, a mechanism to evaluate novel methods and analytics, and a network of investigators with diverse disease-specific expertise. The establishment of these resources has enabled investigators at all career levels to perform genomic research, promoted data sharing internally and externally, propelled implementation of innovative analysis and provided access to new pathways of diagnosis for individuals not able to obtain clinical testing or that had received nondiagnostic results.

Fig. 1: Key features of the CRDC.

This chart displays the six key features of the CRDC across the top and how they contribute to its research (red) and clinical (blue) goals. This figure was created in Microsoft PowerPoint.

The CRDC began offering research genomic sequencing for selected rare disease cohorts in late 2018; in the first year (Phase I), the collaborative generated ES data for 1046 affected individuals across 15 cohorts, developed consent language for broad-use research and implemented a harmonized data processing and standardized analysis platform11. Since then, additional disease cohorts were selected for funding about once a year via a hospital-wide call for applications. Cohorts were chosen based on potential for novel discoveries and scientific innovation. Moreover, the selection criteria were inclusive, with a goal to broaden the availability of genomic sequencing across all the divisions and departments and investigators at all career levels. As of early 2024, the CRDC included 45 disease cohorts led by 66 investigators from 26 departments/programs (Table 1). These disease cohorts each covered at least 5 and up to hundreds of different genetic diseases, defined by the Genetic and Rare Diseases Information Center (GARD, rarediseases.info.nih.gov).

Table 1 Overview of the disease cohorts involved in the CRDC

Since the launch of the collaborative, the process to onboard new cohorts has been streamlined, particularly at the stage of Institutional Review Board (IRB) review, which was 16% faster for the 15 most recent consents that include standardized CRDC-specific language than the first 12. There have been many improvements in the process of enrolling individuals and collecting samples. Efforts to develop methods of remote consenting and sample collection accelerated during the COVID-19 pandemic to mitigate pandemic-related restrictions on general on-site interactions with patients and research participants. The ability to consent and enroll remotely/electronically and to remotely collect buccal samples (ES only) continues even as clinics and research are permitted to occur on-site.

Additionally, in Phase II, the CRDC began supporting GS, in addition to ES, as a sequencing test. The original experimental design was to first perform ES on the proband and available family members and then reflex selected non-diagnostic cases to GS of the proband. However, in March 2022, GS began to be offered as a first-line test to all disease cohorts. Since then, usage of GS has grown, currently accounting for 35% of tests ordered. The majority of tests ordered continues to be ES, however, largely because buccal swabs and thus remote sample collection have only recently been accepted by GeneDx for GS. Overall, 70% of probands have only ES data while 17% of probands have ES + GS data and 13% have GS as the primary test (Table 1, Fig. 2A).

Fig. 2: Overview of participants and data included in the CRDC.

Overview of participants and data included in the CRDC. A Distribution of type of sequencing performed. B Distribution of age at enrollment. C Distribution of sequencing of parents. For (A–C), the pie chart includes patients for all CRDC-sequenced cohorts combined and the bar chart includes individual CRDC-sequenced cohorts with at least 20 patients. D Average number of HPO (Human Phenotype Ontology) terms collected per patient for individual CRDC-sequenced cohorts with at least 20 patients. Top: HPO terms collected manually by research teams. Bottom: HPO terms extracted from the electronic health record by Clinithink. SUDP sudden unexpected death in pediatrics, SIDS sudden infant death syndrome, ADHD attention deficit/hyperactivity disorder, DSD disorders of sex development, MIS-C multisystem inflammatory syndrome in children, HSP hereditary spastic paraplegia, ASD autism spectrum disorder, CHD congenital heart defect. This figure was created in R with ggplot39.

As of February 2024, 6308 rare disease patients and their families (13,723 individuals) that consented to a research study have had ES (100x average coverage) and/or GS (40× average coverage) performed via the CRDC, with 4653 of those families (11,150 individuals across 41 disease cohorts) consented for broad-use research purposes and data sharing (Fig. 3). These data have been harmonized in an institution-wide genomics repository with genomic and phenotypic data collected from other research projects and support sources. Additionally, a workflow was established whereby data generated from clinically-ordered sequencing was returned and harmonized in the repository, facilitating clinically-driven re-analysis and reflex to a research study. The repository thus contains 5694 families under a broad-use research consent, 4916 families under other research consents, and 3266 clinically-sequenced families not currently involved in a research study for a total of 31,168 individuals. All data were made available to the appropriate researchers or clinicians in a standardized genomics analysis platform with multiple tools for investigation including GeneDx’s Discovery Platform, Illumina’s Emedgene, a local instance of the Broad Institute’s Seqr platform23 and a gnomAD-like browser developed in-house, BCH Aggregator.

Fig. 3: Growth of the CRDC since launch.

These plots track the increase in total number of (A) families receiving genomic sequencing through the collaborative under a broad-use research consent and (B) disease cohorts enrolling such families. Enrollment slowed slightly in the early months of the COVID-19 pandemic as researchers transitioned to remote consenting and sample collection, as marked on the plot. This figure was created in R with ggplot39.

The set of CRDC-sequenced data consented for broad research (4,653 families) is described in this report and comprises data from probands who are mostly pediatric (86% with age ≤18 years at time of enrollment, median age = 11 years) (Table 1, Fig. 2B) and are 53% male and 47% female. Where possible, the biological parents of probands and other relevant family members were also consented to the study. Forty-six percent of families (2120) included both biological parents (trios) and another 28% included one biological parent (duos) (Fig. 2C). Ten percent of families included at least one other non-parental family member, 91% of whom included siblings. Seventy-two percent of CRDC probands were self–reported as White, non-Hispanic/non-Latine (compared to 67% of the overall hospital patient population)24.

Phenotypic information is collected in a centralized repository for studies involved in the CRDC via two methods. The first is manual entry of clinical information into disease-specific REDCap25,26 databases by individual research teams. Each disease cohort had an average of 1–16 Human Phenotype Ontology (HPO) terms per patient (Fig. 2D) and with an overall average across all cohorts of 5 HPO terms/patient. Many research groups also collected additional phenotype information (e.g., EEGs and MRIs) and the disease-specific REDCap databases could have hundreds of fields. The second method of phenotypic data collection is pulling from the electronic health record (EHR) via Clinithink (www.clinithink.com), a natural language processing algorithm. This method resulted in an average of 52 HPO terms per proband (Fig. 2D). All these data were then collected in a single central REDCap repository allowing for easy dissemination to various analysis tools.

Advancing rare disease research

In addition to providing sequencing support for studies that prospectively enrolled patients under a broad-use research consent, the CRDC has also supported several projects with the goal of expanding access to sequencing and other diagnostic methods. One major arm of this was providing support for ES/GS and analysis for patients already enrolled in a different study under a non-broad-use research consent (i.e., allowing only for more limited sharing of data). 1846 patients (2771 individuals) have thus been sequenced across nine disease cohorts including sickle cell anemia, orphan/ultra-rare disease, myopathies/dystrophies, neurodevelopmental disorders, and interstitial cystitis27 (Fig. 4), resulting in over 100 additional diagnoses so far.

Fig. 4: Additional projects supported by the CRDC.

A number of samples were included in additional projects performed to increase access to sequencing (ES, GS) and evaluate orthogonal methods for identifying the genetic basis of a rare disease presentation: RNA-seq, single-cell RNA-seq (scRNA-seq), long-read genome sequencing (LR-seq), high-depth exome sequencing to detect somatic mosaic variants (Deep-seq) and proteomics. This figure was created in Microsoft Excel.

The other major arm was supporting pilot projects to investigate and implement orthogonal experimental methods for identifying the genetic/genomic basis of a rare disease presentation (Fig. 4). One such project involved performing RNA-seq on over 400 samples across seven different cohorts including myopathies, pulmonary disease, severe COVID-19 and epilepsy. Data analysis is ongoing but already a few solved cases have been supported by functional information gleaned from the transcriptomic data, particularly for the COVID-19 and myopathies cohorts, which performed RNA-seq on related tissues: blood and muscle, respectively. Another method investigated was long-read sequencing with data generated for over 80 patients from seven cohorts including hearing loss, epilepsy, and nephrotic syndrome. Preliminary analysis of the data from hearing loss patients has resulted in three additional pathogenic findings undetectable with short-read genome sequencing. Other ongoing projects include testing the utility of single-cell transcriptomics, high-depth exome sequencing for identifying somatic mutations and proteomics in solving rare diseases.

Another major component of this study was creating opportunities for collaboration both within and outside the institution (Table 2). Critical to this was the incorporation of broad sharing for research use into the consent forms. In addition to the 4653 families sequenced via the CRDC, 1041 additional patients and families received genomic sequencing under a consent that allowed for such broad sharing of data. The genomic and phenotypic data for these individuals are available in a de-identified format for cross-cohort analyses in custom in-house tools including BCH Aggregator and Cohort Family Analysis. BCH Aggregator is an integrated database of sequencing data and clinical phenotypes and has a web portal for visualization and exploration. This aggregate resource allows users to investigate how many individuals have variants at a specific locus of interest, which phenotypes those individuals have, the name of the investigator who originally consented the patients and it provides statistics for phenotype-specific burden tests of variants in genes. The Cohort Family Analysis tool aggregates candidate variants produced by family-based prioritization tools across families and cohorts, provides information on variant inheritance, participant phenotype, consenting investigator, allele frequencies, protein impact of variation, protein annotations and protein structures, variant interpretations, splicing, conservation and known gene-disease relationships, and includes various filtering options. Both of these tools facilitate preparatory-to-research queries and building collaborations.

Table 2 Data and analysis platforms available for different types of queries through the CRDC

Many disease group researchers reported multiple and diverse collaborations. Internal collaborations have been integral for patient recruitment as many of the patients fall into multiple disease categories and some of the disease groups have unique combinations of phenotypes resulting in new connections between departments in the institution. 624 out of 9797 patients consented to research (6%) are enrolled in multiple research studies. Data can also be shared externally including through Genomic Information Commons28. Additionally, through the CRDC, BCH has also been established as a Matchmaker Exchange (MME)29 node with the ability to submit data integrated into one of the commonly used analysis tools, an institution-specific instance of the Broad’s Seqr23.

The CRDC has also been a critical component of the success of a recent international collaboration between four leading pediatric hospitals to investigate the diagnostic and clinical utility of rapid trio GS in infantile epilepsy30. This study recently published the results of the first 100 infants enrolled (43% diagnostic rate), 34 (34%) of whom were enrolled from BCH and supported by the CRDC. In addition to sequencing support, the BCH arm of the collaboration was able to rely on the established workflows and infrastructure to quickly get the study running. Of the published 34 cases, 44% had diagnostic findings, and this study is ongoing with 91 families enrolled at BCH (Fig. 4). The results from this rapid sequencing study have driven changes in standards of care such as offering genomic testing to patients with infantile epilepsy.

Improving clinical care

The genomics analysis platform developed by the CRDC drives a research-to-clinical loop where research genomic sequencing can have an immediate impact on clinical care (Fig. 5) because of the built-in framework to clinically confirm findings discovered by researchers and the ability for clinically generated sequencing to be easily re-analyzed or reflexed to research for further investigation. Primary variant analysis for research sequencing was performed by the research group that enrolled the participants and varied depending on the disease, researcher and analysis platform. Generally, filtering was done for population frequency, functional impact and family inheritance where applicable. Some groups used a gene list to help prioritize variants but analysis generally went beyond strict filtering on known disease genes to facilitate novel discoveries. Various annotations and filters are available via the GeneDx Discovery Platform and BCH Seqr, and AI-powered variant prioritization is available via Emedgene. Putative causative variants were then clinically confirmed via validation and classification by the CLIA-certified testing facility GeneDx and returned via the referring clinician.

Fig. 5: The research-to-clinical loop.

The research-to-clinical loop involves taking advantage of the benefits and flexibility of research studies to close the gap on unmet health needs and then evolve the standard of care by bringing the results back to the clinic. This figure was created in Microsoft PowerPoint.

As of August 2023, 1165 patients enrolled in a CRDC research study (35% of 3353 cases analyzed at that point) had a genetic finding of interest including variants of uncertain significance (VUS) and variants in candidate disease genes, which were slated for follow-up functional analysis, and variants that were clinically confirmed (pathogenic, likely pathogenic or VUS) and returned to the patient’s health record (514 cases, 15% of cases analyzed). Crucially, the ability to clinically confirm research findings is made widely available as the samples are stored in a CLIA environment and the cost of the confirmatory testing by the sequencing lab (GeneDx) is covered by the CRDC.

Alongside research sequencing, clinical sequencing is also available at our center with 4032 patients having been tested from late 2019 through February 2024 (almost all ES with GS only becoming available very recently). The uptake of clinical exomes for patients seen in the clinic increased from 24 per month in 2016–2017 to 90 per month in 2021–2023, mirroring the increase in research genomic sequencing. Since 2019, these sequencing data have also been made available for analysis in the genomics platform, supporting re-analysis by clinicians. Additionally, storing the data in a centralized database makes it available for deeper research investigations if the individuals are enrolled in a research study (1324 of patients with clinical ES are also enrolled in a research study).

To further explore the benefit of accessible research genomic sequencing, a deeper review was performed for four cohorts: epilepsy31 (522 patients), hearing loss32 (218), cerebral palsy33 (175) and peripheral vestibular disorders (32). For all groups, a fraction of enrolled patients had had non-diagnostic previous genetic testing: 40% of the cerebral palsy cohort (mix of CMA and ES), 31% of the hearing loss cohort (mostly panel and single gene tests), 26% for peripheral vestibular disorders (single gene, panel, and ES) and 19% for epilepsy (clinical panel and/or CMA). These cases were included in the studies as the opportunity to perform ES or GS presented an improvement over the previously available tests, often limited to gene panels or CMA.

Another feature of these cohorts was that many of the patients included had phenotypes not classically offered genomic testing such as ES/GS, and for whom insurance coverage for such testing is not routine. In the epilepsy cohort, 73% of patients had an epilepsy diagnosis other than developmental and epileptic encephalopathies (DEE), and had a diagnostic rate of 14% (compared to 32% for DEE). In hearing loss (HL), 45% of patients had unilateral or asymmetric bilateral HL, which historically have not been tested genetically. In this study, 20% of those patients had diagnostic results (compared to 40% of patients with symmetric bilateral HL). Similarly, for cerebral palsy (CP), 48% had non-cryptogenic CP (patients with known acquired risk factors for CP) and a diagnostic rate of 10% compared to the 42% that had cryptogenic CP (with no known risk factors) and a diagnostic rate of 39%. And for the vestibular disorders cohort, where the contribution of Mendelian variants is largely unknown, there have been few variants clinically confirmed and returned so far, but there are a number of ongoing investigations for this gene discovery-focused study.

View original article

NPJ GENOMIC MEDICINE

分享书签

0 0 0 0 0 0 0

More from this channel

Hospital-wide access to genomic data advanced pediatric rare disease research and clinical outcomes

留言 (0)