A novel functional genomics atlas coupled with convolutional neural networks facilitates clinical interpretation of disease relevant variants in non-coding regulatory elements

Abstract

Genome-wide assessment of genetic variation is becoming routine in human genetics, but functional interpretation of non-coding variants in both common and rare diseases remains extremely challenging. Here we employed the massively parallel reporter assay ChIP-STARR-seq to functionally annotate the activity of >140 thousand non-coding regulatory elements (NCREs) in human neural stem cells (NSCs) as a model for early brain development. Highly active NCREs show increased sequence constraint and harbour de novo variants in individuals affected by neurodevelopmental disorders. They are enriched for transcription factor (TF) motifs, including those of YY1 and p53 family members, and for the presence of primate-specific transposable elements, providing insights into gene regulatory mechanisms in NSCs. Examining episomal NCRE activity of the same sequences in human embryonic stem cells (ESCs) identified cell type differential activity and primed NCREs, accompanied by a rewiring of the epigenome landscape. Leveraging the experimentally measured NCRE activity and the nucleotide composition of the assessed sequences, we built BRAIN-MAGNET, a convolutional neural network that predicts NCRE activity from DNA sequence composition and identifies the functionally relevant nucleotides and TF motifs within each NCRE that are required for NCRE function. The application of BRAIN-MAGNET, including its functional validation, allows fine-mapping of GWAS loci identified for common neurological traits and prioritization of potentially disease-causing rare non-coding variants in currently genetically unexplained individuals with neurogenetic disorders, including those from the Genomics England 100,000 Genomes Project. We foresee that this NCRE atlas and BRAIN-MAGNET will help reduce missing heritability in human genetics by limiting the search space for functionally relevant non-coding genetic variation.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

RD was supported by a China Scholarship Council (CSC) PhD Fellowship (201906300026 to RD) for her PhD studies at the Erasmus Medical Center, Rotterdam, The Netherlands. KL was supported by a ZonMw PSIDER Doorbraken grant (grant 10250042110005), a Brain and Behavior Research Foundation Young Investigator award (grant 30787) and an NWO Veni grant (grant 501100003246). GR was supported by the ZonMw Veni grant 1936320. Part of this research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK, and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. Some of the analysis involved external data generated by the ENCODE and Roadmap projects, which received funding from the National Institutes of Health (NIH) (grants U01ES017166, U54HG004570, U41HG006992 and U01ES017155). The Barakat lab was supported by the Netherlands Organisation for Scientific Research (ZonMw Veni, grant 91617021; ZonMw Vidi, grant 09150172110002), a NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation, an Erasmus MC Fellowship 2017, and an Erasmus MC Human Disease Model Award 2018, and acknowledges other ongoing support for rare disease research from Stichting 12q, EpilepsieNL, CURE Epilepsy and the Spastic Paraplegia Foundation, Inc. Funding bodies had no influence on study design, results, data interpretation, or the final manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The ethics committee of Erasmus MC University Medical Center gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors.
