Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences [METHODS]

Chencheng Xu1,6,7, Suying Bao2,3,6,8, Ye Wang2,3, Wenxing Li2,4, Hao Chen5,9, Yufeng Shen2,4, Tao Jiang1,5 and Chaolin Zhang2,3

1 Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 2 Department of Systems Biology, Columbia University, New York, New York 10032, USA; 3 Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA; 4 Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA; 5 Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA

6 These authors contributed equally to this work.

Present addresses: 7 Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; 8 Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA; 9 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Corresponding authors: jiang@cs.ucr.edu, cz2294@columbia.edu

Abstract

Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform “reference-informed prediction” by incorporating the known splice site usage of a reference gene sequence to improve its prediction of splicing-altering mutations. We benchmarked DeltaSplice against several other state-of-the-art methods on various prediction tasks, including the impact of evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and in neurodevelopmental disorders, and demonstrated that DeltaSplice consistently outperformed them. DeltaSplice predicted ∼15% of splicing quantitative trait loci (sQTLs) in the human brain to be causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders (NDDs), including 19 genes with recurrent splicing-altering mutations. Integrating splicing-altering mutations with other types of de novo mutation burden allowed the prediction of eight novel NDD-risk genes. Our work expands the capacity of in silico splicing models, with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
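The idea of reference-informed prediction can be illustrated with a toy calculation. The sketch below assumes, as a simplification that is not taken from the DeltaSplice implementation, that the estimate for a mutant sequence is the measured splice site usage (SSU) of the reference sequence plus a model-predicted change, clipped to the valid range [0, 1]. The function name `reference_informed_usage` and the example values are hypothetical.

```python
def reference_informed_usage(ref_usage: float, predicted_delta: float) -> float:
    """Hypothetical sketch: combine a known reference splice site usage (SSU)
    with a model-predicted change (delta SSU) caused by a mutation.

    Assumption (not from the paper's code): the combination is additive and
    the result is clipped so it remains a valid usage fraction in [0, 1].
    """
    if not 0.0 <= ref_usage <= 1.0:
        raise ValueError("reference usage must be a fraction in [0, 1]")
    # Anchor the estimate on the measured reference value, then apply the delta.
    return min(1.0, max(0.0, ref_usage + predicted_delta))


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    ref_usage = 0.80         # measured SSU of the reference sequence
    predicted_delta = -0.35  # model-predicted change caused by the mutation
    print(reference_informed_usage(ref_usage, predicted_delta))
```

The intuition behind anchoring on a measured reference value is that the model then only needs to get the mutation-induced difference right, rather than the absolute usage of every splice site from sequence alone.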

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279044.124.

Freely available online through the Genome Research Open Access option.

Received January 28, 2024. Accepted July 18, 2024.
