PDIVAS: Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing

Abstract

Deep-intronic variants often cause genetic diseases by altering RNA splicing. However, these pathogenic variants are overlooked in whole-genome sequencing analyses because they are difficult to distinguish from the vast number of benign variants (approximately 1,500,000 deep-intronic variants per individual). Therefore, we developed the Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing (PDIVAS), an ensemble machine-learning model that combines multiple splicing features with regional splicing constraint metrics. Using PDIVAS, around 27 pathogenic candidates were identified per individual at 95% sensitivity, and in simulated patient genome sequences causative variants were prioritized more efficiently than with previous predictors. PDIVAS is available at https://github.com/shiro-kur/PDIVAS.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was supported by JSPS KAKENHI Grant Numbers 22J23899 and 19K07367. This study was also supported by AMED under Grant Number JP22gm4010013.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used ONLY openly available human data that were originally located at: http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ and https://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad.genomes.r2.0.1.sites.noVEP.vcf.gz.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The PDIVAS source code, command-line interface, and predictions for all rare deep-intronic SNVs, short insertions, and short deletions within Mendelian disease genes are available at https://github.com/shiro-kur/PDIVAS. ConSplice scores and precomputed ConSpliceML scores are available at https://home.chpc.utah.edu/~u1138933/ConSplice/. Precomputed CADD-Splice scores are available at https://krishna.gs.washington.edu/download/CADD/v1.6/GRCh37/. The pathogenic splice-altering variants from HGMD were downloaded from the HGMD website (http://www.hgmd.cf.ac.uk/) under the HGMD commercial license. Because of the terms of this license, we are not allowed to share these variants publicly. 1000 Genomes Project variants are publicly available at http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/. gnomAD variants are publicly available at https://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad.genomes.r2.0.1.sites.noVEP.vcf.gz. The gene list from OMIM is available to users from academic institutions and non-profit organizations at https://www.omim.org/downloads. Gene lists from CGD are publicly available at https://research.nhgri.nih.gov/CGD/download/.
