Highly accurate quantification of allelic gene expression for population and disease genetics [METHOD]

Anna Saukkonen (1,2), Helena Kilpinen (2,3,4,5,6) and Alan Hodgkinson (1,6)

1 Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, SE1 9RT, United Kingdom
2 UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, United Kingdom
3 Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, United Kingdom
4 Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, 00014, Finland
5 Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, 00014, Finland

6 These authors contributed equally to this work.

Corresponding authors: helena.kilpinen@helsinki.fi, alan.hodgkinson@kcl.ac.uk

Abstract

Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease-relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes.
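
As a rough illustration of what an "incorrect bias (>10%)" at a site could mean in the simulation setting, the sketch below computes a reference allele ratio from read counts and flags sites whose observed ratio differs from the simulated ratio by more than 0.10. This is not PAC's code: the function names, the example counts, and the reading of ">10%" as an absolute difference in reference allele ratio are all assumptions made for illustration.

```python
# Illustrative sketch only; names, counts and the bias definition are assumptions.

def ref_ratio(ref_count: int, alt_count: int) -> float:
    """Fraction of allelic reads at a heterozygous site that carry the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("site has no allelic reads")
    return ref_count / total


def is_biased(observed_ratio: float, true_ratio: float, threshold: float = 0.10) -> bool:
    """Assumed definition: observed ratio is off from the simulated ratio by more than `threshold`."""
    return abs(observed_ratio - true_ratio) > threshold


# Hypothetical sites: (reference reads, alternative reads, simulated reference ratio).
sites = {
    "site_a": (52, 48, 0.50),
    "site_b": (70, 30, 0.50),
}

# Collect the names of sites whose measured ratio departs too far from the simulated one.
flagged = [
    name
    for name, (ref, alt, truth) in sites.items()
    if is_biased(ref_ratio(ref, alt), truth)
]
print(flagged)
```

With these made-up counts, only the second site is flagged, since its observed ratio is well away from the simulated balanced ratio.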

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276296.121.

Freely available online through the Genome Research Open Access option.

Received October 15, 2021. Accepted June 29, 2022.
