A Bayesian framework to study tumor subclone-specific expression by combining bulk DNA and single-cell RNA sequencing data [METHODS]

Yi Qiao1,6, Xiaomeng Huang1,6, Philip J. Moos2, Jonathan M. Ahmann3, Anthony D. Pomicter3, Michael W. Deininger3,4, John C. Byrd5, Jennifer A. Woyach5, Deborah M. Stephens3 and Gabor T. Marth1

1 Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
2 Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah 84112, USA
3 Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
4 Division of Hematology and Hematologic Malignancies, University of Utah, Salt Lake City, Utah 84112, USA
5 The James Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, USA

6 These authors contributed equally to this work.

Corresponding author: gabor.marth@gmail.com, gmarth@genetics.utah.edu

Abstract

Genetic and gene expression heterogeneity is an essential hallmark of many tumors, allowing the cancer to evolve and to develop resistance to treatment. The most commonly used data types for studying such heterogeneity are bulk tumor/normal whole-genome or whole-exome sequencing (WGS, WES) and single-cell RNA sequencing (scRNA-seq), respectively. However, tools are lacking to link genomic tumor subclonality with transcriptomic heterogeneity by integrating genomic and single-cell transcriptomic data collected from the same tumor. To address this gap, we developed scBayes, a Bayesian probabilistic framework that uses tumor subclonal structure inferred from bulk DNA sequencing data to determine the subclonal identity of cells from single-cell gene expression (scRNA-seq) measurements. Grouping together cells that represent the same genetically defined tumor subclone allows comparison of gene expression across different subclones, or investigation of gene expression changes within the same subclone across time (i.e., progression, treatment response, or relapse) or space (i.e., at multiple metastatic sites and organs). We used simulated data sets, in silico synthetic data sets, and biological data sets generated from cancer samples to extensively characterize and validate the performance of our method and to show improvements over existing methods. We show the validity and utility of our approach by applying it to published data sets and recapitulating their findings, and by arriving at novel insights into cancer subclonal expression behavior in our own data sets. We further show that our method is applicable to a wide range of single-cell sequencing technologies, including single-cell DNA sequencing as well as Smart-seq and 10x Genomics scRNA-seq protocols.
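To make the idea of Bayesian subclone assignment concrete, the following is a minimal illustrative sketch and not the scBayes implementation. It assumes each subclone is described by the set of somatic variants it carries (as would be inferred from bulk DNA sequencing), and that each cell contributes reference and alternate read counts at those variant sites. The function name `subclone_posterior`, the binomial read model, and the parameters `error_rate` and `alt_fraction` are assumptions introduced here for illustration only.

```python
import math


def subclone_posterior(cell_reads, subclone_variants, priors=None,
                       error_rate=0.01, alt_fraction=0.5):
    """Posterior probability that one cell belongs to each subclone.

    cell_reads: dict mapping variant id -> (ref_count, alt_count) in this cell.
    subclone_variants: dict mapping subclone name -> set of variant ids it carries.
    priors: optional dict of prior probabilities per subclone (uniform if None).
    error_rate: assumed chance of seeing an alternate read when the variant is absent.
    alt_fraction: assumed chance of seeing an alternate read when the variant is present.
    """
    names = list(subclone_variants)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    log_post = {}
    for name in names:
        log_p = math.log(priors[name])
        for variant, (ref, alt) in cell_reads.items():
            carried = variant in subclone_variants[name]
            p_alt = alt_fraction if carried else error_rate
            # Binomial log-likelihood; the binomial coefficient is the same
            # for every subclone, so it cancels on normalization.
            log_p += alt * math.log(p_alt) + ref * math.log(1.0 - p_alt)
        log_post[name] = log_p

    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    unnorm = {name: math.exp(lp - top) for name, lp in log_post.items()}
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}


# Hypothetical toy input: subclone B descends from A and adds variant v2.
subclones = {"normal": set(), "A": {"v1"}, "B": {"v1", "v2"}}
cell = {"v1": (0, 3), "v2": (2, 0)}  # 3 alt reads at v1, 2 ref reads at v2
posterior = subclone_posterior(cell, subclones)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this toy case the alternate reads at `v1` argue against the normal population, while the reference-only coverage at `v2` favors subclone A over B. A cell with no reads at any variant site simply returns the prior, which reflects the sparse coverage typical of scRNA-seq data.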

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278234.123.

Freely available online through the Genome Research Open Access option.

Received June 29, 2023. Accepted November 22, 2023.
