Single-cell analysis of adult human heart across healthy and cardiovascular disease patients reveals the cellular landscape underlying SARS-CoV-2 invasion of myocardial tissue through ACE2

Published dataset and patient samples

Sample collection was reviewed and approved by the Institutional Review Board (IRB) at the institution where each sample was originally collected. GSE145154 was approved by the Ethics Committee of Fuwai Hospital in Beijing, China. Heart tissue samples with DCM and ICM in the GSE145154 dataset were obtained from patients undergoing transplant. Patients whose DCM was attributable to any of the following causes were excluded: cardiac amyloidosis, cardiac sarcoidosis, viral myocarditis, giant cell myocarditis, peripartum cardiomyopathy, chemotherapy-associated cardiomyopathy, obesity, diabetic cardiomyopathy, arterial coronary disease, valvular disease, and congenital heart disease. GSE134355 was approved by the Research Ethics Committee of the Zhejiang University School of Medicine, the Research Ethics Committee of the First Affiliated Hospital, the Research Ethics Committee of the Second Affiliated Hospital and the Research Ethics Committee of Women’s Hospital at Zhejiang University (Approval Number: 20,170,029, 2,01,80,017, 2,01,90,034, 2,01,80,15, 2,01,85,07, 2,01,87,66 and 2,01,81,85). Informed consent for fetal tissue collection and research was obtained from each patient after her decision to legally terminate her pregnancy but before the abortive procedure was performed. Informed consent for collection and research of surgically removed adult tissues was obtained from each patient before the operation. Informed consent for the collection and research of tissues from deceased-organ donation was obtained from the donor family after the cardiac death of the donor.

Publicly available single-cell RNA-seq datasets were downloaded from the Gene Expression Omnibus (GEO) [43]. GSE145154, which comprises 2 DCM tissues, 2 ICM tissues, and 1 healthy heart tissue, was sequenced on the Illumina HiSeq 6000 and HiSeq X Ten platforms using 10x Genomics technology, with left and right ventricular samples taken from each patient for sequencing. GSE134355, which comprises 2 heart tissues from adult hypertensive patients and 2 normal fetal heart tissues, was sequenced on the HiSeq X Ten platform using 10x Genomics technology.

Integrated analysis of published datasets

The single-cell data used were the original UMI count data. Preprocessing, quality control, normalization, and dimensionality reduction clustering were all performed using the Seurat package (v4.0) [44]. The quality control standards were as follows: (1) Each gene had to be expressed in at least 3 cells. (2) Each cell had to express at least 500 genes. (3) Within each sample, the number of detected features (nfeatures) and the total counts per cell were screened against a median ± 3 × MAD (median absolute deviation) criterion. (4) A threshold of 10% was applied to the proportion of mitochondrial genes. (5) A threshold of 1% was applied to the proportion of hemoglobin genes. The subsequent data standardization, normalization, identification of highly variable genes, and dimensionality reduction clustering all followed the default parameters and standard procedures of the Seurat package.
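The cell-level criteria (2) to (5) can be illustrated with a minimal sketch. It is not the authors' code: the field names (n_genes, n_counts, pct_mito, pct_hb) are hypothetical, the MAD is left unscaled, and it is assumed that cells at or below the mitochondrial and hemoglobin thresholds are kept.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) = median -/+ n_mads * MAD (unscaled MAD, an assumption)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - n_mads * mad, med + n_mads * mad


def filter_cells(cells, min_genes=500, max_mito=0.10, max_hb=0.01):
    """Keep the cells of ONE sample that pass criteria (2)-(5).

    `cells` is a list of dicts with hypothetical keys:
    n_genes, n_counts, pct_mito, pct_hb (proportions in [0, 1]).
    """
    gene_lo, gene_hi = mad_bounds([c["n_genes"] for c in cells])
    count_lo, count_hi = mad_bounds([c["n_counts"] for c in cells])
    kept = []
    for c in cells:
        if c["n_genes"] < min_genes:                      # criterion (2)
            continue
        if not gene_lo <= c["n_genes"] <= gene_hi:        # criterion (3), features
            continue
        if not count_lo <= c["n_counts"] <= count_hi:     # criterion (3), counts
            continue
        if c["pct_mito"] > max_mito or c["pct_hb"] > max_hb:  # criteria (4), (5)
            continue
        kept.append(c)
    return kept
```

Because the bounds are computed from the cells passed in, the function would be called once per sample rather than on the pooled data.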

Each dataset was log-normalized with the log1p transformation ln(10,000 × gij + 1), where gij is the UMI count of gene i in cell j divided by the total of all UMI counts for cell j (the column sum). We used the harmony-pytorch Python implementation (v0.1.1; https://github.com/lilab-bcb/harmony-pytorch/) of the Harmony scRNA-seq integration method for batch correction to integrate data between different samples, and selected the first 30 principal components and resolution = 1 for dimensionality reduction clustering [45]. Cell clusters were named by collecting marker genes from the literature and annotating the clusters manually.
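The normalization formula can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the matrix is stored with one row per cell (so the "column sum" of a genes-by-cells matrix becomes a row sum here).

```python
import math


def log_normalize(counts, scale=10_000):
    """Apply ln(scale * g_ij + 1), where g_ij = count / total count of the cell.

    `counts` is a list of per-cell lists of UMI counts (cells x genes).
    Cells with zero total counts are returned as all zeros.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(scale * c / total) for c in cell])
    return normalized
```

For a cell with counts [1, 3], the normalized values would be ln(2,500 + 1) and ln(7,500 + 1).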

Differential gene expression analysis

To identify differentially expressed genes among cell populations, we used the FindMarkers function in Seurat v4.0. Genes with adj. p < 0.05 were considered differentially expressed, and the logFC threshold followed an earlier report [44].
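The selection step can be sketched as a simple filter. The sketch assumes a Bonferroni-style adjustment of the raw p values and a hypothetical logFC cutoff of 0.25; the text does not state either, so both are placeholders rather than the values used in the study.

```python
def select_de_genes(results, n_tests, alpha=0.05, min_abs_logfc=0.25):
    """Filter genes by adjusted p value and absolute log fold change.

    `results` maps gene name -> (raw p value, logFC).
    The Bonferroni adjustment and `min_abs_logfc` are assumptions for illustration.
    Returns (gene, adjusted p, logFC) tuples sorted by adjusted p value.
    """
    selected = []
    for gene, (p_value, logfc) in results.items():
        p_adj = min(1.0, p_value * n_tests)  # Bonferroni: multiply by number of tests
        if p_adj < alpha and abs(logfc) >= min_abs_logfc:
            selected.append((gene, p_adj, logfc))
    return sorted(selected, key=lambda row: row[1])
```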

Coexpression analysis across diseases and cell types

We collected single-cell sequencing data of 3 CVDs and normal cardiomyocytes. To evaluate the coexpression of ACE2 and ADAM17 in different cell types and different disease conditions, we selected cell types with more than 15 ACE2+ cells for analysis, and ACE2- cells were selected by downsampling according to clinical characteristics using the ROSE package. We employed a mixed model with a random intercept that differed for each donor to account for donor-specific effects (i.e., batch effects):

$$Y_i \sim \text{ACE2} + \left(1 \mid S\right)$$

where ACE2 represents the binary coexpression state of each cell (that is, double-positive versus double-negative cells), Yi represents the expression level of gene i in each cell, expressed in units of log2(transcripts per 10,000 reads (TP10K) + 1), and S represents the donor from which each cell was isolated. The model was fitted using the lme4 package in R [46].
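The downsampling step that precedes the model can be illustrated with a minimal sketch of random undersampling of the majority class. It is not the ROSE implementation, and it ignores the stratification by clinical characteristics mentioned above; the function name and the seed are hypothetical.

```python
import random


def downsample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset of cells.

    `labels` is a list of truthy/falsy values (e.g. ACE2+ vs ACE2- cells).
    All minority-class cells are kept; the majority class is randomly
    subsampled without replacement to the same size.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)
```

The balanced subset would then be passed to the mixed model, with the donor retained as the grouping variable.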

Integrated analysis for associating ACE2, ADAM17 and CTSL expression with age, sex and cardiovascular comorbidities

We combined all scRNA-seq datasets of human left and right ventricular cells, as well as fetal samples, including the expression counts of just the above three genes, to analyze the relationships between age, sex, and cardiovascular comorbidities and the expression of ACE2, ADAM17, and CTSL. First, to refine the localization of ACE2+ADAM17+ cells, we subclustered each cell population, integrated data between different samples of the same cell type using the harmony package, and selected the top 30 principal components and a resolution chosen separately for each cell population for dimensionality reduction clustering [45]. Subclusters were named by collecting marker genes from the literature and annotating them manually. Then, the expression levels of ACE2, ADAM17, and CTSL in different cell subpopulations were plotted according to the subclustering results, and the subpopulation with a higher proportion of double-positive cells was selected to explore the relationship between ACE2+ADAM17+ cells and clinical characteristics. Class imbalance was again handled using the downsampling method in the ROSE package, and the relationship between double-positivity and clinical characteristics was assessed using the mixed-effects model in the lme4 package [46]:

$$Y_i \sim X_i + \left(1 \mid \text{donor}\right)$$

where Yi represents the dichotomized expression level of the genes and Xi represents the different clinical features. Details of the method can be found in the published literature [20].

Coexpression of ACE2 and other auxiliary protease classes

Additional proteases may play a role in the proteolytic cleavage of viral proteins during entry and exit. To predict such proteases, we tested the coexpression of ACE2 with each of 625 annotated human protease genes [47]. To further analyze the coexpression of ACE2 and other protease classes, we assessed the coexpression of all genes and ACE2 in three disease tissue types and in healthy tissues using a random-effects model with the lme4 package [46]. The relationships between ACE2 and the PCSK family, CTSL, ADAM17, NRP1, HMGB1, CALM1, CALM3, KNG1, AAMP, NTS, AGT, DEFA5, SLC6A19 and other proteases were investigated. The relationships between ACE2 and these proteases in different cell types were also analyzed.

Functional enrichment analysis of double-positive cells

To analyze the functional enrichment in the double-positive cells, we compared ACE2+ADAM17+ cells with ACE2-ADAM17- cells in different tissues to obtain the differential genes. The numbers of ACE2+ADAM17+ cells and ACE2-ADAM17- cells were balanced by a downsampling method and then modeled using a random forest algorithm. The top 500 genes associated with double-positivity of cells were filtered by importance, and the intersection of the top 500 genes in different tissues and the genes specific to each tissue were calculated separately. Then, the top 10 genes ranked by the sum of importance were taken for visualization using Cytoscape software [48]. In addition, we input the genes shared among the top 500 genes in different tissues for KEGG enrichment analysis, using the clusterProfiler package in R [49]. The functional enrichment map of ACE2+CTSL+ cells was drawn in the same way.
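The gene-ranking step after the random forest can be sketched as follows. The sketch assumes that a per-tissue mapping from gene to importance score is already available; the function names and the example gene labels are hypothetical, and the random forest itself is not reproduced here.

```python
def top_genes(importance, n):
    """Return the n genes with the highest importance score."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def shared_top_genes(importance_by_tissue, n_top=500, n_show=10):
    """Intersect the top-n_top genes of every tissue, then rank the shared
    genes by the sum of their importance across tissues and keep n_show.

    `importance_by_tissue` maps tissue name -> {gene: importance}.
    """
    top_sets = [set(top_genes(imp, n_top)) for imp in importance_by_tissue.values()]
    shared = set.intersection(*top_sets)
    summed = {
        gene: sum(imp[gene] for imp in importance_by_tissue.values())
        for gene in shared
    }
    return sorted(summed, key=lambda g: summed[g], reverse=True)[:n_show]


def tissue_specific_genes(importance_by_tissue, tissue, n_top=500):
    """Genes in the top-n_top list of `tissue` but of no other tissue."""
    own = set(top_genes(importance_by_tissue[tissue], n_top))
    for name, imp in importance_by_tissue.items():
        if name != tissue:
            own -= set(top_genes(imp, n_top))
    return own
```

The shared genes returned by shared_top_genes correspond to the list that would be passed on to enrichment analysis, while the ranked subset is what would be drawn as a network.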

To identify the genes related to double-positive cells in different cell types, we fitted the random-effects model in each cell type, ranked the genes by effect size, and visualized the top 12 genes using Cytoscape software [48].

Analysis of cell‒cell communications

CellChat objects were created based on the pericyte UMI count matrix of each group (DCM, ICM, hypertension, and healthy) via CellChat (https://github.com/sqjin/CellChat, R package, v.1). The differences between the cell interactions of each diseased myocardial tissue and those of normal myocardial tissue were calculated with the CellChat package. With “CellChatDB.human” set as the ligand-receptor interaction database, cell‒cell communication analysis was then performed with the default settings. The total number of interactions and the interaction strength were compared across groups by merging the CellChat objects of each group with the function mergeCellChat. The differential number of interactions or interaction strength among different cell populations was visualized with the function netVisual_diffInteraction. Finally, differentially expressed signaling pathways were identified with the function rankNet, and the distribution of signaling gene expression between different datasets was visualized with the function plotGeneExpression [50].

Statistical analysis

All data calculations and statistical analyses in this study were done using R software (https://www.r-project.org/, version 4.1.2). All P values were two-sided. In the differential gene screening, a corrected P value < 0.05 was considered statistically significant, and the P value thresholds for the remaining statistical tests were as described in the text.
