Multiomic analyses uncover immunological signatures in acute and chronic coronary syndromes

Munich cohort: ethics and patient cohort

Munich cohort (M): Informed consent was obtained from the patients in accordance with the Declaration of Helsinki and with the approval of the Ethics Committee of LMU Munich (number 19-274). We collected blood from n = 62 patients using repetitive serial sampling and separately analyzed the different immune cell constituents. For blood collection, we used heparin-anticoagulated blood (Sarstedt, catalog number 02.1065.001). Blood processing was performed within 2 h of collection on average across the Munich and Groningen cohorts. A total of n = 125 whole blood tests, n = 122 PBMC samples, n = 246 plasma samples and n = 121 polymorphonuclear neutrophil (PMN) samples were used for analyses. In the ACS group, patients with STEMI were included and blood was analyzed longitudinally. Blood sampling was done peri-interventionally (TP1M)—during catheterization to avoid time loss, 14 h (±8 h) after intervention (TP2M), 60 h (±12 h) after the acute event (TP3M) and before discharge, about 5–8 days after the acute event (TP4M). The patients were further subdivided into those without direct reperfusion within 24 h after symptom onset (delayed myocardial reperfusion, n = 4) and patients with direct reperfusion within 24 h due to coronary intervention (acute MI, n = 24). A subgroup of patients with evidence of infection in laboratory testing who were treated with antibiotics in the clinical setting defined a subgroup with hospital-acquired infection (n = 5), which was differentiated from the sterile group with STEMI and without hospital-acquired infections (n = 19). The latter was used for comparison with the CCS group. Patients were also subdivided based on clinical outcome. For this purpose, the EF measurement was determined according to Simpson’s method in echocardiography. A comparison was made between the findings on admission and during the hospital stay (first EF value) or before discharge (second EF value) (resulting in a ΔEF). Based on these, a classification was made according to positive and stable (good outcome) and negative (poor outcome) ΔEF in the acute setting.

The CCS group included patients with an initial diagnosis of CCS based on a cardiac catheterization (lumen reduction of >50%) or coronary CT scan (>75th percentile; CCS, TP0M n = 16). Coronary healthy patients, with CAD ruled out by catheterization or CT, were included as a comparison group for the CCS group (non-CCS, TP0M n = 18). Coronary sclerosis was defined as coronary irregularities without substantial lumen obstruction (<50%).

Exclusion criteria for the Munich cohort were cardiogenic shock, age >85 years and <30 years, severe systemic diseases (chronic liver disease, active hemato-oncologic diseases, active cancer, autoimmune diseases, acute inflammatory event with a fever or CRP > 2 mg dl−1 at admission (except for patients with a subacute ACS with delayed recanalization who regularly showed elevated CRP levels as an already ongoing delayed response to the infarction) and the use of immunosuppressants at inclusion. For the CCS cohort, patients with relevant elevation of troponin T levels were also excluded.

Analysis modalities for the Munich cohort (if not specified otherwise)Clinical blood test

The clinical blood tests were performed as part of the treatment during hospitalization. We involved the following clinical biomarkers and blood cells: CK, CK-MB, troponin T, CRP, leukocytes and neutrophils.

Cytokine and chemokine assays

For the isolation of plasma, 2 × 1 ml whole blood was centrifuged at 2,000 × g (relative centrifugal force) for 20 min at 4 °C (Centrifuge 5424 R, Eppendorf AG). Afterward, the supernatant was carefully removed and pooled in a common Eppendorf reaction vessel for cryopreservation at −80 °C.

For detection and quantitation of cytokines and chemokines, samples were sent on dry ice to EveTechnologies where a Human Cytokine/Chemokine 71-Plex Discovery Assay array (HD71) was performed. Within the assay, the following biomarkers were determined: 6CKine, BCA-1, CTACK, EGF, ENA-78, eotaxin, eotaxin-2, eotaxin-3, FGF-2, Flt3L, fractalkine, G-CSF, GM-CSF, GROα, I-309, IFNα2, IFNγ, IL-1α, IL-1β, IL-1RA, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12p40, IL-12p70, IL-13, IL-15, IL-16, IL-17A, IL-17E/IL-25, IL-17F, IL-18, IL-20, IL-21, IL-22, IL-23, IL-27, IL-28, IL-33, IP-10, LIF, MCP-1, MCP-2, MCP-3, MCP-4, M-CSF, MDC, MIG, MIP-1α, MIP-1β, MIP-1δ, PDGF-AA, PDGF-AB/BB, RANTES, sCD40L, SCF, SDF-1α + β, TARC, TGFα, TNFα, TNFβ, TPO, TRAIL, TSLP and VEGF-A.

Plasma proteome analysis

The isolation and storage of the plasma were mentioned above. The plasma in vials were slowly thawed at +4 °C and mixed at a ratio of 1:5 with a proteomic buffer (2% SDS (Thermo Scientific, catalog number J22638.AE), 2.5 mM DTT (Invitrogen, catalog number P2325) in 50 mM Tris (Invitrogen, catalog number AM9820)). Afterward, the samples were immediately boiled at 95 °C for 10 min and cryoconserved at −80 °C. Plasma samples were prepared by SDS lysis, automated SP3 cleanup and tryptic digest (as also described before)55,56. Samples were measured on an Orbitrap Exploris 480 instrument (Thermo Fisher Scientific) in label-free data-independent acquisition mode while separating peptides on a 44 min gradient on a nanoEASY 1200 system (Thermo Fisher Scientific) coupled to the mass spectrometer. Raw files were analyzed in Spectronaut 14 (Biognosys57) against a spectral library that was generated from 52 fractions measured (as also described before55). A false discovery rate cut-off of 0.01 was applied, and spectra were searched against a human Uniprot database from 2018 including isoforms. For data filtering, the option Q value percentile with a fraction of 0.2 was used and global normalization using the median was applied. Further downstream analysis was performed in R. Normalized intensities were filtered for at least 80% valid values per row and column; remaining missing values were median centered and imputed using a randomized Gaussian distribution with a downshift of 1.8. Plasma proteomics was included into the MOFA to identify patterns of variance. No differential expression data were reported in the paper.

Isolation of PMNs

Initially, 400 µl of whole blood was added to a tube and 20 μl each of the Isolation Cocktail and RapidSpheres (EasySep Direct Human Neutrophil Isolation kit, STEMCELL Technologies, catalog number 19666) were added. After 5 min of incubation at room temperature, the reagent was filled up to 4 ml with PBS (Dulbecco’s phophate-buffered saline (1×), Thermo Fisher Scientific, catalog number 14190-094) + 1 mM ethylenediaminetetraacetic acid (EDTA, Sigma-Aldrich Chemie, catalog number 03690). Subsequently, the tube was placed in a magnet (EasySepMagnet, STEMCELL Technologies) for 5 min. Then, without being removed from the magnet, the contents of the tube were transferred to a new tube in a continuous motion. After 20 μl of RapidSpheres was added again, the incubation steps were repeated without PBS + 1 mM EDTA being added. After another decantation, the new tube was placed in the magnet for 5 min without the addition of RapidSpheres. The newly decanted tube was centrifuged at 320 × g for 7 min at 4 °C (Centrifuge 5810 R, Eppendorf AG). The pellet was then resuspended in 100 μl of PBS. Subsequently, 10 µl of the suspension was used to adjust the concentration to 5 million cells ml−1. For this, the dead cells were stained with trypan blue (Sigma-Aldrich Chemie, catalog number T8.154) and the concentration was calculated using a Neubauer counting chamber (LO-Laboroptik). For cryopreservation at −80 °C, cell suspension was added to RLT Plus buffer (Qiagen, catalog number 1053393) containing 1% 2-mercaptoethanol (Sigma-Aldrich Chemie, catalog number M3148) at a ratio of 1:10.

Prime-seq

For the analysis of the transcriptome of PMNs, prime-seq14, an early barcoding bulk RNA-seq method, was used. Samples were pretreated with proteinase K (Life Technologies, catalog number AM2548) followed by isolation with cleanup beads (Sigma-Aldrich, catalog number GE65152105050250; ratio, 2:1 beads per sample). DNase I (Thermo Fisher, catalog number EN0521) was used to digest the cells to make the transcriptome accessible for the process of reserve transcription. This was done by adding 30 units of Maxima H enzyme (Thermo Fisher, catalog number EP0753) and 1× Maxima H buffer (Thermo Fisher, catalog number EP0753), 1 mM dNTPs (Thermo Fisher, catalog number R0186), 1 µM template-switching oligo (Integrated DNA Technologies (IDT)) and 1 µM barcoded oligo-dT primers (IDT), and incubating for 90 min at 42 °C (reaction volume, 10 µl). After all samples were pooled, they were purified in a 1:1 ratio with cleanup beads. For the elimination of the leftover primers, exonuclease I (NEB, catalog number M0293L) was added (incubation setup: 37 °C for 20 min, then 80 °C for 10 min), followed by another purification with cleanup beads. The synthesis of the second strand of complementary DNA (cDNA) was prepared by adding 1× KAPA HotStart Ready Mix (Roche, catalog number 07958935001) and 0.6 µM SINGV6 primer (IDT) (reaction volume, 50 µl). For amplification, subsequent PCR cycles were performed: start: 98 °C for 3 min; 15 cycles: 98 °C for 15 s, 65 °C for 30 s and 72 °C for 4 min; end: 72 °C for 10 min. To repurify the sample, cleanup beads were added at a ratio of 0.8:1 beads per sample and dissolved out in 10 µl DNase-and-RNase-free distilled water (Thermo Fisher, catalog number 10977-049). Quantification and size selection of the purified cDNA were then performed using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher, catalog number P7581) and the High-Sensitivity DNA Kit (Agilent, catalog number 5067–4627). For library preparation, a fivefold-lower reaction volume of the NEBNextUltra II FS Library Preparation Kit (NEB, catalog number E6177S) than recommended by the manufacturer was used. Fragmentation of cDNA was performed using the enzyme mix and the reaction buffer (reaction volume, 6 µl), and ligation was performed using Ligation Enhancer, Ligation Master Mix and a custom prime-seq adapter (1.5 µM, IDT; reaction volume, 12.7 µl). Solid-phase reversible immobilization-select beads (Beckman Coulter, catalog number B23317) were then used for a double size selection (ratio of 0.5 and 0.7). For amplification, Q5 Master Mix (M0544L, NEB), 1 µl i7 Index Primer (Sigma-Aldrich) and 1 µl i5 Index Primer (IDT) were used followed by PCR (start: 98 °C for 30 s; 13 cycles: 98 °C for 10 s, 65 °C for 75 s and 65 °C for 5 min; end: 65 °C for 4 min). After successful size selection with solid-phase reversible immobilization-select beads and a quality check, the libraries were sequenced using NextSeq (Ilumina).

The sequencing reads were processed using zUMIs pipeline using the Gencode human release version (https://www.gencodegenes.org/human/release_35.html). Only barcodes matching the expected samples were considered and exported as count matrices, both raw counts and library-size normalized ones. First, the data were checked using fastqc (version 0.11.8 (ref. 58)). Regions on the 3′ end of the fragment reading into the poly-A tail were removed by Cutadapt (version 1.12 (ref. 59)). The zUMIs pipeline (version 2.9.4d)60 was applied, filtering the data, with a phred threshold of 20 for 4 bases the unique molecular identifier and barcode, mapping the reads to the human genome with the Gencode annotation (v35) using STAR (version 2.7.3a); reads were counted using RSubread (version 1.32.4)61,62.

PBMC isolation

For isolation of PBMCs, 8 ml of whole blood was transferred to a BD Vacutainer CPTTM (Becton Dickinson, catalog number 362780), swiveled twice and centrifuged at 1,650 × g for 20 min at room temperature (Centrifuge 5810 R, Eppendorf AG). After swiveling twice, the supernatant was transferred into a 15 ml tube and a further centrifugation step with 350 × g for 7 min at 4 °C was performed. The resulting cell pellet was resuspended in 4 ml freezing medium and aliquoted. The freezing medium consisted of 45% Roswell Park Memorial Institute (RPMI) medium (VLE-RPMI 1640, Bio&SELL, catalog number BS.52551528.5) with 1% glutamine (Gibco l-glutamine (200 mM), Thermo Fisher Scientific, catalog number BS.K0283), 45% fetal bovine serum (FBS; FBS SUPERIOR stabil, Bio&SELL, catalog number FBS.S0615) and 10% dimethyl sulfoxide (DMSO, Sigma-Aldrich Chemie, catalog number D2438). For cryopreservation, samples were slowly frozen in a Mr. Frosty freezing container (Thermo Fisher Scientific, catalog number 5100-0036) at −80 °C for 24 h and then transferred to −80 °C freezers.

FACS and scRNA-seq preparation

For scRNA-seq analysis of the frozen PBMCs, an adapted thawing protocol of 10× was used63. Samples were thawed at 37 °C for 3 min. This was followed by stepwise dilution (5 × 1:1) with dropwise addition of complete growth medium. The complete growth medium consisted of 10% FBS and 90% RPMI. The sample was then filtered using a 50 µm strainer and centrifuged at 300 × g for 5 min at room temperature. The supernatant was removed to the last milliliter, and the cell pellet was resuspended in it by using a wide-bore pipet. An additional 9 ml of complete growth medium was slowly added, and the sample was split into two. A further centrifugation step at 300 × g for 5 min at 4 °C was performed. One-half of the sample was used for further processing for scRNA-seq analysis. The cell pellet was resuspended in 100 µl Fc block (BD Pharmingen, catalog number 564200; 1:50) and incubated on ice for 10 min. To label the cells, TotalSeqB anti-human hashtag antibodies (1:500; BioLegend; Supplementary Table 15) were added to the sample and then incubated at 4 °C for 30 min. To maximize the performance, TotalSeqB anti-human hashtag antibodies were pre-centrifuged at 14,000 × g at 4 °C for 10 min. Following this, the sample was washed three times by adding 5 ml of FACS buffer (PBS with 0.5% BSA (Albumin Fraktion V, Carl Roth, catalog number 8076.4)) and centrifugation at 250 × g for 10 min at 4 °C each time. After the last centrifugation step, the cell pellet was resuspended in 0.04% BSA in PBS and the concentration was adjusted to 200 cells µl−1 using a Neubauer counting chamber. Lastly, marked samples were pooled. The other half of the sample was used to prepare the FACS analysis. The sample was incubated with 200 µl Fc block (1:50) at 4 °C for 10 min. Staining of the cells was done by a 20 min incubation with an antibody master mix (1:400; Supplementary Table 15). After centrifugation at 300 × g for 7 min at 4 °C, the cell pellet was resuspended in 300 µl FACS buffer. The dead cells were stained immediately before flow cytometry with an LSRFortessa Flow Cytometer (BD Biosciences). The flow cytometry data were analyzed with FlowJo (BD, version 10.8.14). Flow cytometry allowed the analysis of large cell counts. Rare populations such as dendritic cells were captured at representative counts, allowing reliable detection of differences. The scRNA-seq captured only a fraction of these cells but was suitable for the performance of unsupervised immune cell profiling of more frequent immune cell subsets, but not for the detection of differences in geometric averages of rare ones. Rather, scRNA-seq enabled an unsupervised sub-clustering of more common immune cells not identified by canonical surface-based flow cytometry panels due to limitations in the number of antibodies used. The CLR-transformed flow cytometry data were hence provided as a baseline characterization of the sub-cohorts. For the MOFA-model-based deep phenotyping, the gene expression profiles of the fine-grained scRNA-seq sub-clusters were included. The geometric averages of immune cell subsets were not included into the MOFA model but were used for an initial screening of differences in canonical immune cell subsets as a baseline characterization. Previous CLR transformation was performed as described in the scRNA-seq compositional analysis section.

scRNA library preparation and sequencing

For single-cell sequencing, libraries were prepared using the Chromium Next GEM Single Cell 3′ Reagent Kit v3.1 (CG000206 Rev D with with all mentioned components) from the 10x Genomics protocol. Barcode-based multiplexing with TotalSeqB anti-human hashtag antibodies (Supplementary Table 15) was performed to reduce artefacts associated with batch variation. According to the manufacturer’s instructions, the gel beads in emulsion were first prepared to obtain cDNA with reserve transcription. After purification of the cDNA, amplification and size selection were performed. After final quantification and quality control, the gene expression and cell surface libraries were constructed for sequencing. Sequencing was subsequently performed by IMGM Laboratories using Illumina NovaSeq 6000.

Bioinformatics analysis of the CS scRNA-seq datasetPreprocessing SC data preparation: cellranger

After sequencing, the FASTQ files for the gene and cell surface libraries were processed using the cellranger count pipeline (chemistry: Single Cell 3′ v3; pipeline version 3.1.0). Each sample was mapped to the human reference genome (GRCh38; version: 3.1.0). The library and reference files were created according to the 10x Genomics instructions and example files for Antibody Capture with TotalSeq B (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref (17 January 2023)).

The pipeline quantified each feature (genes plus antibodies) in each cell and generated quality control summaries and feature barcode matrices for each of the 14 libraries (Supplementary Table 4).

For further analysis, we took the ‘filtered_feature_bc_matrix.h5’ of each library and split it up into two separate anndata objects: one containing the gene expression and one the antibody capture counts.

Demultiplexing and doublet identification

For the demultiplexing of the scRNA-seq data, we took the antibody capture count anndata objects of each library, converted them to Seurat objects, normalized the counts with CLR transformation and applied the ‘HTODemux’ function of the Seurat package (version: 4.1.1) as described in the vignette with the default 0.99 quantile threshold to classify the cells as positive or negative for each hashtag oligo. Cells that have been classified as positive for more than one hashtag oligo have been annotated as doublets.

Cell quality control and filtering

In a next step, we transferred the cell annotation results from the demultiplexing to the gene expression anndata files and applied some cell quality control criteria and filtering based on the gene expression counts to remove low-quality cells. These steps were done for each library separately.

In a first basic filtering step, we kept only cells that have counts on at least 200 genes and genes that have counts in at least 3 cells. Furthermore, we combined the percentage of mitochondrial gene count (pct_counts_mt), number of genes by count (n_genes_by_counts) and total count (total_counts) criteria to filter out further cells. We only kept cells that have:

n_genes_by_counts < 5,000 ∩ total_counts < 20,000

n_genes_by_counts > 500 ∩ pct_counts_mt < 15

Subsequently the data were normalized (10,000 counts per cell) and log transformed (log1p) using the scanpy toolkit64 1.8.1 in Python v.3.9.6. Furthermore, we excluded mitochondrial and ribosomal genes as they were not of interest for the analysis.

Data integration, clustering and cell-type annotation

To get a joint embedding of the complete dataset and correct for potential batch effects between the libraries, we took the processed data from the quality control and applied the Scanorama method (scanorama.correct_scanpy, v 1.7.1) using 2,000 highly variable genes, batch-size parameter of 2,000 and default parameters. This returned a Scanorama-corrected count matrix and a joint embedding. We compared the data integration before and after applying Scanorama by inspecting the uniform manifold approximation and projection (UMAP) plots before and after (Supplementary Fig. 14a–c) and by calculating the local inverse Simpson’s index (LISI) score (compute_lisi) with the corresponding function of the LISI R Package (https://github.com/immunogenomics/LISI).

Subsequently, we used the Scanorama embedding as input for the computation of a neighborhood graph (scanpy.pp.neighbors, n_neighbors=10, n_pcs=50) and the subsequent clustering of the cells using the Leiden algorithm (scanpy.tl.leiden; default parameters).

We found 18 different clusters that we annotated manually by looking at the expression patterns of PBMC marker genes selected based on literature research and calculating differentially expressed genes between the clusters using a Wilcoxon rank sum test with Benjamini–Hochberg adjustment as implemented in the scanpy framework (scanpy.tl.rank_genes_groups). With this strategy, we could annotate all 18 clusters to all common major peripheral blood mononuclear immune cell types (Supplementary Fig. 2a,b).

Compositional analysis

To investigate compositional changes of the cell-type clusters between the different patient groups and timepoints, we determined the percentage of cells that have been assigned to the different cell-type clusters for each patient and timepoint separately (for each patient and timepoint: amount of cells belonging to the cluster/total amount of cells) and adjusted them with CLR transformation. Adjusted values were then analyzed using ordinary one-way ANOVA with correction for multiple comparisons by Dunnett’s test (*P ≤ 0.05; **P ≤ 0.01).

Multiomics data integration: MOFAData harmonization and integration

For the integrated and combined analysis of all the different data sources (single-cell data, cytokines, neutrophils, plasma proteomics and clinical values), we applied several preprocessing and normalization steps separately on the features of the different types of data to make them comparable and adjust the distributions.

CS scRNA-seq data

We applied the pseudo-bulk approach to summarize single-cell data on the level of cell-type (cluster) specific gene expression per sample because all other omics data were measured on the bulk level. To this end, we calculated for each of the identified 18 cell-type clusters for each sample (=patient and timepoint) the mean counts across all cells. Afterward, we adjusted the gene counts of each sample in each cluster with a scaling factor so that each sample has the same amount of counts across all genes to account for technical differences in sequencing depth between the samples.

To ensure that we only consider reliably expressed genes, we applied additional filtering steps on the genes and clusters:

We excluded clusters 14–18, which had only very low numbers of cells per patient and timepoint (mostly less than 10 cells).

We filtered out genes based on the total number and percentage of cells that expressed those genes in the corresponding cluster, keeping only genes that fulfill one of the criteria below:

Percentage of cells expressing gene > 50 ∩ total number of cells expressing gene > 1,200

Percentage of cells expressing gene > 40 ∩ total number of cells expressing gene > 3,000

The thresholds were chosen considering that genes should be detectable in a high number of cells and in several samples, but at the same time, a considerable number of genes for each cluster should be kept.

After filtering, we log transformed the count values and applied quantile normalization as further normalization steps to align the distributions of gene expression levels between the samples.

This resulted in 11,831 features (which correspond to genes) across all the different clusters (ranging from 315 for cluster 13 and 2,159 for cluster 4), which we used as input for the different cell-type cluster dimensions from the single-cell data for the MOFA (Supplementary Fig. 3a).

Cytokines

To prepare the cytokine data for integration with the other datasets we set ‘OOR’ values to 0 and log transformed the values to adjust the distributions after adding a pseudo-count of 1 to all values. Furthermore, we excluded cytokines which have valid measured values in less than 20% of the samples. In total this resulted in 65 different cytokines which have been used as input features for the integrated analysis (Supplementary Fig. 3a).

Neutrophils

As input features from the neutrophil dimension, we took the umi exon counts from the prime-seq and applied the processing steps below to align the reads with the scRNA-seq data, adjust for potential technical effects and strictly remove samples and genes with low-quality reads.

In a first step, we adjusted the gene names and mapped them from ‘ENSEMBL’ gene ids to ‘SYMBOL’ gene ids. Then, we filtered out ribosomal and mitochondrial genes as we also excluded them in the scRNA-seq data preprocessing, and they are not relevant for the analysis. Furthermore, we excluded genes that are not expressed in at least 80% of the samples and removed samples that do not have reads in at least 90% of the remaining genes. In the next step, we adjusted for differences in sequencing depth between the samples and normalized the counts with a scaling factor so that the sums of reads for each sample across all genes are the same. Then, we logarithmized the resulting counts. Finally, we decided to keep only highly variable genes, so we removed all genes whose variance lies below the 25% quantile of the variance distribution. This resulted in a total of 892 genes measured on 92 samples that are considered as input features for the neutrophil dimension. As for the scRNA-seq data, we applied quantile normalization to the counts in a final normalization step (Supplementary Fig. 3a).

Plasma proteomics

For proteomics, we used the same preprocessing and normalization steps as described in the previous ‘Plasma proteome analysis’ section and took the resulting normalized and median-centered intensities measured for 490 different proteins as input features of this dimension (Supplementary Fig. 3a).

Clinical values

As input features of the clinical data dimension, we used the measured CK, CRP, CK-MB and troponin values and log transformed them (Supplementary Fig. 3a).

Model training

After these individual preprocessing steps, we had in total 13,382 features across 18 different dimensions (referred to as views throughout the paper) resulting from (1) clinical values, (2) cell-type cluster 1–14 of the scRNA-seq data, (3) cytokines, (4) plasma proteomics and (5) neutrophils (Supplementary Fig. 3a). We applied feature-wise quantile normalization onto the quantiles of the standard normal distribution for all data types.

Then we trained the MOFA model using the R/Bioconductor package MOFA2 (version 1.2.2) with maxiter parameter 50,000 to ensure convergence and 20 factors and the remaining default parameters. The number of estimated factors was chosen to balance the trade-off between explained variance and low number of factors. Also, we tested the influence of the specified number of factors on the model results by running alternative MOFA models with 5, 10, 15 and 25 factors and comparing them with the 20-factor model. We found that especially the first five inferred factors are not substantially affected by the choice of the number of factors (Supplementary Figs. 3a and 14d and Supplementary Table 16).

To evaluate the effect of the clinical features, we trained a second MOFA model excluding the 4 clinical features with in total 13,378 features and compared the resulting factor values and feature weights to the original model (Supplementary Figs. 7 and 8a,b).

Downstream analyses Gene set enrichment analysis: pathways

On the feature weight matrix resulting from our trained MOFA model, we conducted pathway enrichment analysis for the first five inferred factors of the MOFA model using the gene set annotations from the REACTOME19 and KEGG65 databases. We tested all pathways belonging to the ‘Immune System’ category in REACTOME (n = 191) and pathways that are classified as ‘Immune system’ or ‘Signal transduction’ pathways in KEGG (n = 52).

To test the enrichment of the pathways across all our data input views, we generated an extended pathway gene annotation set for those pathways in which a feature (consisting of data dimension, and gene and protein code) was considered to belong to the pathway if the gene and protein code maps to the genes annotated to the pathway. Features included in the MOFA but not within the pathway were taken as background set. To map the gene and protein codes to the gene-set annotations in KEGG and REACTOME (reacome.db, version 1.76.0; ReactomePA version 1.36.0), we used the bitr function from the clusterProfiler package (version 4.0.5) to convert them to ENTREZID.

We removed all the pathways for which we had included less than 20% of the total amount of genes annotated to the pathway in our feature set and ran the enrichment analysis using the ‘run_enrichment’ method implemented in the R/Bioconductor package MOFA2 (version 1.2.2) with set.statistic parameter ‘rank.sum’ and default parameters otherwise. We ran the enrichment separately for features with only positive or negative weights and jointly across all features.

Pathways with an adjusted P < 0.05 (Benjamini–Hochberg adjustment) have been considered to be significantly enriched.

Cell–cell communication

To analyze the potential axes of cell–cell communication between different cell types, we used the previous knowledge about potential ligand–receptor–target interactions of the NicheNet21 resource collected in the nichenetr package (version 1.1.0) and loaded the provided ligand–receptor network and ligand–target matrix66. Based on the classifications in those networks, we identified ligands, receptors and potential targets among the 13,382 features included in our integrated dataset resulting from the ‘data harmonization and integration step’. We calculated Spearman correlation between all identified ligand–target pairs within this dataset.

For the further analysis of the calculated ligand–target correlations in combination with the corresponding regulatory potential score, we only considered ligand–target pairs:

Between the ligands and targets of different cell types (for example, between monocytes and T cells) and different views (for example, between cytokines and the different cell-type clusters)

Where we have reliably measured a receptor in the target cell-type cluster to which the ligand might potentially bind as specified by ligand–receptor network provided by the NicheNet21 resource to affect the target gene (in case the target belongs to one of the cell-type cluster views from the scRNA-seq data). We consider a receptor gene to be reliably measured in case it fulfills one of the thresholds below:

Percentage of cells expressing the receptor gene in the cell-type cluster > 30 ∩ total amount of cells expressing the receptor gene in cell-type cluster > 600

Percentage of cells expressing the receptor gene in the cell-type cluster > 10 ∩ total amount of cells expressing the receptor gene in the cell-type cluster > 1,200

Subsequently, we focused on pairs with high correlation and regulatory potential scores where the target gene has a high feature weight on the analyzed MOFA factor.

For the lagged analysis of ligand–receptor target gene interactions, we applied the same analysis with the modification that we calculated Spearman correlation between the identified ligand–target pairs based on mapping lagged ligand expression to the corresponding target gene expression for ACS samples. This resulted in the following mapping across timepoints for each patient that was then used for the calculation of Spearman correlation and further processing as explained above:

Ligand gene

TP1M

TP2M

TP3M

Target gene

TP2M

TP3M

TP4M

Predictions

To evaluate the prediction potential of our MOFA factors to distinguish our different patient groups, we calculated ROC curves contrasting the prediction power of the inferred factors to established clinical markers.

For Factor 4 predictions, we only considered factor values from samples measured at TP1M that could be classified to have a ‘good’ or ‘poor’ outcome. We compared the prediction potential of the factor values to the value of the clinical markers (CK, troponin, CRP) for those samples at TP1M. For the benchmarking against the prediction power across the complete time course of the clinical values, we took the maximum and mean values of those markers across all measured timepoints. In both cases, we scaled the clinical values as well as the factor values to be in a range between 0 and 1 and used them as input for the ROC curve calculation giving the probability of a sample being classified as ‘good’ versus ‘poor’ outcome.

Replication in the validation cohort: Groningen dataset

To validate our approach and findings, we used an independent second external dataset including n = 24 patients measured across three different timepoints (TP1G, at the heart catheterization; TP2G, 24 h after admission; TP3G, after 6–8 weeks) and a control group (TP0G, n = 31) contained within the Groningen study and described in detail in the original publication12. Data collected in the Groningen cohort were part of the CardioLines biobank67 study that investigated factors influencing diagnosis and treatment outcomes. It was approved by the UMCG ethics committee (document number METC UMCG 2012/296), and all patients provided informed consent68,69.

Further specifications of the dataset and the processing can be found within the corresponding paper12. The data in the Groningen study12 were measured with two different chemistries: v2 10X chemistry and v3 10X chemistry, which showed technical differences in gene expression profiles between the samples that were prepared with different chemistries. Therefore, a separate processing of both datasets was necessary. In our replication, we focused on samples measured with the v2 10X chemistry as this cohort included a higher number of samples (v2, n = 55; v3, n = 21). The V3 10X chemistry cohort did not include a sufficient number of samples that could be divided into poor or good outcome groups and would have therefore been underpowered. Classification into long-term good- and poor-outcome groups in the Groningen cohort was performed to subdivide the cohort into two outcome sub-cohorts based on long-term development of EF. Based on the ΔEF from echocardiography results (during the hospital stay and follow-up), a classification was made according to positive (good outcome, n = 7) and negative or stable (poor outcome, n = 5) long-term values.

For the replication, in a first step, we evaluated the alignment of the different strategies for cell-type annotations that were applied in the two different studies. Subsequently, we harmonized annotations between both datasets and applied the Munich data preprocessing strategy on the Groningen data. MOFA Factor 1 (IC) could not be evaluated in the Groningen cohort as this dataset does not include the differentiation of the control group into ‘CCS’ and ‘non-CCS’ patients.

Alignment of cell-type annotations

In the Groningen study, cell-type annotation was done using the automated Azimuth method from the R Azimuth library (version 0.4.6; for more details, see paper12). As our data were processed and annotated in a different way, we first compared clusters and annotations resulting from our study to those of the Groningen study. For this, we ran the preprocessing and automated annotation strategy as described in the Groningen study12 on our data and compared the resulting annotations of the single cells with the annotations resulting from our initial clustering and manual annotation strategy (Supplementary Fig. 4). In general, our clustering and the automated azimuth annotation resulted for some cell types in more granular (for example, B cell cluster 10 would be distributed across ‘B naive’, ‘B memory’ and ‘B intermediate’ azimuth annotations) or more aggregated annotations (for example, CD14high monocyte clusters 4, 6 and 7 would all be aggregated as CD14high monocytes), but on a more aggregated level, annotations aligned well except for some T cell clusters (namely, cluster 1 CD8+ T cells, cluster 11 CD4+ T cells and cluster 5 CD4+ T cells).

Rerun of MOFA with harmonized cell-type annotations

As described in the section ‘Alignment of cell-type annotations’, the Munich and Groningen datasets initially were annotated using two different strategies. To be able to apply a model trained on the Munich cohort to the data from Groningen, the same harmonized definition of cell types and features needed be used in both datasets. Thus, we applied the Azimuth automated cell type annotation approach that was used in the Groningen cohort to the data from the Munich cohort. This led to a new annotation with a different level of granularity, which might impact on the downstream MOFA. To assess this effect, we ran the same MOFA as outlined above on the Munich data with the new Azimuth annotations (Supplementary Fig. 5a). We evaluated whether the resulting factors were able to capture the same patterns as found with our original strategy and whether factor and feature weights of the newly inferred factors and the factors presented in the paper aligned well by correlating them (Supplementary Figs. 5b and 6a,b). Overall, the same patterns presented previously were also visible with the alternative annotations (Supplementary Fig. 5b) and the inferred factor and feature weights of the presented factors were highly correlated (|cor| > 0.8; Supplementary Fig. 6a,b).

Processing of the Groningen scRNA-seq data

In the next step, we applied the preprocessing steps as described in ‘CS scRNA-seq data’ for the MOFA analysis also on the Groningen scRNA-seq dataset resulting in normalized pseudo-bulk-aggregated features per annotated azimuth cell type in the Groningen data.

As the expression values of the genes were notably lower in the Groningen dataset, we made some minor adjustments to the requirements with regard to the percentage of cells expressing a gene and the total amount of cells expressing a gene to get a comparable set of features as in our data:

Percentage of cells expressing gene > 30 ∩ total amount of cells expressing gene > 1,000

Percentage of cells expressing gene > 20 ∩ total amount of cells expressing gene > 2,500

After applying these preprocessing steps on the Groningen data, we had in total 6,353 features across the 13 different dimensions (azimuth cell types 1–13).

Factor 2: time pattern replication

To replicate Factor 2 on the Groningen dataset in a first step, we mapped input features from the Munich dataset to the Groningen dataset and kept only features available in both datasets after the preprocessing (therefore, cytokine, clinical, proteomics and neutrophil features as well as some genes from the scRNA-seq dataset not within both datasets were removed). We used the resulting feature-weight matrix for features from the well-aligned cell types (B cell, CD4TCM, cDC2, CD16Mono, CD14Mono, NK) from the Munich azimuth MOFA estimation (WMU) (Supplementary Fig. 5) and calculated the right inverse to then apply it on the normalized input data (YGR) from the Groningen cohort to infer the corresponding sample factor matrix (ZGR) for the Groningen cohort as shown below. The pattern across timepoints of the resulting sample factor matrix (ZGR) was then compared with the pattern given within the Munich dataset.

$$^}=^}^}\right)}^\;}\;})}^=^\mathrm}^\mathrm\right)}^$$

GR, Groningen cohort; MU, Munich cohort.

Factor 4: prediction replication

For Factor 4, the main goal of our replication was to evaluate the potential of top-ranking features on Factor 4 to predict the outcome already at an early stage (TP1). As Factor 4 is derived based on the pattern across all four timepoints measured in our data and top-ranking Factor 4 features are characterized not only by the variation at TP1 between ‘good’ and ‘poor’ outcome samples but also by the variation across the different timepoints, we chose to add another step to our analysis to identify those features within the top-ranking features that have high prediction potential only looking at their TP1 values. For this, we chose to select the intersection of the top 280 features of IAR (corresponding to roughly 20% of the total amount of features) between the MOFA models estimated on the Munich data with the manual and the harmonized automated azimuth annotations. Subsequently, we trained a lasso model (logistic regression) with 8-fold cross validation using the cv.glmnet function of the R package glmnet (version 4.1.6; alpha=1; family = ‘binomial’; nfolds = 8; other, default parameters) on the Munich data taking as input only the value of those features at TP1M and predicting the outcome of the samples. We selected the best model given by ‘lambda.min’, the value of lambda that minimizes the cross-validation error.

Then, we applied this trained model on the same set of features with their values at TP1G on the Groningen dataset considering this dataset our holdout test dataset and evaluated prediction performance for those samples calculating AUC values.

In addition to the lasso-selected top-ranking feature set of Factor 4, we also evaluated the prediction potential of features that we highlighted in the paper, based on their potential biological mechanisms, namely NK cell features CD74, TXNIP and GZMB. Again, we trained a logistic regression model (glm function; family = binomial (link = ‘logit’)) for these features on the Munich data. Subsequently, we applied this model to the Groningen cohort for the evaluation of the performance on the validation set.

In vitro treatment of healthy PBMCs with heparin, aspirin and prasugrel with subsequent scRNA-seq

Human PBMCs were isolated from healthy donors as described above and freshly processed for scRNA-seq. A total of 2 × 105 PBMCs were seeded in a 96-well plate and incubated with heparin (1.5 IU ml−1, Ratiopharm), prasugrel (0.012 mg ml−1, Substipharm SAS) and acetylsalicylic acid (0.1 mg ml−1). After 3 h of incubation at 37 °C and 5% CO2, the cells were washed and subsequently treated with Fc-Block (Human BD Fc Block, BD Biosciences, catalog number 564200) for 10 min at 4 °C. Afterward, the respective hashtag master mix (final antibody concentration, 1:100) was added, and the cells were incubated for 30 min at 4 °C. Subsequently, 5 ml of buffer (0.5% BSA (Albumin Fraktion V, 8076.4, Carl Roth) plus DPBS (4190-094, Thermo Fisher)) was added and the mixture was centrifuged at 250 × g for 10 min at 4 °C. This washing step was repeated twice. After the last centrifugation step, the pellet was resuspended in 50 µl buffer (0.5% BSA (Albumin Fraktion V, Carl Roth, catalog number 5642008076.4)). Cell counts were adjusted to 1,000 cells µl−1 using a Neubauer counting chamber and then pooled. A total of 60 µl of the single-cell suspension was used for library preparation (input, 60,000 cells).

Bioinformatic analysis of the in vitro scRNA-seq dataset

For the analysis of the effect of medication on the identified MOFA factors and the underlying expression changes of individual genes, we preprocessed the in vitro (IV) scRNA-seq data in the same way as described

留言 (0)

沒有登入
gif