scDOT: optimal transport for mapping senescent cells in spatial transcriptomics

We developed an optimal transport (OT) method for mapping scRNA-Seq data to spatial transcriptomics data. The method, illustrated in Fig. 1, performs iterative computations for cell type deconvolution and cell-to-spot spatial mapping, resulting in the generation of the coupling matrix \(\gamma\) as an upstream integration outcome. This coupling matrix is then utilized to infer the cell-to-cell spatial neighborhood graph by aligning cells to spots with known spatial coordinates.

Fig. 1figure 1

Method workflow: Step 1: scDOT takes gene expression profiles from a scRNA-seq dataset and a spatial transcriptomics dataset as inputs. Additionally, cell type information for cells in the scRNA-seq data and spatial coordinates for spots in the spatial transcriptomics data are provided. scDOT simultaneously and in parallel learns the cell type fraction of each spot (deconvolution task) and the mapping between individual cells in the scRNA-seq data and individual spots in the spatial transcriptomics data (spatial reconstruction task). Step 2: The resulting mapping matrix between cells and spots is then utilized to construct the cell-cell spatial neighborhood graph, where cells are connected if they are in close physical proximity

scDOT efficiently reconstructs individual cells to their spatial origins

We first tested scDOT on two simulation datasets where ground truth is known (Methods). The outcome of reconstructing single-cell data, i.e., the coupling matrix \(\gamma\), when using simulation dataset 1 reveals that it successfully recovers the spatial origins of a high fraction of cells (56% to 76%, depending on a predefined threshold to determine high probability). \(\gamma\) represents probabilistic couplings and so a specific cell can be mapped to several location with different probabilities (which sum up to 1). We found that in most cases the distribution \(\gamma _\) exhibits is extremely heavy-tailed and places a disproportionately high amount of probability densities at 0. We therefore defined a high probability of associating with a location based on distribution properties (99th, 95th, 90th quantile, or the 75th quantile (the third quantile) plus 1.5 times the interquartile range (IQR) (Turkey’s fences)). Obviously, the stricter the threshold, the fewer cells that are correctly matched. However, even for a very high cutoff, we find very large percentage of correct matches (70% of cells at a threshold above the 90th quantile and 56% of cells at a threshold above the 99th quantile when using synthetic data 1). However, the slower decay of reconstruction results due to a more strict threshold is desirable and can be achieved through a heavier tail in the distribution \(\gamma _\).

In addition, previous studies show that cell type deconvolution methods tend to miss rare cell type [10]. In contrast, when using OT, we are able to map rare cell types to their spatial origins (Fig. 2b). In our simulation data, four types of cells can be classified as rare: 2-Mesothelium and Submucosal Secretory have only 1 cell each, Myofibroblasts has 2, and Fibromyocytes has 7. The boxplots indicate that our approach successfully assigned all these rare cell types to their correct spatial positions.

Fig. 2figure 2

Performance on synthetic datasets. a OT results of simulation datasets 1 and 2 demonstrate that by using different thresholds to define a high probability, we can assign nearly 80% of cells to their spatial origin. scDOT was benchmarked against two other methods: Novosparc, a spatial reconstruction method based on Gromov-Wasserstein distance, and Random Sinkhorn, a naive method that learns the optimal transport coupling with a random cost matrix. The results demonstrate the superior performance of scDOT in all cases. b Detailed results of simulation data 1 (with a threshold higher than the 3rd quantile plus 1.5 times the IQR) highlight the effectiveness of scDOT and spatial mapping methods in general for rare cell types. The boxplots illustrate the fraction of correctly reconstructed cells per cell type. Each point represents a single cell type (\(c = 24\)). Among the considered rare cell types (2-Mesothelium and Submucosal Secretory with 1 cell, Myofibroblasts with 2 cells, and Fibromyocytes with 7 cells), indicated within red circles, scDOT successfully mapped these rare cell types to their exact spatial locations (fraction = 1.0), while Novosparc failed to map 2-Mesothelium to its spatial location (fraction = 0.0). c The root-mean-square-error (RMSE) of the deconvolved cell-type proportions compared to the ground truth is evaluated for synthetic data 2, consisting of 9 cell types across 3072 spots. scDOT, along with other methods including DestVI, Tangram, and Novosparc, is compared in terms of RMSE. The boxplots demonstrate that scDOT outperforms the other methods, as indicated by the lower RMSE values. The boxplots display the median (middle line), 25th and 75th percentiles (box), and 5th and 95th percentiles (whiskers)

Comparison to other methods on spatial mapping and cell type deconvolution Spatial mapping

We evaluate the performance of scDOT in spatial mapping and compare it with other existing methods. Figure 2a presents the results for synthetic data 1, where the threshold is set above Q3 + 1.5\(\times\)IQR. scDOT achieves the highest outcome at this threshold, while the outcome of Novosparc is drastically decreased compared to the outcome at thresholds above the 90th and 95th quantiles. This observation suggests that our probabilistic mapping exhibits a heavier-tailed characteristic, which is a more desirable property for accurate spatial mapping.

Moreover, we find that the reconstruction results are influenced by the dataset used. For synthetic data 2, scDOT achieves a high outcome when the threshold is set above Q3 + 1.5\(\times\)IQR, with 76% of cells successfully reconstructed. However, stricter thresholds lead to a more rapid decay in the outcomes, with only 50% of cells being reconstructed at the threshold above the 95th quantile. Nevertheless, across all cases, scDOT consistently outperforms both Novosparc and the naive baseline of Random Sinkhorn. We also tested scDOT using another simulated dataset based on a new Xenium single-cell data from a breast cancer tumor microenvironment [25]. Again, we observed that scDOT performs well and is superior to Novosparc, achieving very high accuracy (Additional file 1: B.2, Fig. S1).

In terms of accurately mapping rare cell types to their spatial positions, scDOT successfully assigns all four rare cell types with a fraction of 1.0. However, Novosparc failed to accurately map 2-Mesothelium to its spatial location, as indicated by a fraction of 0.0. Also, as indicated in Fig. 2a, scDOT mapped 76% cells correctly while Novosparc mapped 56% cells correctly; these 20% differences is not shown in Fig. 2b since the difference in the number of cells per cell type is not considered. Finally, we conducted an analysis of scDOT’s performance on rare cell types by varying the fraction of B cells in the dataset. Results, presented in the Additional file 1: B.3, demonstrate the robustness of scDOT for mapping rare cell types.

Deconvolution

To benchmark the results of cell type deconvolution, we applied scDOT to synthetic data 2 and compared it with three other methods: DestVI [7], Tangram [26], and Novosparc [27]. The synthetic dataset comprised nine cell types distributed across 3072 spots. We specifically chose these three deconvolution methods as they represent distinct computational techniques tailored for spatial transcriptomics data. DestVI is a probabilistic-based method, Tangram utilizes deep learning, and Novosparc is an OT-based method. All three methods require spatial transcriptomics data as input and scRNA-seq data as a reference. Comparing the root-mean-square-error (RMSE) of the deconvolved cell type proportions with the ground truth, scDOT outperformed the other three methods (see Fig. 2c). The mean RMSE scores for scDOT, DestVI, Tangram, and Novosparc were 0.06, 0.15, 0.23, and 0.20, respectively. It is worth noting that Novosparc is not designed for direct computation of cell type deconvolution but rather for mapping cells to spots. As a result, the deconvolution results are calculated by multiplying the coupling matrix \(\gamma\) with the cell-by-cell type relation matrix C, i.e., \(P = \gamma \times C\). We also used Pearson correlation coefficient (PCC) analysis for this benchmark study. Result still demonstrate the superiority of scDOT over other methods, albeit with high variance (see Fig. S2).

Identfiying the spatial patterns of the distribution of specific cell types

We used paired IPF scRNA-Seq and spatial dataset to test the ability of our mapping method to infer cell-cell interactions (Fig. 3). Among the 29 cell types (Methods), multiciliated, secretory, and basal cells exhibited prominent and distinct spatial patterns. Notably, multiciliated, secretory, and basal cells were found to be in close proximity to each other, both in the upper lobe and lower lobe of the tissue. This observation aligns with the traditional view of the airway epithelial mucosal layer, which incorporates basal cells in close proximity to secretory and ciliated cells, forming a tight unit. This unit serves as a physical barrier while remaining responsive to the inhaled environment through interactions with submucosal fibroblasts, smooth muscle cells and cells and molecules from the immune system [28].

Secretory and multiciliated cells are known to be located in close proximity to each other within the respiratory tract, including the lungs. Together, they form a self-clearing mechanism that efficiently removes inhaled particles from the upper airways, preventing their transfer to deeper lung zones [29]. The coordinated action of multiciliated cells, with their motile cilia, and secretory cells, responsible for mucus production and secretion, enables the effective clearance of inhaled particles and maintains the integrity of the respiratory system [30].

Basal cells, positioned closer to the basement membrane, further contribute to the organization and functioning of the airway epithelium. They provide structural support and are responsible for the regeneration and repair of the airway epithelial layer [28].

The spatial organization of multiciliated, secretory, and basal cells in close proximity to each other emphasizes their interdependence and coordinated functioning in maintaining the respiratory barrier and facilitating efficient clearance mechanisms. This finding underscores the significance of the spatial arrangement and interactions of diverse cell types within the airway epithelium for the overall homeostasis and defense of the respiratory system.

Conversely, immune cell types such as Macrophages and T cells lineage, which were characterized by a larger number of cells, displayed a more scattered distribution throughout the tissue. Yet, the spatial distribution of these two cell types are complementary to some degree (Figs. 3 and 4, Fig. S3), reflecting the fact that they are both important components of the immune system and play complementary roles in defending against infections and maintaining immune homeostasis. On the other hand, cell types with a smaller cell count, such as smooth muscle (consisting of only 2 cells in total), exhibited a spatial arrangement in adjacent spots (Fig. S3).

These patterns were also observed in the unpaired data, particularly with regards to the multiciliated lineage and secretory cell types (Fig. 3), demonstrating the generality of our approach on unpaired datasets. The difference between paired and unpaired data is quantified in detail in Additional file 1 (B.4).

Cell-cell proximity analysis

To quantitatively illustrate the spatial distribution and proximity of multiciliated, secretory, and basal cells described in the previous section and shown in Fig. 3, we employed the neighborhood enrichment score. This score between two cell types represents the z-score derived from a permutation test that tallies the neighboring spots consisting of either cell type. Consistent with the spatial patterns depicted in Sect. 3.3 and Fig. 3, we observed the highest enrichment score between the multiciliated lineage and itself across various datasets (69.46 in the upper lobe of familial IPF paired data, 29.31 in the lower lobe of the same data, and 47.98 in the IPF unpaired data). The score between multiciliated and secretory cell types is also one of the highest (19.40 in the upper lobe of the paired dataset, 12.25 in the lower lobe, and 5.06 in the unpaired dataset). In contrast, the scores between macrophages and T cells are among the lowest across datasets, with scores of − 25, − 5.75, and − 15.83 in the upper lobe, lower lobe, and unpaired dataset, respectively. These scores reflect the fact that they are complementary, as indicated in Sect. 3.3 (see Fig. S3). It is important to note that the neighborhood enrichment scores were estimated at the spot-level and only considered the dominant cell type of each spot, which is defined as the cell type with the highest proportion within that particular spot.

At the cell level, we constructed a cell-cell spatial proximity graph based on OT placement (see the Methods section). The graph was then summarized by cell types, quantifying the physical proximity between each cell type by counting the direct neighboring cells within the same type (see Fig. S3d and e). Once again, the multiciliated lineage exhibited the highest normalized counts with itself across datasets, consistent with the results obtained from the enrichment score and described in Sect. 3.3. In the paired dataset, basal and secretory cells also demonstrated a strong association with the airway epithelium, providing additional evidence for the spatial organization of the respiratory system as discussed in Sect. 3.3. In contrast, immune cells such as T cells and macrophages displayed connections to various cell types, reflecting their dispersed distribution throughout the tissue. Notably, in the IPF lung sample, fibroblast cells exhibited a distinct spatial pattern and were found to be in close proximity to 2-smooth muscle cells and myofibroblast cells, supporting previous research suggesting that \(\alpha\) smooth muscle actin-expressing fibroblasts, referred to as myofibroblasts, serve as markers of progressive lung injury and play a central role in detrimental remodeling and disease progression [31, 32] (Fig. S3, Sect. 3.6).

Fig. 3figure 3

Spatial distribution patterns of multiciliated, secretory, basal, and macrophage cells across different datasets. Top: A UMAP representation of scRNA-seq data, along with the spatial patterns of the selected cell types in the upper lobe slice of the paired familial IPF lung. Middle: A UMAP representation of scRNA-seq data and the corresponding cell types in the lower lobe slice of the same sample. Bottom: A UMAP representation and spatial distribution of selected cell types in the unpaired IPF lung sample. Notably, multiciliated, secretory, and basal cell types exhibit distinct and prominent spatial patterns. Importantly, these cell types consistently exhibit close proximity to each other across all three datasets, consistent with previous studies on the organization of the respiratory system [28,29,30]

Identification of senescent markers

For cellular senescence analysis, we profiled two new spatial datasets. The first included paired scRNA-Seq data from a familial IPF lung sample, and the other consists of unpaired data from an IPF lung sample (Methods).

Paired data of familial IPF lung sample

We first identified in the scRNA-seq data, cell types with a large fraction of cells exhibiting senescent. For this, we used a list of 68 senescent marker genes (Methods). Within each cell type, we separated the cells into senescent and non-senescent cells (Fig. 4a, b). For this familial IPF lung sample, the ratio of senescent cells to non-senescent cells is low. For most cell types, we observed very few senescent cells. For others, we found more. For example, for mast cells, T cell lineage, and airway epithelium, we identified 14%, 13%, and 17%, respectively. We thus focused on these three cell types. for these we had 24, 193, and 3 senescent cells for mast cells, T cell lineage, and airway epithelium, respectively. Next, we manually annotated the regions where senescent cells from different cell types are collocated (Fig. 4b, c). For these regions, we computed differentially expressed genes (DEG) w.r.t. the rest of the tissue. As expected, given the way we selected these regions, we found among the top ranked DEG IGFBP4 and IGFBP7 (t-test p-values are 1.1e−11 and 7.2e−07 respectively), which are both senescent marker genes (Fig. 4d). We next performed gene set enrichment analysis (GSEA) with this ranked gene list and a gene set of 340 senescent markers (which is a superset of the 68 senescent marker genes set we used for re-annotation, Additional file 2), we confirmed that cellular senescence is enriched−with p-value \(=0.006002\); FDR \(=0.006002\), and the normalized enrichment score is 1.726−in the annotated region (Fig. 4d). The leading-edge subset of genes in this analysis comprised IGFBP4, IGFBP7, FGF7, THBS1, IGF1, IGFBP6, IL6, SERPINE2, PIM1, ALDH1A3, SERPINE1, COL1A2, ANGPTL4, CYP1B1, and PLAU. While IGFBP4 and IGFBP7 belong to the initial set of 68 senescent marker genes, the remaining genes are part of the larger set of 340 senescent marker genes. Of particular note, IGFBP4 and IGFBP7 are SASP factors that have been identified as key components needed for triggering senescence in young mesenchymal stem cells (MSC) [33]. The pro-senescent effects of IGFBP4 and IGFBP7 are reversed by single or simultaneous immunodepletion of either proteins from the conditioned medium (CM) from senescent cells [33]. According to a previous study, prolonged IGF1 treatment leads to the establishment of a premature senescence phenotype characterized by a unique senescence network signature [34]. Combined IGF1/TXNIP-induced premature senescence can be associated with a typical secretory inflammatory phenotype that is mediated by STAT3/IL-1A signaling [34].

Fig. 4figure 4

Analysis of cellular senescence reveals the spatial collocation of senescent cells. a The number of senescent cells and non-senescent cells for each cell type is depicted. T cell lineage, mast cells, and airway epithelium exhibit the highest fraction of senescent cells. b Spatial distribution of senescent and non-senescent cells for the three aforementioned cell types. Notably, the three different senescent cell types are spatially collocated in the upper left corner of the tissue. c Differentially expressed genes for the manually annotated senescent region (colored in orange) in the upper left corner of the tissue (as depicted in b of this figure and the upper right corner of this panel). Among the top-ranked DEGs are IGFBP4 and IGFBP7, which are also senescent marker genes. d Gene set enrichment analysis (GSEA) plot. The top-ranked DEGs (as shown in panel (c) of this figure) are enriched in the gene set consisting of 340 senescent marker genes

Inferring cell-cell interactions driving senescence

We also looked at the cell type neighborhood of senescent cells. These are summarized in Fig. 5a. We observe that senescent cells are often close to non-senescent cells of the same type (e.g., senescent T cells to non-senescent T cells) which can explain why some cell types have a much higher percentage of senescent cells than others.

Utilizing the CellPhoneDB [35], we further identified the ligand-receptor (LR) pairs involved in the cell-cell interactions within the neighborhood of senescent cells (i.e., within the graph \(G'\)) (Fig. 5d). We observed that 11 senescent markers, namely B2M, CALR, CCL5, CD44, HMGB1, IGF1R, MIF, TNF, VIM, MMP9, and TNFRSF1B, were significantly overrepresented in the list of ligands and receptors identified by CellPhoneDB (hypergeometric test p-value = 0.00072). Among the LR pairs involved in senescent-to-senescent cell-cell communication (i.e., between senescent T cells), most of the pairs include senescent marker genes. The other remaining LR pairs involve the HLA gene family (which is essential for T cell activation). For example, HLA-E acts as an inhibitory signal for NK and CD8 T cells-and depletion of HLA-E renders senescent cells susceptible to elimination by both NK and CD8 T cells [36]. Another LR pair involves S100A8, which increases with age, inducing inflammation and cellular senescence-like phenotypes in oviduct epithelial cells [37, 38].

Fig. 5figure 5

Analysis of senescent cell-cell communication in the upper lobe of the familial IPF lung sample. a The graph summarizes the spatial neighborhood of senescent mast cells and T cell lineage. Nodes represent cell types, and edges indicate direct neighboring relations in physical proximity. The size of each node corresponds to the number of cells within a cell type, while the width of the edges represent the number of neighboring cells of a specific cell type (i.e., the total node degree per neighboring cell type). Edges representing a small number of neighbors are omitted. As can be seen, senescent cells are close to both non-senescent cells within the same cell types and senescent cells belonging to different cell types. b Cell-cell spatial neighborhood of senescent cells for the T cell lineage. The validity of this neighborhood graph is assessed in Additional file 1: B (Supplementary Analysis). c The subgraph of the cell-cell neighborhood depicted in panel (b), specifically showing the cells located in the senescent region (colored orange). d The results from CellphoneDB display the co-expressed ligand-receptor pairs between senescent cells of the T cell lineage and all other cells within the subgraph illustrated in panel (c)

Unpaired data from IPF lung sample

To demonstrate the general utility of the method for unpaired data, we performed the same analysis as described for the paired data mentioned above for another spatial dataset we profiled, this time without matched scRNA-Seq (Methods). Using a scRNA-seq dataset of an IPF lung sample, we were still able to identify several of the same senescence cell types as in the paired dataset, including T cells and mast cells. There were 300 assigned senescent cells out of the total 3747 T cells and 11 assigned senescent mast cells out of the total 249 mast cells. We also observed high fraction of senescence cells for other cell types including for fibroblasts (290 out of the total 461 fibroblast cells) and 2-smooth muscle (8 out of 21).

We again observed that senescence cells co-localized in the same regions (Fig. 6a). While T cells tended to be distributed throughout the tissue, there is a high fraction of senescent cells co-localized with fibroblasts and mast cells (Fig. 6a). Fibroblasts and 2-smooth muscle cells co-localized in specific regions, with a total of four overlapping regions as depicted in Fig. 6a. Since senescent cells tend to co-localize with other cells of the same type, most senescent fibroblast cells and 2-smooth muscle cells also co-localized (except for the region in the upper left corner of the tissue, which exhibited only senescent fibroblast cells). These observations of senescent spatial distribution align with previous studies suggesting that senescent cells have the potential to influence neighboring cells through processes collectively referred to as the senescence-associated secretory phenotype [39].

Figure 6b, c illustrate the physical proximity among cells of different cell types. Similar to the paired data of the familial IPF lung sample, senescent cells are closely clustered together and near cells of the same type. As shown in Fig. 6b, the senescent T cells are adjacent to other T cells, mast cells, and macrophages. The cell-to-cell spatial neighborhood graph, with nodes representing senescent T cells and their immediate neighbors, is depicted in Fig. 6c. The validity of this neighborhood graph is assessed in Additional file 1: B (Supplementary Analysis). For a more specific focus on senescent fibroblasts, a cell-to-cell neighborhood graph can be found in Fig. S4.

留言 (0)

沒有登入
gif