STASCAN deciphers fine-resolution cell distribution maps in spatial transcriptomics by deep learning

Overview of STASCAN

STASCAN adopts a deep learning model that uses both the spatial gene expression profiles from ST technology and the corresponding histological images. Leveraging these multi-modal data, STASCAN produces a fine-resolution cell distribution map of a tissue by annotating cell types for spots or subdivided spots in both captured and uncharted areas (Fig. 1a). First, during pre-labeling, STASCAN extracts spot images from slide images based on spot location information and uses deconvolution of the spatial gene expression to infer highly reliable cell labels for each spot. Second, STASCAN constructs a base convolutional neural network (CNN) model (VGG16 architecture) [34] and trains it using the cell-type-labeled spot images as input. Additionally, STASCAN provides optional section-specific training, which fine-tunes the base CNN model through transfer learning to improve prediction accuracy for a specific section. Finally, after sufficient training, STASCAN can accurately predict cell types solely from histological images (Fig. 1a, Additional file 1: Fig. S1 and “Methods”).
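
The spot-image extraction step can be pictured as cropping a fixed-size square around each spot centre. The sketch below is illustrative only and is not STASCAN's code: it assumes the slide is a grayscale image held as a list of pixel rows, that spot centres are already given in full-resolution pixel coordinates, and that `crop_spot_images` and its `patch_size` parameter are hypothetical names. Spots whose patch would cross the image border are simply skipped.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def crop_spot_images(
    image: Image,
    spot_centers: Dict[str, Tuple[int, int]],
    patch_size: int,
) -> Dict[str, Image]:
    """Cut a square patch centred on each spot; spots too close to the border are skipped."""
    height, width = len(image), len(image[0])
    half = patch_size // 2
    patches: Dict[str, Image] = {}
    for barcode, (row, col) in spot_centers.items():
        top, left = row - half, col - half
        bottom, right = top + patch_size, left + patch_size
        if top < 0 or left < 0 or bottom > height or right > width:
            continue  # patch would fall outside the slide image
        patches[barcode] = [r[left:right] for r in image[top:bottom]]
    return patches


# Toy 6 x 6 image whose pixel value encodes its (row, col) position.
toy = [[10 * r + c for c in range(6)] for r in range(6)]
spots = {"AAAC-1": (2, 2), "EDGE-1": (0, 5)}
crops = crop_spot_images(toy, spots, patch_size=4)
print(crops["AAAC-1"][0])  # [0, 1, 2, 3]
```

In a full pipeline the patches would then be resized to the input size of the CNN (224 × 224 pixels for a standard ImageNet VGG16) before training.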

Fig. 1

STASCAN further provides three application modules: (1) cell annotation for embedded unseen spots in uncharted areas, which learns image features from the measured raw spots, assigns a predicted cell type to each unseen spot, and merges the unseen and raw spots to achieve a super-resolved cell distribution (Fig. 1b); (2) cell annotation for subdivided spots, which uses features learned from subdivided-spot images, with optional pseudo-labels, to obtain a sub-resolution cell distribution (Fig. 1c); and (3) cell annotation for unseen sections, which learns from the spot images of measured ST sections, with optional pseudo-labels, to predict cell distributions on the images of adjacent uncharted sections from the same consecutive series, enabling the construction of 3D cell models (Fig. 1d).
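
How new spot positions might be generated for the first two modules can be sketched geometrically. This is a hypothetical construction, not taken from the paper's Methods: unseen spots are placed at the midpoint of every pair of neighbouring measured spots, and a spot is subdivided into a k × k grid of sub-spots over its bounding square. Coordinates are in micrometres; the 55 µm diameter and 100 µm centre-to-centre spacing used below are those of standard Visium spots, while `unseen_spot_centers`, `subdivide_spot`, and `neighbor_dist` are illustrative names.

```python
from itertools import combinations
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def unseen_spot_centers(raw: List[Point], neighbor_dist: float) -> List[Point]:
    """Hypothetical rule: one unseen spot at the midpoint of every pair of
    neighbouring raw spots (centre-to-centre distance <= neighbor_dist)."""
    centers = set()
    for a, b in combinations(raw, 2):
        if dist(a, b) <= neighbor_dist:
            centers.add(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
    return sorted(centers)


def subdivide_spot(center: Point, spot_diameter: float, k: int = 2) -> List[Point]:
    """Split the square bounding a spot into k x k sub-spots and return their centres."""
    step = spot_diameter / k
    x0 = center[0] - spot_diameter / 2
    y0 = center[1] - spot_diameter / 2
    return [(x0 + step * (i + 0.5), y0 + step * (j + 0.5)) for i in range(k) for j in range(k)]


raw_spots = [(0.0, 0.0), (100.0, 0.0), (300.0, 0.0)]
print(unseen_spot_centers(raw_spots, neighbor_dist=100.0))  # [(50.0, 0.0)]
print(subdivide_spot((0.0, 0.0), spot_diameter=55.0))
```

Each generated centre would then be cropped from the slide image and passed to the trained image classifier.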

STASCAN enables more precise cell annotation and cell type prediction solely from images

To quantitatively evaluate the performance of STASCAN, we first applied it to a comprehensive planarian (Schmidtea mediterranea) dataset generated by 10 × Visium technology, which includes ten sequenced ST sections (containing both spatial gene expression data and histological images) and nine unsequenced sections adjacent to the ST sections (containing only histological images) [35]. Given the comprehensiveness of this dataset, we constructed a base model using 1829 spot images extracted from the 10 ST sections to learn the features of 7 main cell types identified from the sequencing information: epidermal, gut, muscle, neoblast, neuronal, parenchymal, and secretory cells (Additional file 1: Fig. S2a, b, Additional file 2: Table S1 and “Methods”). Although predictions for neoblast and parenchymal cells were uncertain owing to the scarcity of training samples for these two types, most cell types were annotated with a recall of over 78% (Fig. 2a and Additional file 1: Fig. S2c). In addition, the learned model predicted cell types with high accuracy, with the area under the curve (AUC) of the receiver operating characteristic (ROC) curves ranging from 0.936 to 0.996 (Fig. 2b). Moreover, to account for potential batch effects among ST sections, we performed section-specific training based on the base model (“Methods”). The section-specific models showed a significant improvement in accuracy, with higher AUC values than the base model, indicating that section-specific training improves overall prediction performance (Fig. 2c).
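
The two metrics quoted above can be written out directly. Per-class recall is the fraction of spots of a given type that the model labels correctly, and the one-vs-rest ROC AUC for a type equals the probability that a randomly chosen spot of that type receives a higher predicted score than a randomly chosen spot of another type. The pure-Python sketch below uses toy labels and scores, not the planarian data; in practice, scikit-learn's `recall_score` and `roc_auc_score` compute the same quantities more efficiently.

```python
from typing import Dict, Sequence


def recall_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Recall = correctly predicted spots of a class / all spots truly of that class."""
    recalls: Dict[str, float] = {}
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls[cls] = hits / len(idx)
    return recalls


def one_vs_rest_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2.

    This is the normalised Mann-Whitney U statistic; O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four spots, their true types, and the model's hard predictions.
y_true = ["gut", "gut", "muscle", "neuronal"]
y_pred = ["gut", "muscle", "muscle", "neuronal"]
print(recall_per_class(y_true, y_pred))  # {'gut': 0.5, 'muscle': 1.0, 'neuronal': 1.0}

# Predicted probability of "gut" for the same four spots.
gut_scores = [0.9, 0.2, 0.3, 0.1]
print(one_vs_rest_auc([t == "gut" for t in y_true], gut_scores))  # 0.75
```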

Fig. 2

Evaluation of STASCAN in the 10 × Visium planarian dataset

We further compared the performance of STASCAN in predicting the dominant cell types of raw spots with that of other methods, including Cell2location [17], Seurat [18], and RCTD [19], on the planarian dataset. We first manually annotated the cell type of each raw spot according to the morphological features of the corresponding spot image and treated these annotations as the ground truth. To evaluate performance, we calculated the Kullback–Leibler divergence between the cell distribution predicted by each method and the ground truth. STASCAN was highly consistent with the manual annotations and significantly outperformed the other methods (Fig. 2d, e and “Methods”). We also observed that the other methods showed various biases in cell annotation. For example, Cell2location characterized most of the cell distribution but had low sensitivity for epidermal cells; Seurat showed a strong annotation bias toward epidermal cells, leading to mislabeling of other cell types; and RCTD produced some correct annotations but missed most neuronal and secretory cells.
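
The comparison metric can be made concrete. The section does not spell out how each cell distribution is summarised, so the sketch below assumes the simplest reading: each method's dominant-type labels are turned into a vector of cell-type proportions over the section, a small pseudo-count `eps` keeps every entry positive, and the divergence is taken from the ground truth P to the prediction Q, D_KL(P || Q) = Σ_i p_i log(p_i / q_i). The direction, the smoothing, and the function names are assumptions; a lower value means closer agreement with the manual annotation.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def label_distribution(labels: Sequence[str], categories: Sequence[str], eps: float = 1e-6) -> List[float]:
    """Fraction of spots assigned to each category, with a pseudo-count so no entry is zero."""
    counts = Counter(labels)
    smoothed = [counts[c] + eps for c in categories]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


categories = ["epidermal", "gut", "muscle"]
truth = ["epidermal", "gut", "gut", "muscle"]                    # manual annotation (toy)
method_a = ["epidermal", "gut", "muscle", "muscle"]              # one error
method_b = ["epidermal", "epidermal", "epidermal", "epidermal"]  # epidermal bias

p = label_distribution(truth, categories)
for name, pred in [("method_a", method_a), ("method_b", method_b)]:
    print(name, round(kl_divergence(p, label_distribution(pred, categories)), 3))
```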

We also compared STASCAN with methods that use both morphological and transcriptional features for ST data analysis. Tangram [21] effectively illustrated the distribution of most cell types but showed a slight bias toward neuronal cells and reduced sensitivity for epidermal cells (Fig. 2d and e). MUSE [28] characterized tissue regions by identifying spot clusters, but these clusters were relatively scattered and failed to represent the corresponding structures in the planarian dataset (Fig. 2d). In contrast, STASCAN made more precise predictions, accurately pinpointing the spatial distribution of the seven main cell types in accordance with their known biological functions [36] (Fig. 2d, and Additional file 1: Fig. S2d,e). For example, matching the clear tissue structures visible in hematoxylin and eosin (H&E) staining, epidermal cells outline the contour of the planarian body, gut cells mark the location of the intestine, and muscle cells together with neuronal cells define the anatomy of the pharynx (Fig. 2d and Additional file 1: Fig. S2d).

Another significant advance of STASCAN over existing methods is that it enables accurate cell-type prediction based solely on spot images. We compared STASCAN, Tangram, and MUSE in predicting cell types when morphological images were provided but gene expression information was masked (Fig. 2f). STASCAN achieved precise cell annotations that were consistent with those made when both image and gene expression data were available. In contrast, Tangram failed to predict cell types without gene expression data. Although MUSE characterized cell clusters based solely on images, it was also affected by the absence of gene expression data, leading to incorrect predictions. For instance, MUSE identified two distinct cell clusters in the gut region that were inconsistent with the manual annotations and failed to identify the pattern of neuronal cells in the pharynx region (Fig. 2f). This comparison highlights the advantage of STASCAN and provides the basis for its use in the three application modules described below.

STASCAN achieves super-resolution cellular patterns and improves 3D reconstruction in planarian

Next, we assessed the capabilities of STASCAN in the different application modules using the planarian dataset [35]. When Seurat and Cell2location were used to predict spot cell types for pre-labeling, approximately half of the raw spots could not be assigned reliable cell labels (Additional file 1: Fig. S2a and “Methods”). This may be due to noise arising from the complex gene expression signatures in each spot, illustrating a limitation of deconvolution in determining cell types (Fig. 3a). Using the remaining half of the raw spots, which had credible labels, as prior spots for training, STASCAN reliably annotated cell types from images and produced a super-resolved cell distribution map (Fig. 3a). First, STASCAN annotated the unseen spots and, by combining the unseen and raw spots, achieved an enhanced resolution of cell distribution. The enhanced cell distribution map was highly consistent with the H&E staining images and highlighted relevant structures not visible at raw resolution, such as the ventral nerve cord, genital chamber, pharynx, and body contour (Fig. 3a and Additional file 1: Fig. S2d). It was also highly consistent with the distribution of the corresponding cell markers reported in previous literature [35, 36] (Fig. 3b and Additional file 1: Fig. S3a).
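
One way to picture the pre-labeling filter is as a consensus rule between the two deconvolution outputs. The rule below is a hypothetical stand-in, not the criterion described in the paper's Methods: a spot keeps a label only when Seurat and Cell2location agree on its dominant cell type and both assign that type at least `min_frac` of the spot (0.5 here, an arbitrary choice). Spots that fail the rule are left unlabeled and excluded from the prior spots used for training.

```python
from typing import Dict, Optional, Tuple

Proportions = Dict[str, float]  # cell type -> estimated fraction within one spot


def dominant(props: Proportions) -> Tuple[str, float]:
    """Cell type with the largest estimated fraction, and that fraction."""
    cell_type = max(props, key=props.__getitem__)
    return cell_type, props[cell_type]


def consensus_label(seurat: Proportions, c2l: Proportions, min_frac: float = 0.5) -> Optional[str]:
    """Hypothetical rule: label a spot only if both methods agree and are confident enough."""
    (t1, f1), (t2, f2) = dominant(seurat), dominant(c2l)
    if t1 == t2 and min(f1, f2) >= min_frac:
        return t1
    return None  # ambiguous spot: not used as a prior (training) spot


spot_a = ({"gut": 0.7, "muscle": 0.3}, {"gut": 0.6, "muscle": 0.4})
spot_b = ({"gut": 0.55, "neoblast": 0.45}, {"neoblast": 0.6, "gut": 0.4})
print(consensus_label(*spot_a))  # gut
print(consensus_label(*spot_b))  # None
```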

Fig. 3

STASCAN provides comprehensive and multidimensional cell annotation in the 10 × Visium planarian dataset

Furthermore, STASCAN pinpointed the composition of cell mixtures and their distinct locations at sub-resolution, effectively distinguishing the cell type of each subdivided spot and displaying a more detailed distribution of fine-grained cells (Fig. 3c). For example, STASCAN sensitively allocated secretory and neoblast cells around the body contour to subdivided positions according to their morphological differences. At sub-resolution, STASCAN also identified muscle cells at the junction of the pharynx and intestine, consistent with prior biological knowledge (Fig. 3c); at raw resolution, these cells were not distinguished from a surrounding group of gut cells. In addition, we used STASCAN to predict the enhanced sub-resolved distribution of gut cells, obtaining a fine-grained map that reproduced the classical branching structure of the planarian intestinal tract (Fig. 3d). We further compared the cell distributions generated by cell deconvolution methods on raw spots with those generated by STASCAN on subdivided spots. STASCAN precisely assigned the subdivided spots to their physical locations with the corresponding cell types, whereas deconvolution on raw spots resolved only the composition of cell types without determining the exact locations of the mixed cells (Additional file 1: Fig. S3b, c). Collectively, these results indicate that STASCAN substantially enhances cell granularity at sub-resolution, facilitating the depiction of key sub-structures with fine-grained cells.
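
The difference between the two outputs compared above can be shown with a toy spot split into a 2 × 2 grid. Collapsing the sub-spot predictions gives the same kind of proportion vector a deconvolution method reports, but only the sub-spot list retains where each cell type sits. The data structure and function names below are illustrative assumptions, not STASCAN's output format.

```python
from collections import Counter
from typing import Dict, List, Tuple

SubSpot = Tuple[Tuple[int, int], str]  # ((row, col) within the k x k grid, predicted type)


def composition(sub_spots: List[SubSpot]) -> Dict[str, float]:
    """Collapse sub-spot predictions into per-spot proportions (what deconvolution reports)."""
    counts = Counter(label for _, label in sub_spots)
    n = len(sub_spots)
    return {label: c / n for label, c in sorted(counts.items())}


def positions_of(sub_spots: List[SubSpot], cell_type: str) -> List[Tuple[int, int]]:
    """Grid positions of one cell type: information a proportion vector cannot hold."""
    return [pos for pos, label in sub_spots if label == cell_type]


spot = [((0, 0), "secretory"), ((0, 1), "secretory"), ((1, 0), "neoblast"), ((1, 1), "secretory")]
print(composition(spot))               # {'neoblast': 0.25, 'secretory': 0.75}
print(positions_of(spot, "neoblast"))  # [(1, 0)]
```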

Lastly, STASCAN predicted the cell distribution in unseen sections from H&E images alone, using features learned from the adjacent ST sections (Fig. 3e and Additional file 1: Fig. S4a-e). Consistent with the biological expectation that consecutive sections should have similar cell distributions, the structural similarity index measure (SSIM) [37] (ranging from 0.67 to 0.89) was linearly correlated with the spacing distance between the adjacent images and ST sections, supporting the accuracy of cell annotation for unseen sections (Fig. 3f, Additional file 1: Fig. S4a and “Methods”). In addition, we selected two adjacent ST sections, using one as the source of training data (section-21) and the other as the ground truth (section-23). We trained two STASCAN models, one on section-21 and one on section-23, and used each to predict the cell distribution in section-23 solely from its H&E staining image. Taking the manual cellular annotation of section-23 as ground truth, we observed that the model trained on section-21 could predict the cell distribution in section-23, and its prediction correlated highly with both the ground truth and the prediction of the model trained directly on section-23. These results confirm the reliability of STASCAN in annotating cells in unseen sections (Additional file 1: Fig. S4d, e). Building on this, we generated raw and unseen spots from the ST sections and adjacent images, applied STASCAN to predict their cell types, and reconstructed 3D models of different structures with cellular patterns (Fig. 3g and Additional file 1: Fig. S4b, c). The models display the cell distribution in three dimensions with improved spatial cellular resolution and make use of staining images from sections without ST sequencing.
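
The two statistics used in this evaluation can be sketched in a few lines. SSIM combines luminance, contrast, and structure terms, SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01 L)² and C2 = (0.03 L)² for data range L. For brevity, the version below computes it once over the whole map instead of averaging over local windows as standard implementations such as scikit-image's `structural_similarity` do. The toy maps are binary masks of one cell type on a 3 × 3 grid, and the distance and SSIM values fed to the Pearson correlation are made-up numbers that only illustrate the calculation.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM over two equally sized, flattened maps."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy 3 x 3 masks (flattened row by row): 1 where a spot is predicted as gut.
section_a = [0, 1, 1, 0, 1, 1, 0, 0, 0]
section_b = [0, 1, 1, 0, 1, 0, 0, 0, 0]
print(round(global_ssim(section_a, section_b), 3))

# Made-up spacing distances and SSIM values, only to show the correlation step.
print(round(pearson([1, 2, 3, 4], [0.9, 0.8, 0.7, 0.6]), 6))  # -1.0
```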

STASCAN identifies well-defined boundaries of distinct cell layers in the human intestinal tissue

To further evaluate the performance of STASCAN on ST datasets from different tissue architectures, we applied it to human intestinal datasets generated by 10 × Visium technology. These datasets consist of eight slides sampled from diverse locations and time points [38]. We trained STASCAN on each slide with prior spot sets of different sizes, ranging from 297 to 1551 spots, and observed stable performance across all sizes (Additional file 1: Fig. S5a-c, Additional file 2: Table S1 and “Methods”).

Next, we applied STASCAN to predict the cell types of unseen spots. Compared with the cell distribution of prior spots annotated by other methods, STASCAN stratified cell populations into finer regional layers. For example, whereas other methods only roughly distinguished the distributions of different cells, STASCAN delineated the borders of the cell layers among the intestinal epithelium, fibroblasts, and muscularis, greatly enhancing the cellular spatial patterns (Fig. 4a and Additional file 1: Fig. S6).

Fig. 4

STASCAN depicts spatial layers of distinct cell types in the 10 × Visium human intestinal dataset

We then used STASCAN to draw the spatial distribution map of fine cell subtypes in the human intestinal tissue (Fig. 4b). Specifically, we labeled three anatomical layers of the intestinal tissue according to the morphological structures in the H&E staining, namely the muscularis, fibroblast, and epithelium layers, ordered by their distance to the intestinal edge (Fig. 4c). When evaluating the four epithelial subtypes that make up the overwhelming majority of the epithelium layer, we found that, compared with an alternative method at raw resolution, STASCAN not only highlighted the precise distribution of these subtypes but also accurately located the positions of the distinct subtype cells (Fig. 4b, d, e and Additional file 1: Fig. S7a). For instance, at sub-resolution, distal epithelial subtype cells tended to gather near the boundary between the epithelium and fibroblast layers, whereas proximal epithelial subtype cells tended to gather at the surface of the epithelium layer. In addition, distal stem cells were correctly predicted to be located in the epithelium layer at sub-resolution, whereas at raw resolution some distal stem cells were incorrectly predicted to be located in the fibroblast layer (Fig. 4d, e and Additional file 1: Fig. S7b).
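
The positional tendencies described here could be quantified by measuring, for each predicted subtype, how far its sub-spots lie from the boundary between the epithelium and fibroblast layers. The paper does not state that this measure was used; the sketch below is a hypothetical illustration in which the boundary is given as a list of sampled points, and all coordinates and labels are toy values.

```python
from math import dist
from statistics import fmean
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mean_distance_to_boundary(
    sub_spots: List[Tuple[Point, str]], boundary: List[Point]
) -> Dict[str, float]:
    """Average distance from each subtype's sub-spots to the nearest sampled boundary point."""
    per_type: Dict[str, List[float]] = {}
    for pos, label in sub_spots:
        nearest = min(dist(pos, b) for b in boundary)
        per_type.setdefault(label, []).append(nearest)
    return {label: fmean(ds) for label, ds in sorted(per_type.items())}


# Toy layout: the layer boundary runs along y = 0; larger y is closer to the surface.
boundary = [(float(x), 0.0) for x in range(5)]
sub_spots = [
    ((1.0, 1.0), "distal_epithelium"),
    ((3.0, 2.0), "distal_epithelium"),
    ((2.0, 8.0), "proximal_epithelium"),
    ((4.0, 10.0), "proximal_epithelium"),
]
print(mean_distance_to_boundary(sub_spots, boundary))
# {'distal_epithelium': 1.5, 'proximal_epithelium': 9.0}
```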

In addition, we applied STASCAN to a pair of adjacent sections from the same intestinal tissue to validate cell annotation for unseen sections (“Methods”); the high correlation between their predictions further confirmed the accuracy and reliability of the prediction (Additional file 1: Fig. S8a, b).

STASCAN uncovers a novel structure in the human lung tissue

Despite the limited raw spatial resolution of ST technologies, STASCAN helps enhance cellular patterns and recover micrometer-scale structures. Here, we applied STASCAN to the 10 × Visium human lung dataset [39], which was sampled from the proximal airway. We first redefined 13 reference cell types to better illustrate the tissue organization and annotated 822 ST spots with seven dominant cell types to train STASCAN (Additional file 1: Fig. S9a-e, Additional file 2: Table S1 and “Methods”). With enhanced resolution, STASCAN showed more precise cellular and structural patterns of the human lung tissue. Moreover, STASCAN sensitively identified a micrometer-scale oval-shaped structure that was highly consistent with the H&E staining images and was confirmed to consist of smooth muscle bundles adjacent to the tracheal wall. This structure was not evident at the raw resolution of prior spots, highlighting the capability of STASCAN to reveal refined structures in spatial regions (Fig. 5a and Additional file 1: Fig. S9c).

Fig. 5

STASCAN reveals a distinct structure in the 10 × Visium human lung data

Furthermore, we compared STASCAN with other methods in revealing tissue structures through cellular patterns. STASCAN could depict the silhouette of the airway with basal and neuroendocrine cells, and the cricoid cartilage structure surrounded by goblet cells, mucous cells, smooth muscle cells, pericytes, and others. Moreover, the smooth muscle tissue traced by smooth muscle cells and pericytes at the bottom left of the slide was identified only by STASCAN (Fig. 5b and Additional file 1: Fig. S9c). In brief, compared with the cell distribution patterns identified by other methods, STASCAN showed clear advantages in identifying and characterizing spatially specific structures and better reflected the anatomical structure after imputation.

STASCAN depicts the pathological spatial structural variations of human cardiac tissue after myocardial infarction

To explore whether STASCAN enables further functional applications in ST data analysis, we used it to reanalyze the 10 × Visium human cardiac datasets [40], which include 17 slides from normal hearts and from hearts after myocardial infarction (Additional file 1: Fig. S10a-b, S11, Additional file 2: Table S1 and “Methods”). We first grouped these slides by sampling region [40]: normal non-transplanted donor hearts as controls, necrotic regions (ischemic zone and border zone), unaffected regions (remote zone), and regions at later stages after myocardial infarction (fibrotic zone).

Because STASCAN can generate cell maps solely from histology images, we considered it not only a valuable algorithm for enhancing spatial cell distributions but also a useful tool for imputing the cellular patterns of missing regions in which transcripts were not properly captured during ST sequencing. To evaluate its performance in imputing missing cellular distributions, we first designated half of the ST spots in slide_ACH003 as missing spots and trained STASCAN on the remaining half. We then used STASCAN to predict the cell distribution of the whole of slide_ACH003. Taking the cell annotations of the prior spots as ground truth, we observed that STASCAN reproduced the cell distribution among the missing spots well, in particular replicating the structure of the vasculature surrounded by fibroblasts and vSMCs, which supports the reliability of STASCAN in imputing cellular patterns in missing regions (Additional file 1: Fig. S12a, b).
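
The masking experiment follows a standard hold-out design, sketched below under stated assumptions. Half of the spots are chosen at random as "missing", a classifier is fitted on the other half, and accuracy is measured on the missing half against their original annotations. A 1-nearest-neighbour rule on toy feature vectors stands in for the trained CNN, so `mask_half`, `impute_and_score`, and the features are hypothetical; only the evaluation logic carries over.

```python
import random
from math import dist
from typing import Dict, List, Sequence, Tuple


def mask_half(barcodes: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split spots into a prior (training) half and a 'missing' half."""
    shuffled = sorted(barcodes)  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def impute_and_score(
    features: Dict[str, Tuple[float, ...]], labels: Dict[str, str], seed: int = 0
) -> float:
    """Predict each missing spot from its nearest prior spot and return the accuracy."""
    prior, missing = mask_half(list(features), seed)
    correct = 0
    for barcode in missing:
        # 1-nearest-neighbour in feature space: a stand-in for the image-based CNN.
        nearest = min(prior, key=lambda p: dist(features[p], features[barcode]))
        correct += int(labels[nearest] == labels[barcode])
    return correct / len(missing)


features = {f"spot{i}": (float(i % 2) * 10, float(i)) for i in range(8)}
labels = {b: ("vSMC" if f[0] > 5 else "fibroblast") for b, f in features.items()}
print(impute_and_score(features, labels, seed=1))
```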

After this evaluation, we focused on two slides with many missing spots, which had been filtered out in the original study [40] because of low numbers of detected genes and unique molecular identifiers (UMIs) (Fig. 6a and Additional file 1: Fig. S13-S17a). In these two slides, STASCAN not only depicted the cell distribution patterns of tissue structures more accurately but also predicted the potential cell distributions in the missing areas from images alone. In particular, for slide_ACH0010, which was sampled from the ischemic zone and had severe spot loss, STASCAN imputed a plausible distribution of cardiomyocytes, fibroblasts, and myeloid cells in line with the histological morphology (Fig. 6a). The proximity of the latter two cell types suggested a strong dependence between them in areas of immune cell infiltration and scar formation [40].
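
The spots discussed here are those removed by the original quality control. The sketch below shows the idea of separating spots into kept and missing sets with gene-count and UMI thresholds, after which the missing ones could still be annotated from their image patches alone. The thresholds and the input layout (barcode mapped to a pair of gene and UMI counts) are hypothetical and are not the cut-offs used in the original study [40].

```python
from typing import Dict, List, Tuple


def split_by_qc(
    qc: Dict[str, Tuple[int, int]], min_genes: int, min_umis: int
) -> Tuple[List[str], List[str]]:
    """Return (kept, missing); a spot is missing if it fails either threshold."""
    kept: List[str] = []
    missing: List[str] = []
    for barcode, (n_genes, n_umis) in sorted(qc.items()):
        if n_genes >= min_genes and n_umis >= min_umis:
            kept.append(barcode)
        else:
            missing.append(barcode)  # no usable expression; annotate from the image instead
    return kept, missing


qc = {"A": (1200, 3000), "B": (80, 150), "C": (900, 400)}
print(split_by_qc(qc, min_genes=200, min_umis=500))  # (['A'], ['B', 'C'])
```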

Fig. 6

STASCAN reveals cell-type niches in the 10 × Visium human cardiac data

We then explored the spatial structural variations in these two slides by performing unsupervised clustering of spots based on the composition of cell annotations predicted by STASCAN, and mapped the resulting clusters, defined as cell-type niches, to the spatial regions (Fig. 6b, c, f and Additional file 1: Fig. S13-S17b). With these cell-type niches redrawn spatially, the cardiac tissue showed more delicate spatial patterns than the dominant cell annotation, consistent with the histological morphology and with the detailed structural variations that occur during physiological and pathological processes.
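
Niche calling as described here amounts to clustering spots by their predicted cell-type composition. The section does not name the clustering algorithm, so the sketch below uses plain k-means (Lloyd's algorithm) as a stand-in, with centroids initialised from the first k spots for reproducibility; the composition vectors, ordered as cardiomyocyte, fibroblast, and myeloid fractions, are toy values.

```python
from math import dist
from typing import List, Sequence


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster (niche) index for every point."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation
    assign: List[int] = []
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy per-spot compositions: (cardiomyocyte, fibroblast, myeloid) fractions.
compositions = [
    [0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7],
    [0.8, 0.2, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 0.7],
]
print(kmeans(compositions, k=3))  # [0, 1, 2, 0, 1, 2]
```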

Moreover, these cell-type niches, defined by diverse cell proportions, revealed more elaborate cell-interacting microenvironments with potential biological insights (Fig. 6d, e and Additional file 1: Fig. S13-S17c). For example, in slide_ACH006, sampled from the fibrotic zone, we observed myogenic cell-type niches (0, 1, and 2) mainly showing characteristics of cardiomyocytes and fibrotic cell-type niches (3, 4, 5, 6, and 7) mainly showing characteristics of fibroblasts. Spatially, the myogenic niches jointly characterized the myocardial structure, while the fibrotic niches, distinguished by their proportions of fibroblasts, indicated different fibrosis processes during the lesion. In particular, niches 3 and 4 also showed some characteristics of vSMCs and endothelial cells, depicting the spatial structure of the cardiac vasculature. In slide_ACH0010, apart from myogenic cell-type niches (0 and 1) and fibrotic cell-type niches (2 and 3), we observed inflammatory cell-type niches (4, 5, 6, and 7) that mainly exhibited characteristics of myeloid and mast cells. These three types of niches occupied distinct spatial regions, while niche 7 was located at their intersection, consistent with its mixed proportions of cardiomyocytes, fibroblasts, myeloid cells, and mast cells. Notably, niches 2, 3, 4, and 5 showed co-enrichment of myeloid cells and fibroblasts, in accordance with the role of macrophages in fibroblast activation [41] and of fibroblasts in macrophage attraction [42]. Overall, STASCAN extends niche analysis and provides better insight into cellular microenvironment interactions.
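
Characterising each niche in this way typically means averaging the cell-type composition of its member spots and reading off the dominant types. The helper below is a hypothetical sketch of that summary step with toy numbers; it is not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def niche_profiles(
    compositions: List[Sequence[float]], niches: List[int], cell_types: Sequence[str]
) -> Dict[int, Dict[str, float]]:
    """Mean cell-type composition of the spots in each niche."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for comp, niche in zip(compositions, niches):
        groups[niche].append(comp)
    return {
        niche: {ct: sum(col) / len(members) for ct, col in zip(cell_types, zip(*members))}
        for niche, members in sorted(groups.items())
    }


cell_types = ["cardiomyocyte", "fibroblast", "myeloid"]
compositions = [[0.8, 0.2, 0.0], [0.6, 0.4, 0.0], [0.1, 0.5, 0.4]]
profiles = niche_profiles(compositions, [0, 0, 1], cell_types)
for niche, profile in profiles.items():
    print(niche, max(profile, key=profile.__getitem__))
# 0 cardiomyocyte
# 1 fibroblast
```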

STASCAN deciphers the intricate tissue organization throughout the developmental stages of the mouse brain

We further tested whether STASCAN is applicable to ST data derived from other technologies. We first applied STASCAN to an embryonic mouse brain dataset from MISAR-seq [43], a microfluidic indexing-based spatial technology inspired by DBiT-seq [4] that provides both high-quality images and sequencing data (Additional file 1: Fig. S18a-c, Additional file 2: Table S1 and “Methods”). Notably, although the H&E images in this dataset were obtained from an adjacent tissue slide, causing a partial mismatch between the actual gene expression pattern and the morphological images, STASCAN still achieved excellent results. Compared with the cellular distribution annotated by RCTD [43], STASCAN markedly improved the cellular resolution and highlighted characteristic tissue structures (Fig. 7a and Additional file 1: Fig. S18a-c). For example, at enhanced resolution, the distribution pattern of forebrain GABAergic neurons was associated with the subpallium, and a group of forebrain glutamatergic and cortical or hippocampal glutamatergic neurons highlighted the dorsal pallium of the forebrain (Fig. 7a).

Fig. 7

STASCAN revealed major anatomical tissue regions in the mouse brain dataset generated from microfluidic technologies

Furthermore, we generated cell-type niches for each developmental stage of the mouse brain tissue and mapped them to spatial regions (Fig. 7b and “Methods”). Using the manual anatomical annotations of major tissue organizations from H&E images as the ground truth [43] (Fig. 7b), we compared the cluster distribution of cell-type niches with the raw-resolution distribution of cell annotations on prior spots. The cell-type niches generated by STASCAN recapitulated the tissue organization of the developing mouse brain more faithfully than the raw-resolution annotations. In particular, for the E18.5 embryonic mouse brain, cell annotation at raw resolution showed no recognizable organization across nearly the entire brain region, whereas STASCAN clearly defined the major tissue domains through cell-type niche clustering (Fig. 7b). Collectively, these results illustrate the strength of STASCAN in highlighting tissue structures and redrawing finer organizations using ST data from various technologies.
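
The agreement between niche clusters and the manual anatomical annotation is assessed visually in this section. A common way to put a number on such agreement is the adjusted Rand index (ARI), which is 1 for identical partitions regardless of how the clusters are named and close to 0 for random ones; it is used here only for illustration and is not a metric reported in this section. The implementation below follows the standard pair-counting formula; scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Pair-counting ARI between two partitions of the same spots (needs >= 2 spots)."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


anatomy = ["cortex", "cortex", "midbrain", "midbrain"]
print(adjusted_rand_index(anatomy, [1, 1, 0, 0]))            # 1.0 (same partition, new names)
print(round(adjusted_rand_index(anatomy, [0, 1, 0, 1]), 3))  # -0.5
```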

GENOME BIOLOGY
