StaVia: spatially and temporally aware cartography with higher-order random walks for cell atlases

StaVia enables high-definition cartographic TI reconstruction across the entire single-cell atlas

StaVia is a graph-based TI framework designed to tackle challenges posed by atlas-scale data. It builds on our earlier VIA method [13] which models cellular processes as a random walk with elements of laziness and teleportation across cluster graphs [19]. In StaVia, we introduce higher-order walks with the memory of cells’ previous states, integrated with cartographic views and enriched with information from available metadata (temporal or spatial), to reconstruct atlas-scale topologies coupled with automated predictions of diverse cell fates and their sequential specialization.

Advancing from VIA and other TI methods, StaVia’s contributions are threefold. First, StaVia uses high-order LTRWs with memory to infer complex trajectories by relaying information about a cell’s previous states (Fig. 1b). This approach accurately pinpoints end-to-end differentiation paths and gene dynamics associated with a particular lineage. Forgoing the walk’s memory can obscure the distinction between the different pathways to cell fates in large atlases. Second, it allows flexible integration of data and metadata (e.g., time-series developmental labels from temporal atlases, spatial layout, gene/feature similarity, and single-cell RNA-velocity) to compute pseudotimes, cell fates, and lineage pathways (“Methods”) (Fig. 1a). Integrating available temporal data with the expression profiles allows us to stitch developmental points in a data-driven manner. Spatial information is particularly challenging to incorporate when examining cellular landscapes based on their gene expression due to their highly non-linear nature. However, the microenvironments that cells occupy could provide valuable insight about their function. StaVia therefore provides a framework within which gene-expression and spatial information are jointly considered when charting the cellular landscape. Lastly, going beyond the common cluster graph visualization [13, 20], StaVia generates an Atlas View that simultaneously illustrates complex chronological patterns and distinct phenotypic diversity, which has been challenging in current TI methods. Both the spatial arrangement of nodes and edges in StaVia’s high-resolution Atlas View (and its cluster graph), as well as their direction, connectivity, and weights, are guided synergistically by the results of the pseudotime, sequential metadata, and second-order LTRWs (“Methods”) (Fig. 1a).

Using higher-order walks with memory is an unexplored feature in existing TI methods which typically rely on first-order random walks where the prediction of future steps is independent of previous states (e.g., Palantir [10], MARGARET [11], CellRank [12], and Via 1.0 [13]. When applied to reconstructing biological pathways, memoryless methods tend to encounter two types of problems which we explain by way of analogy to a faulty navigation system on a road trip from City A to B. The first issue occurs when memoryless methods suggest a cell passes through an unrelated intermediate population during development, akin to a GPS diverting us through an off-route City C. StaVia, uses higher-order LTRW with memory to act like an improved GPS that sense-checks directions, minimizing unnecessary detours, and ensuring a more accurate cell trajectory. Now imagine the road trip involves a critical turn at Point D but the GPS is so focused on the immediate road that it misses this turn. Analogously, some TI methods, due to an overemphasis on localized pathways, may fail to identify key transition states in a cell’s developmental journey. StaVia’s “memory” feature is like an alert GPS that not only focuses on the road immediately ahead but also keeps track of the overall journey. It remembers where each cell has been and where it could be headed, making it less likely to miss critical turns (or transition states), and providing a more complete picture of the cell’s developmental journey. By integrating pseudotemporal forward biasing, RNA velocity, and prior random walk state information (memory), StaVia’s higher-order LTRWs provide a more realistic prediction model of cell developmental pathways, enabling clear delineation of diverse lineages, transitional populations, and gene expression dynamics. (Fig. 1b).

Our robustness analysis shows that adjusting the memory level has a predictable and gradual impact on lineage definitions, simplifying the optimization (see “Methods” and Additional file 1: Fig. S5). Generally, increasing emphasis on memory in the LTRWs yields pathways that emphasize the role of predecessors and remain inwardly focused. This translates to increased sensitivity to distinguishing related cell types and their gene expression dynamics (Fig. 1b). Conversely, reducing memory helps explore poorly connected cell populations or those lacking precursors. The computational overhead from second-order LTRWs is minimal as they are conducted on the cluster graph level.

To generate StaVia’s cartographic Atlas View, we first create a single-cell embedding infused with second-order LTRW features learned from the TI cluster graph (see “Methods”). Specifically, based on the presence of sequential data (e.g. data labeled with different time points), the single-cell graph can be sequentially augmented and refined accordingly. In the case of spatial data, spatial nearest neighbors are used to augment the gene expression graph. Furthermore, prior to clustering and graph construction, the gene expression is modified as the weighted average of a cell’s own cells and its spatial neighbors when spatial data is being considered. We then use UMAP’s fuzzy simplicial set approach to align the LTRW-based cluster graph with a low-dimensional embedding (Fig. 1a). This single-cell embedding, a useful visualization in its own right, serves as the node layout for the Atlas View which arranges cell states and highlights edge connectivities (pathways) from the (spatial-temporally) augmented single-cell graph using an edge bundling method based on kernel density estimation (“Methods”). Directionality is projected on edges based on milestone pseudotime direction and RNA velocity. Note that the impact of high-order LTRWs with memory on the predicted end-to-end pathways can also readily be visualized in the Atlas View (Fig. 1b). For a detailed guide to parameter selection, see Additional file 1: Note S1.

StaVia captures a complete view of murine gastrulation

We employed StaVia on a scRNA-seq dataset of murine gastrulation [6], comprising 89,297 cells from stages E6.5 to E8.5 post-fertilization at quarter-day intervals. Previous trajectory analysis on this dataset required subsetting various lineages and analyzing them individually with manual curation in order to identify developmental trajectories of interest. In contrast, StaVia, by integrating higher-order LTRW with memory, RNA velocity, and time-series annotations (i.e., E6.5 to E8.5), accomplishes a holistic mapping of the entire atlas in a single run, accurately capturing relevant trajectories sans manual subsetting and curation (Figs. 2 and 3).

Fig. 2figure 2

StaVia Atlas View of mouse gastrulation. a StaVia Atlas View of murine gastrulation, colored by known stage with edge directions inferred using a combination of RNA velocity and pseudotime. Root state automatically detected as epiblast E6.5. Autodetected terminal cell fates are underlined. b Sequential order of hemogenic endothelial cell differentiation. The black arrow is based on the edge direction of the hematopoietic branch in a and shows that Runx1 precedes the upregulation of GFi1b, which is a direct target of Runx1 and critically down-regulates endothelial markers to induce the endothelial-to-hematopoietic transition (EHT) [21, 22]. c NMP cells colored red reside between neural-yellow and paraxial mesoderm-blue (lhs) the zoomed-in triangle of NMP cells express T brachyury. Of interest, the NMPs with a mesodermal tendency express Tbx6. NMPs with a more neural tendency express more Nkx1-2 [23]. d Dual source of gut formation with Ttr-positive cells at the visceral endoderm (VE), Sox17 expression for definitive endoderm (DE), and gut expressing Wnt5b [24, 25]

Fig. 3figure 3

Comparison of TI graph structure and analysis. a StaVia cluster graph shows a directed trajectory using a combination of scRNA-velocity and pseudotime. Colored by known stage within the mesoderm, StaVia identifies cardiomyocytes, paraxial mesoderm, and mesenchymal cells; within the neuro-ectodermal branch: the surface ectoderm, brain, and neural crest (NC); and arising from the visceral and definitive endoderm, the gut. b scVelo directed PAGA with a similar number of clusters and also using force-directed layout—lower visualized edges results in several disconnected clusters. c Automatically predicted differentiation flow based on the cluster graph. d StaVia captures Sox9 upregulation preceding Sox10 in Neural Crest (NC) development. e StaVia end-to-end pathways from epiblast to cell fate for each germ layer. Each trend line corresponds to a lineage. The lineage of interest is highlighted by the color of the lineage-plot’s border and associated marker-gene, e.g., in StaVia the brain lineage is dark pink. When the color of the marker gene matches the color of the upregulated trend line, it signals that the correct trend is inferred and merits a checkmark. f CellRank: the lineage pathways to the brain (light blue) exhibits is an example of where the pathway fails to detect transition states due to over localization and the corresponding blue lineage trend is not upregulated, warranting a cross-mark. The gut pathway (brown trendline) is an example of deviation into unrelated intermediate states resulting in distinct pathways becoming blurred

The Atlas View (Fig. 2a) and the cluster graph (Fig. 3a) visualizations created by StaVia illustrate the emergence of lineages within the entire dataset through a fanned-out structure that reflects the increasing separation between progressively specialized cells. The cluster graph (Fig. 3a), which forms the basis for the memory-infused second-order LTRW lineage probabilities as well as the layout and directionality of the Atlas View, captures the emergence of major lineages, their progressive separations towards cell fates (highlighted on the Atlas View Fig. 2a as underlined populations), and the correct placement of more subtle transition populations that exist at the boundaries of these major layers (e.g., neural mesodermal progenitors (NMPs) [23]). Interestingly, both the cluster graph and Atlas View are uniquely able to show that the gut arises from the visceral and definitive endoderm [24, 25]. They also visually indicate two hematopoietic sources, the first being erythroids from the primitive wave and the endothelial cells, which suggest the onset of the second wave (Figs. 2a, b and 3a) [26]. This structure is not easily observed in other cluster graphs (e.g., PAGA Fig. 3b) or in higher resolution representations (e.g., UMAP, Phate), as evidenced by the comparative visualization analysis presented later (Fig. 8). See Additional file 1: Note S2 for details of these three developmental patterns as shown by the StaVia cluster graph and Atlas View and Additional file 1: Table S3a for a full list of supporting literature.

Next, we compared StaVia with a hybrid pipeline involving PAGA [20], scVelo [27], and CellRank [12], a state-of-the-art method that combines gene–gene feature distances with directional information from RNA velocity for cell fate determination (see Additional file 1: Note S3 for details on the selection of benchmarked methods). Comparing StaVia’s cluster graph with PAGA’s (using CellRank’s initial states and scVelo’s RNA velocity [27]) (Fig. 3b) (see Additional file 1: Table S1 for detailed parameter setting), we observe that the PAGA-scVelo plot is visually difficult to interpret due to edge congestion that cannot easily be minimized. This is due to even conservative attempts of edge thresholding resulting in graph fragmentation. Importantly, the connectivity in the PAGA-scVelo plot misses key biological insights (e.g., lacks dual source of gut formation).

We subsequently compared the lineage probabilities from StaVia and CellRank towards different cell fates (Fig. 3e–f and Additional file 1: Fig. S3c), with StaVia completing the TI computation in 3 min, compared to CellRank’s 20 min (Additional file 1: Note S3). Notably, the single-cell probabilistic lineages in CellRank do not capture the end-to-end pathways from epiblast, through transition states to final cell fates. In most cases for CellRank, the lineage probabilities (Fig. 3f and Additional file 1: Fig. S3 for all lineages) are either very localized to cells at the corresponding final cell fate with no indication of past states (NMP, brain, presomitic/paraxial mesoderm (PSM)) or are very diffuse detouring through the entire landscape (gut, cardiomyocyte) falsely suggesting that unrelated intermediate cells have a high likelihood of differentiating towards these cell fates. The same issues of either blurring or over-localization during pathway prediction are observed in Palantir, with manual setting of cell fates required to bypass incorrect automated predictions in Palantir as demonstrated in Additional file 1: Fig. S3, Fig. S7 and Fig. S19). In contrast, the graph structure presented by StaVia (Fig. 3a) and its LTRW traversal using memory enables us to more unambiguously retrace how these lineages emerge (Fig. 3a, c, e and Additional file 1: Fig. S3).

At higher memory levels, the gene trends predicted for the brain lineage show distinct elevation of Fez1 and Pax6, crucial for neuroectoderm fate specification, neurogenesis, and forebrain patterning (Fig. 3e, Additional file 1: Fig. S4) [28, 29]. StaVia also reveals a noteworthy trend: Sox9 expression precedes Sox10 in neural crest (NC) precursors (Fig. 3d). This aligns with known data showing Sox9’s role in initiating premigratory NC cells, followed by Sox10, which fosters later NC development and cell emigration [30]. This Sox9-Sox10 sequence is not captured by CellRank (Additional file 1: Fig. S3).

Introducing memory in random walks delineates end-to-end pathways in murine gastrulation

We next investigated how the incorporation of memory into random walks improves cell fate mappings and their associated biological interpretation, by addressing the issue of too-localized or too-diffused paths seen in current methods. In the murine gastrulation dataset (and later Zebrahub), we compared the lineage pathways obtained using a first-order and second-order LTRW with varying levels of memory (Mem = 1 (signifies no memory) to Mem = 50). Lower memory values lead to more diffuse pathways on StaVia’s Atlas View (Fig. 4 and Additional file 1: Fig. S4), confounding analysis of temporal gene dynamics. In contrast, higher memory values successfully distinguish adjacent cell fates, as shown by the temporal gene expression of the NMP, paraxial mesoderm, neural, and neural crest cell fates at E8.5 (Fig. 4a-b).

Fig. 4figure 4

StaVia Memory impact on lineage paths. a Increasing memory mitigates too-diffused pathways from epiblast towards specialized cell fates and consequently improves the associated gene trends specificity as shown in b. Gene expression trends along pseudotime for lineages NMP (grey), PSM (green), brain (pink), and neural crest (NC) (peach). Hoxb9 is an NMP marker and we expect the grey NMP gene expression to become comparatively more upregulated than the other three lineages. Similarly, we expect Meox1 as a PSM marker to be comparatively more upregulated

For the NMP lineage, nestled between the emerging PSM and neural cells, higher memory enables the delineation of the sequential cell pathway from the epiblast to caudal epiblast and then to the boundary of the mesodermal and neuronal lineages for the biopotent NMP cell fate (Fig. 4a-iii). In contrast, lower memory values tend to include unrelated cell populations. As a result, gene expression trends for NMP markers Hoxb9 and Nkx1.2 [23] overlap for all these lineages at lower memory values, but at memory = 50, the NMP lineage alone shows distinct expression elevation (Fig. 4b, Fig. S4). The benefits of the memory mechanism are also evident in the E8.5 brain cells (Fig. 4a-ii), where higher memory more accurately shows the brain lineage deriving from the epiblast, followed by cells in the anterior pole of the primitive streak [31].

StaVia displays holistic and high-resolution transcriptomic landscape of the full Zebrahub

We proceeded to leverage StaVia to probe the full Zebrahub, a recent comprehensive scRNA-sequencing time course atlas of 120,000 zebrafish embryonic cells [7] (Fig. 5a). As current methods struggle to reconcile the extended temporal span and extensive cellular information of the entire 10-hpf to 10-dpf (hour/day post-fertilization) dataset, Lange et al., limited their study to the subset of cells (only 30% of the cells in the time-course study), omitting the peridermal and neuroectoderm lineages entirely. We show that StaVia successfully interrogates the complete dataset. Notably, Zebrahub’s neuroectoderm and periderm lineages are analyzed here for the first time with an example of the probabilistic pathway from each of these three layers (see the insets in Fig. 5a).

Fig. 5figure 5

StaVia for Zebrahub. a Atlas View of the entire Zebrahub bud stage to 10 dpf colored by germ layer. Black arrows highlight the direction of differentiation indicated by Atlas edges for major lineages in the mesoderm, neuro-ectoderm, and non-neuro ectoderm. (Insets) StaVia end-to-end paths from bud to cell from mesoderm, neuro-, and non-neural ectoderm show well-delineated pathways as a result of higher-order random walks. b StaVia directed cluster graph using scRNA-velocity and pseudotime colored by main tissue type (top) and known stage (bottom). Edge directions radiate outwards from the center. c scVelo-directed PAGA constructed with a similar number of clusters as StaVia and also using a force-directed layout, shows a congested edge-layout with tissue-specific groups poorly separated and no clear direction. (d) scVelo stream plot on UMAP cannot mark the emergence of lineages as clearly as the edges in the Atlas View. The black arrows trace similar lineages to those highlighted in the Atlas but do not transition through intermediate stages and often show conflicting direction

StaVia outperforms existing state-of-the-art methods, e.g., scVelo, in delineating the intricate cell differentiation trajectories. For instance, StaVia recapitulates that cross-talk between the major lineages (visualized as edges on both the Atlas View (Fig. 5a) and cluster graph (Fig. 5b)) is more prominent during earlier stages (e.g., neuro mesodermal progenitor pluripotent cell types like (NMPs) traverse two germ layers) and diminishes as cells become more specialized. Furthermore, the direction of differentiation is also more clearly captured by StaVia’s cluster graph (Fig. 5b) than CellRank-scVelo-directed PAGA representation (Fig. 5c) and scVelo’s stream plot (bold black arrows in Fig. 5d). StaVia also detects more relevant late-stage cell fates (Figs. 6b and 7b), as well as the gene trends that distinguish these lineages from each other (see Additional file 1: Figs. S7-S9 for lineage comparisons on all detected cell fates, where those missed by CellRank are manually provided to allow comparison), avoiding the pitfalls of missing transition states and inconsistent directionality occurring in the scVelo stream plot (Fig. 5d). Again, StaVia’s TI runtime is fast, taking 4 min, compared to 30 min in CellRank which misses several cell fates (Additional file 1: Fig. S6).

Fig. 6figure 6

Mesoderm development. a Zoom-in of mesodermal lineage highlighting paths to pharyngeal mesoderm and musculature. b Automated predicted differentiation flow of detected mesodermal and hematopoietic cell fates from early mesoderm 10 hpf to 10 dpf. c In StaVia, increasing memory shows clearer paths to cell fate of interest and avoids spillover into unrelated cell types. d Gene trends of mesoderm lineages for dlx4 (pharyngeal mesoderm marker) [32] and matn1 (cranial cartilage marker) [33] are correctly captured by StaVia, whereas CellRank’s lineages are incorrect (e.g., the muscle lineage upregulates matn1 in CellRank). e Correlation matrices for cartilage, muscle, dermis, and erythrocyte lineage pathways at different values of memory 1 to 100 shows stability of analysis when changing memory (Fig. S7 for all fates)

Fig. 7figure 7

StaVia reveals Ectoderm differentiation in Zebrahub. a Zoom-in Atlas View of neurulation from 5-hpf to 10-dpf. “s” denotes somite-stage and “d” is days post fertilization (dpf). b Predicted differentiation flow of non-neural ectoderm and neural fates. cd End-to-end pathways from neural plate region at 10 hpf (5-somite) to c differentiating neurons and d forebrain (5–10 dpf). Accompanying gene-expression trends for neural lineages, shows upregulation of marker genes. e RGC end-to-end pathway and its marker gene expression trends. f Zoom-in Atlas View of non-neural ectoderm regions shows the formation of bilayered epiderm and the differentiation of the placodes and their interactions with associated trigeminal neurons/ganglia. g Pax2a expressed in the early epiderm and placode bipotent regions, Fgf3 restricted to placodes, and Capn9 concentrated on epidermal cells. h StaVia detects that the ionocyte fate (red dot) expresses more Igfbp5ag and Gcm2 than other non-neuro ectoderm lineages

StaVia distinguishes multiple mesodermal pathways in Zebrahub

StaVia uncovers a high-precision mesodermal differentiation flow (Fig. 6a-b) that cannot be recovered from the scVelo and PAGA maps. It accurately predicts that vascular and hematopoietic lineages are derived from the lateral plate mesoderm [34] while revealing that the paraxial mesoderm gives rise to somitic cells, precursors to the dermis and cartilage [35]. Critical to early embryonic development, bipotent NMPs located at the early bifurcation of the mesoderm and neuroectoderm are identified [36] (Figs. 5a and 6a). Again, this ability is attributable to the memory-centric graph traversal implemented in StaVia (see the impact of the memory on mesodermal differentiation analysis in Figs. S10-11).

Notably, StaVia accurately predicts that the pharyngeal arch is derived from the head mesoderm and the cranial neural crest [37, 38] (see the zoom-in Atlas View in Fig. 6a, differentiation flow in Fig. 6b and pathway in Fig. 6c). The upregulation of Dlx genes, as revealed by StaVia, marks the emergence of a pharyngeal population (Fig. 6c–d). This is in contrast to CellRank, which, lacking the memory mechanism, fails to distinguish Dlx expression patterns across mesodermal lineages, resulting in a homogenized expression that masks true cell fate distinctions (Fig. 6d, Additional file 1: Fig. S7 where cell fates missed by CellRank are manually assigned to allow full comparison). Furthermore, StaVia identifies Matrilin Matn as a gene marker to distinguish the cranial cartilage from the pharyngeal arches (Fig. 6d) [33]. This distinction is lost in CellRank (Fig. 6d), which confounds the cartilage with smooth musculature. CellRank’s paths lack intermediate populations (Additional file 1: Fig. S7), showing either fate-localized lineage probabilities or diffuse pathways (Additional file

留言 (0)

沒有登入
gif