Single-cell transcriptomics of pediatric Burkitt lymphoma reveals intra-tumor heterogeneity and markers of therapy resistance

Heterogeneity of the single-cell transcriptome of BL tumors

We performed paired sc-transcriptomic and immunoglobulin repertoire analyses on 12 sporadic EBV-negative BL specimens (11 patients), including 8 pleural or abdominal effusions (E) and 4 nodal tumor masses (N), which were collected at diagnosis and frozen as viable single cell suspensions (Supplementary Fig. 1A). All patients were treated with the same therapeutic protocol (AIEOP LNH-97) [28]. The majority (8/11) responded to therapy (responders, R) and are currently in lasting remission (≥3 years). The non-responder (NR) patients experienced disease progression (BL102 and BL107) or early relapse (BL103, relapse 2 weeks after remission) and died of the disease.

Upon data quality filtering, we obtained 36,400 cells, including 21,649 tumor cells identified based on the presence of clonal V(D)J rearrangements, and 14,751 tumor-infiltrating normal cells. Each specimen displayed variable number of cells (range 838–6804) and fractions of tumor (median 62%) and non-tumor (median 38%) cells (Supplementary Fig. 1B).

V(D)J analysis identified the tumor clone carrying a clonally rearranged B cell receptor (BCR) in each patient (Supplementary Table 3). Consistent with previous reports suggesting high frequency of BCR rearrangements involving the IGHV3 genes [22, 31], this V gene family was observed in 7/11 tumors with recurrent usage of IGHV3-23 and IGHV3-33. The lambda light chain locus was rearranged in 7/11 patients with recurrent usage of the IGLV2-14 and IGLV1-51 V genes. Cytofluorimetric data on surface expression of kappa/lambda chains were available for the diagnostic specimen of 5/11 patients and were concordant with the chains identified by sc-RNAseq analysis. Most patients (10/11) expressed exclusively unswitched IGHM and IGHD isotypes (Supplementary Table 3).

Analysis of tumor cells from our BL dataset in comparison with published sc-transcriptional profiles of normal GC B cells [8], and GC-derived lymphomas including follicular lymphoma (FL), transformed FL (tFL) and diffuse large B cell lymphoma (DLBCL) [32, 33], confirmed that these tumor types are distinct and segregate separately from normal GC B cells (Fig. 1A).

Fig. 1: Heterogeneity in the tumor cell transcriptome of GC-derived lymphomas.figure 1

UMAP projections of sc-transcriptomic profiles of: A normal GC B cells and tumor cells isolated from BL, follicular lymphoma (FL), transformed-FL (tFL), and diffuse large B cell lymphoma (DLBCL), including both GCB and ABC subtypes [8, 32, 33]; B 21,649 BL tumor cells isolated from 12 diagnostic specimens (11 patients); C 14,751 normal tumor-infiltrating cells isolated from the same BL specimens. Immunoglobulin gene transcripts were excluded. E Effusion, N nodal; CAFs cancer-associated fibroblasts.

Individual tumor cases were distinguishable based on transcriptional features of the malignant cells in all tumor types, including BL (Fig. 1B and Supplementary Fig. 1C). Conversely, the same analysis performed on the non-tumor cells in the BL specimens (n = 14,751) revealed that cells from different patients intermingled and clustered by cell type rather than by specimen (Fig. 1C). These observations indicate that differences across patients are mostly driven by transcriptional features associated with the tumor cells.

To investigate the inter-tumor features driving the patient-specific BL profiles, we first identified the most differentially expressed genes in the tumor cells of each patient when compared to all the others. Then we performed pathway enrichment analysis on the top 100 upregulated genes in each patient using the SignatureDB database. The results showed that some transcriptional programs including those modulated by MYC and BCL6 are largely shared across patients, although each patient may display a unique subset of targets (Supplementary Fig. 1D and Supplementary Table 4). Other programs, including TCF3 targets, genes affected by signaling pathways (i.e. PI3K, CD40, interferon) or associated with proliferation were enriched in subsets of patients (Supplementary Fig. 1D and Supplementary Table 4). These results suggest that the patient-specific signatures are driven by unique transcriptional features, some of which however converge into the same pathways.

Tumor-infiltrating normal cells comprise mostly immune cells

The analysis of non-tumor cells in the BL specimens revealed mostly immune cells including distinct clusters of naïve and effector T cells, myeloid cells, and B cells (Fig. 2A). In addition, a small fraction of fibroblasts (cancer-associated fibroblasts, CAFs) was detected (Fig. 2A). The identity of the subpopulations was determined based on the expression of specific markers and confirmed by assessing enrichment in the LM22 signatures using CIBERSORT and the Mann-Whitney-Wilcoxon Gene Set test (MWW-GST) [34] (Fig. 2A). Tumor-infiltrating normal cells were found in all specimens, although with different representation (Fig. 2B). Some subpopulations, including naïve T cells and B cells, were evenly represented across samples, while others showed a bias based on specimen type (effusion, E, vs nodal, N) or prognosis (Fig. 2C). T cells, particularly effector T cells, were more abundant in the N specimens, while myeloid cells represented a larger fraction of infiltrating cells in the E samples (Fig. 2C). Although the number of CAFs was modest, a significant enrichment (p < 0.05, Mann-Whitney U Test) was detected in the specimens from NR compared to R patients (Fig. 2C).

Fig. 2: Tumor-infiltrating normal cells include multiple immune cell types.figure 2

A UMAP projection of sc-transcriptomic profiles of 14,751 normal tumor-infiltrating cells, labeled based on the identified populations. Relative gene expression is displayed as UMAP/heatmap with colors representing the z-scored log2 normalized expression. B UMAP projections of normal tumor-infiltrating cells, as shown in (A), color-coded based on the cells contributed by each specimen. C Box and Whisker plots displaying the distribution of normal tumor-infiltrating cells in the dataset stratified by sample origin (E effusion, N nodal) and response to therapy (non-responders, NR, and responders, R). A Mann-Whitney U Test was used to compare data sets pairwise (*p < 0.05).

A refined analysis focusing on the T cell compartment increased the resolution and allowed the detection of natural killer and regulatory T cells, in addition to naïve and effector T cells (Supplementary Figure 2A). This analysis highlighted that most T cells detected in the E specimens were naïve T cells, while effector T cells were significantly enriched in the N specimens (Supplementary Fig. 2B). No significant distribution differences were detected in the natural killer and regulatory T cell compartments (Supplementary Fig. 2B). A subset of T effector cells expressed transcripts of exhaustion markers, including LAG3, HAVCR2 (TIM3), and PDCD1 (PD1), while they lacked expression of TCF7 (Supplementary Fig. 2C). By applying a previously reported exhaustion scoring approach [35], we scored as exhausted 38% of the overall Teff cells and 19% of the proliferating Teff cells, mostly in the N specimens (Supplementary Fig. 2D, E).

The myeloid component showed some heterogeneity that was captured by an analysis focused on these cells (Supplementary Fig. 3A). We identified six clusters which displayed expression of markers previously reported to be associated with multiple myeloid subpopulations including classical tumor-infiltrating monocytes, inflammatory and regulatory tumor-associated macrophages, and conventional CD1c+ dendritic cells (Supplementary Fig. 3B) [36]. A subpopulation with monocytic features and a subpopulation of regulatory tumor-associated macrophages were shown to be respectively depleted and enriched in the N specimens (Supplementary Fig. 3C).

Although several differences seem to be driven by the biospecimen types (N vs E), these data suggest that some cellular components (i.e. CAFs) may be relevant in defining biological features associated with disease outcome (see Discussion).

BL intra-tumor heterogeneity reflects similarities with distinct normal GC subpopulations

Analysis of the BL tumor sc-transcriptomes revealed distinct clusters in each patient, mostly driven by the heterogeneous expression of cell cycle markers and molecules involved in the BCR signaling pathway and NF-κB activation (Supplementary Figure 4). These features were recurrent across patients and the analysis of the merged dataset confirmed the presence of several transcriptional programs that were shared across multiple patients (Fig. 3A). As expected, expression of the MYC oncogene, as well as B cell and GC B cell markers was quite uniform across all cells (Fig. 3B, top). Conversely, cell-cycle markers (including PCNA, MKI67, CDK1, CDC20) discriminated clusters of cells with a transcriptome consistent with active cell division (Fig. 3B, middle) from cells expressing higher levels of transcripts related to B cell activation, BCR (PTPN6, CD72) and NF-κB (NFKB1, IRF4) pathways (Fig. 3B, bottom). Pathway enrichment analysis of the top 100 genes upregulated in each cluster using the KEGG and Hallmark databases confirmed enrichment for genes promoting cell-cycle progression (Clusters 1, 4, and 7), antigen presentation (Clusters 2, and 9), and multiple signaling pathways including MAPK, NF-κB, interleukins and interferon (Clusters 2, 3, 9, 10, and 11) (Fig. 3C and Supplementary Table 5).

Fig. 3: BL intra-tumor heterogeneity reflects similarities with distinct normal GC subpopulations.figure 3

A UMAP projection and cluster identification using sc-transcriptomic profiles of 21,649 BL tumor cells. B Relative gene expression displayed as UMAP/heatmap with colors representing the z-scored log2 normalized expression. C Pathway enrichment analysis for the gene signatures (top 100 upregulated) associated with the clusters identified in (A). Relevant pathways from KEGG (KG) and Hallmark (HM) databases that are significantly enriched (hypergeometric test with Benjamini-Hochberg correction, q < 0.05) are shown in gray. D UMAP projection of BL tumor cells as displayed in (A) and colored based on the highest correlation of each cluster with previously reported signatures of normal GC B cell subpopulations [9]. DZ dark zone, INT intermediate, LZ light zone, PreM memory precursors. E Heat map displaying a subset of differentially expressed genes in the subgroups identified in (D). The color bars on the left indicate genes associated with the DZ (blue), LZ (red) or PreM (yellow) signatures. The size of the dot indicates the percentage of cells with detectable expression, and the color shows the z-scored average (log2) normalized expression within a group. Box and Whisker plots displaying the distribution across the GC-related subgroups of BL tumor cells stratified by: F sample origin (E effusion, N nodal); G response to therapy (NR non-responders, R responders). A Mann-Whitney U Test was used to compare data sets pairwise (*p < 0.05).

Cells were distributed across clusters with no significant bias based on cellular representation and tumor specimen source (E vs N samples) (Supplementary Fig. 5A, B). Regarding patient outcome, significant depletion in the NR specimens was detected for one of the smallest clusters (cluster 10) that was associated with multiple signaling pathways, including MAPK and NF-κB (Supplementary Fig. 5C).

In order to investigate the relationship between BL cells and normal GC B cells, we measured the correlation between the gene signatures associated with the BL clusters identified here and those of normal GC B cell subpopulations that we have previously reported [9]. This approach provides a score of similarity (or dissimilarity) for each tested signature, and each population was annotated based on the best enrichment score. Bulk transcriptomic analysis of BL specimens displayed significant correlation with dark zone (DZ) GC B cells. However, at the single cell level we also identified groups of tumor cells carrying features of GC light zone (LZ), intermediate (INT) and memory B cell precursors (PreM) (Fig. 3D). Several markers that are associated with these GC B cell subpopulations displayed increased expression in distinct clusters of BL cells (Fig. 3E). BL cells resembling DZ or INT cells closer to the DZ (INT-DZ) represented about 50% of the tumor cells, while the remaining were similarly distributed between LZ-b (representing the late LZ stages), INT-LZ, and PreM groups (Fig. 3F). Conversely, cells carrying the LZ-a signature (associated with the early LZ stages) were unevenly distributed across patients and significantly enriched only in a minority of patients (3/11), suggesting that they represent a minor component of the tumor population in most cases. Of note, cells displaying the LZ-a signature were significantly depleted in the NR specimens (Fig. 3G). To identify potential state transitions, we performed pseudo-temporal analysis and showed that these different states are predicted not to be fixed, rather cells may change their transcriptional state following specific patterns (Supplementary Fig. 5D).

In conclusion, the transcriptional heterogeneity observed in BL tumor cells can be annotated along the diverse states of GC B cells suggesting that BL cells recapitulate some of the features associated with normal GC developmental stages.

Identification of genes differentially expressed at diagnosis and correlating with therapy response

Toward the identification of markers with the potential to discriminate the NR patients at diagnosis, we performed differential expression analysis on the sc-transcriptomic data from the tumor cells of NR versus R specimens. This analysis identified 1,739 upregulated and 216 downregulated genes with a fold change >1.5 (Supplementary Table 6). The top upregulated genes in the NR group included several cytoskeleton-related transcripts (TPM2, PCDH9, PDLIM3, and MAP1B) and SOX11 (Fig. 4A). Previous studies have shown that SOX11 is variably expressed in BL and that higher nuclear protein expression correlates with worse prognosis in adult cases [37,38,39]. However, its role as a prognostic feature in pediatric BL has not been explored. A number of NR-upregulated genes were involved in diverse signaling pathways, associated with BCR, NOTCH, TGF-β, and interferon (Fig. 4A). The genes upregulated in the R specimens were enriched for cell cycle genes, and for markers associated with the LZ and PreM signatures (Fig. 4A).

Fig. 4: Identification of transcriptional features related to response to therapy.figure 4

A Heat map displaying a subset of differentially expressed genes in sc-transcriptomic profiles of tumor cells from non-responders (NR) and responders (R) to therapy. The size of the dot indicates the percentage of cells with detectable expression, and the color shows the z-scored average (log2) normalized expression within a group. B Box and Whisker plots displaying the expression fold change of selected genes in diagnostic specimens of NR versus R, as detected by qRT-PCR. A Mann-Whitney U Test was used to compare data sets pairwise (*p < 0.05).

A few candidates including genes with a relevant function in B cells (MYB, BTK, CD72), SOX11, and tropomyosin 2 (TPM2), a member of the tropomyosin actin filament binding protein family [40] that was the most upregulated gene in the NR group (Fig. 4A), were selected for further validation in an extended cohort of patients. Quantitative RT-PCR on diagnostic specimens from 22 R and 17 NR showed significantly higher expression of TPM2 in the NR compared to the R group (p < 0.05). All other tested candidates displayed a trend toward higher expression in the NR (Fig. 4B). Although the validation panel included unpurified N biopsies and E samples, no significant expression differences were observed considering the specimen types, confirming that the sample origin was not a variable significantly affecting this analysis.

The expression of the top performing candidate (TPM2) was assessed in a further extended panel including a total of 36 R and 21 NR (Fig. 5A). Patient stratification based on qRT-PCR in the diagnostic specimens indicated that high TPM2 transcript expression (above the median expression in the dataset) was significantly associated with poor prognosis (Fig. 5B and Supplementary Table 7). In addition, we performed mutational analysis for TP53 in the same cohort and showed that TPM2 expression significantly associated with progression-free survival even in the high-risk subset of patients carrying TP53 mutations (Fig. 5C and Supplementary Table 7). All patients included in these analyses were uniformly diagnosed and treated in Italy (AIEOP LNH-97) [28].

Fig. 5: TPM2 is a prognostic biomarker in pediatric EBV-negative BL.figure 5

A Box and Whisker plot displaying TPM2 expression fold change in diagnostic specimens of 21 non-responder (NR) and 36 responder (R) patients, as detected by qRT-PCR. (*p < 0.05 by Mann-Whitney U Test). B Kaplan-Meier plot for progression free survival (PFS) analysis in the BL patients (n = 57) stratified based on expression of TPM2 transcript, as detected by qRT-PCR in (A). “TPM2 low” and “TPM2 high” are patients with TPM2 expression below and above the median expression in the dataset, respectively. C Kaplan-Meier plot for PFS analysis in the subset of patients (n = 35) carrying mutated TP53 and stratified based on expression of TPM2 transcript, as detected by qRT-PCR in (A). D Bar plot displaying the percentage of cases which scored as positive (TPM2-pos) or negative (TPM2-neg) for TPM2 protein expression, as detected by IHC analysis in 11 NR and 36 R patients. (Right) Representative images of TPM2 detection by IHC in BL nodal diagnostic biopsies from a R (R37) and a NR (NR19). TPM2 expression is detectable in the tumor cells of the NR and in the normal muscle, stroma, and macrophages of all specimens. E Kaplan-Meier plot for PFS analysis in BL patients (n = 47) stratified based on TPM2 protein expression in the tumor cells.

In order to confirm expression in the tumor cells, we tested TPM2 protein by IHC in 47 BL patients (10 of which were analyzed also by sc-RNAseq and/or qRT-PCR) using diagnostic tissue samples collected at multiple Institutions in Europe, USA, and Canada. The antibody reactivity and specificity were validated by immunoblotting and IHC in HEK-293T cells transfected with a plasmid expressing an HA-tagged TPM2 protein (Supplementary Fig. 6A, B). Normal GC B cells from reactive lymphoid tissue showed no or barely detectable TPM2 expression, while expression was observed in follicular dendritic cells, macrophages and, as expected, in stromal cells, and in smooth and striated muscle cells (Supplementary Fig. 6C). Analysis of the primary BL specimens showed cytoplasmic TPM2 expression in the tumor cells of 8/11 (73%) NR but only 5/36 (14%) R patients, showing a significant association between this marker and resistance to therapy (p = 0.0002 by Fisher Exact test) (Fig. 5D and Supplementary Fig. 6D). In the “positive” cases, TPM2 was expressed in the large majority of tumor cells with a signal intensity varying from weak to strong across cases (Supplementary Fig. 6D). There was perfect concordance between the protein and RNA expression as detected by IHC and by sc-RNAseq, while detection by qRT-PCR and IHC was discordant in three cases. Of note, the single TPM2-positive case in the R group (BL101) for which sc-transcriptomics detected TPM2 RNA expression in the tumor cells, indeed displayed TPM2 protein staining in the tumor cells by IHC (Supplementary Fig. 6D). Progression-free survival analysis performed by stratifying the patients based on TPM2 protein expression confirmed the significant prognostic value of this biomarker (Fig. 5E and Supplementary Table 7). Overall, these results identify TPM2 as a potential prognostic biomarker to stratify patients at diagnosis by qRT-PCR or IHC.

留言 (0)

沒有登入
gif