Pathobiological signatures of dysbiotic lung injury in pediatric patients undergoing stem cell transplantation

Patients

We enrolled 229 pediatric recipients of HCT across 32 children’s hospitals in the United States, Canada and Australia who underwent 278 clinically indicated BAL between 2014 and 2022 (Fig. 1a,b and Table 1). Pulmonary symptoms developed or worsened a median 93 days after HCT (interquartile range (IQR) = 23–278) and were frequently associated with hypoxia and abnormal chest imaging, often in the setting of comorbidities such as graft-versus-host disease (GVHD) and sepsis. BAL was performed a median 112 days after HCT (IQR = 36–329), at which point lymphopenia was prevalent (median absolute lymphocyte count (ALC) = 420 cells per microliter, IQR = 156–1,035). Based on the BAL results, cases were classified as lower respiratory tract infection, non-pulmonary sepsis or idiopathic pneumonia syndrome (n = 116, 7 and 155, respectively). After each patient’s most recent BAL, 121 of 229 patients required intensive care (53%), 71 required 7 or more days of mechanical ventilation (31%) and 45 died in the hospital (20%).

Fig. 1: Study design and clinical outcomes.figure 1

a, Patients were recruited from 32 participating children’s hospitals in the United States, Canada and Australia. b, Study design diagram. c, BAL processing and analysis workflow. d, Four microbiome–transcriptome clusters were identified. e, In-hospital survival for all patients (left) and the subset requiring respiratory support before testing (right) was plotted according to BAL cluster; differences were analyzed with the log-rank test.

Table 1 Patient characteristicsCluster derivation

BAL underwent bulk RNA sequencing (RNA-seq) followed by parallel alignment to microbial and human reference genomes (Fig. 1c and Methods). Microbial alignments were transformed from counts to quantitative masses using a reference spike-in, followed by stringent contamination subtraction. They were summarized according to taxa, Kyoto Encyclopedia of Genes and Genomes (KEGG) functional orthologs, richness and diversity. Human alignments were characterized according to normalized gene expression, pathway analysis, cell-type deconvolution and T and B cell receptor (BCR) alignments (Methods). To identify the underlying BAL subtypes with shared microbial–human metatranscriptomic composition, we used a two-step unsupervised approach consisting of (1) multi-factor dimensionality reduction (multi-omics factor analysis (MOFA)), followed by (2) uniform manifold approximation and projection (UMAP) with hierarchical clustering (Methods). Optimal fit statistics (Supplementary Fig. 1) suggested that four clusters best fitted the data (Fig. 1d).

Clinical traits, illness severity and outcomes

Clinical data were analyzed after cluster assignment and revealed similar demographics, underlying disease and transplant regimens across clusters, with varying geographical regions and more females in clusters 3 and 4 (Supplementary Table 1). Patients in clusters 3 and 4 were generally sicker, as evidenced by greater need for respiratory support before BAL (P = 0.004), higher rates of renal injury and GVHD (P = 0.001 and P = 0.019), and greater use of intensive care (P = 0.001) or prolonged mechanical ventilation (≥7 days) after BAL (P = 0.001; Supplementary Table 2). Patients in clusters 3 and 4 also had significantly higher in-hospital mortality than patients in cluster 1 or 2 (log-rank P = 0.005; Fig. 1e). Among patients requiring respiratory support before BAL (44%), cluster-based mortality differences were pronounced and ranged from 22% to 30% in clusters 1 and 2 to 50–60% in clusters 3 and 4 (log-rank P = 0.007). Findings were similar when analyzing only patients enrolled within 100 days after HCT (Supplementary Table 3) and in a multivariable Cox regression model accounting for age, biological sex, absolute neutrophil count (ANC), ALC and presence of GVHD (P = 0.023; Supplementary Table 4).

Microbial taxonomy

To determine how microbiome composition drove differences between the clusters, we compared taxonomic mass, richness and diversity. Cluster 1 showed moderate microbiome mass and richness, high microbial diversity and a low burden of viruses. In contrast, cluster 2 showed high mass of bacterial phyla, high taxonomic richness and moderate microbial diversity (Fig. 2a,b and Supplementary Data 1). Cluster 3 demonstrated a reduced quantity and diversity of typically oropharyngeal microorganisms, with greater quantity of RNA viruses and the Ascomycota phylum of fungi, which contains medically relevant pathogens such as Aspergillus, Candida and Pneumocystis. In contrast, cluster 4 showed significant depletion of typical microbiome constituents, with minimal diversity and richness and concomitant enrichment of Staphylococcus and the Pisuviricota phylum of RNA viruses, which contains many respiratory RNA viruses, such as rhinovirus. When analyzed according to survivor status, nonsurvivors showed broad depletion of commensal taxa, higher quantities of fungal and viral RNA (Fig. 2c and Supplementary Data 2) and decreased BAL richness (P = 0.025) and diversity (Simpson’s diversity P = 0.006; Fig. 2d), which is consistent with the description of clusters 3 and 4. In contrast, survivors showed replete and bacterially diverse pulmonary microbiomes, consistent with the description of cluster 1.

Fig. 2: The BAL microbiome.figure 2

a, The fraction (left) and mass (right) of major bacterial, viral and fungal phyla were plotted, with the shading representing the average for each of the four BAL clusters (n = 127, 74, 45 and 32 for clusters 1–4, respectively). The average mass of bacterial genera and species in each of the four BAL clusters are shown on the right. b, Taxonomic richness and diversity were plotted across the four BAL clusters. Richness and diversity varied across clusters (Kruskal–Wallis test, P < 0.001 and P = 0.002, respectively). c, Microorganisms associated with in-hospital mortality were identified using negative binomial generalized linear models (edgeR R package) and were plotted according to the log fold change (position, color) and FDR (dot size). d, Taxonomic richness and Simpson’s alpha diversity stratified according to survival status at the time of the most recent BAL (n = 184 survivors, n = 45 nonsurvivors). Richness and diversity differed according to survival outcome (Wilcoxon rank-sum test, P = 0.025 and P = 0.006, respectively). e, Microbial alignments to the KEGG metabolic pathways were averaged for each BAL cluster. f, Selected metabolic pathways that differed across the BAL clusters are shown. log10-normalized expression varied across clusters (Kruskal–Wallis test, FDR < 0.001 for each of glycolysis/gluconeogenesis, oxidative phosphorylation, fatty acid biosynthesis and butanoate metabolism). For all box plots: the boxes indicate the median and IQR; the whiskers extend to the largest value above the 75th percentile (or smallest value below the 25th percentile), that is, within 1.5 times the IQR.

Microbial function

Transcriptomic markers of metabolic activity of microbial communities may complement taxonomic composition14. Using KEGG functional annotations, cluster 1 showed moderate transcription of myriad microbial metabolic functions across the domains of carbohydrate, lipid and fatty acid, and amino acid metabolism (Fig. 2e,f, Extended Data Fig. 1 and Supplementary Data 3). In contrast, the bacterially rich cluster 2 showed greater transcription of these domains and of glycan biosynthesis pathways, including peptidoglycan, lipopolysaccharide and other glycans that form bacterial cell walls. Cluster 3 showed significantly lower microbial function across the spectrum of the KEGG pathways; consistent with a depleted microbiome, cluster 4 showed minimal microbial metabolic activity. Antimicrobial resistance (AMR) gene expression was highest in the bacterially rich cluster 2 and lowest in the bacterially depleted cluster 4. However, AMR expression normalized to the quantity of BAL bacteria was lower for cluster 2 and highest in cluster 4, suggesting a shifting of bacterial metabolic function (Extended Data Fig. 2).

Pathogen identification

Patients in this cohort had a wide range of distinct infections, thus lending unique elements to each microbiome. Therefore, we next compared the pathogenic microorganisms detected by hospital tests and sequencing (Supplementary Table 5 and Supplementary Data 4).

Viruses

Clinically, most community-acquired respiratory viruses (CRVs) are detected with multiplex PCR and reported as present or absent. Clinical testing found CRVs in 49 samples (18%), whereas sequencing identified CRVs in 77 samples (28%), highest in clusters 2, 3 and 4 (Fig. 3a). In addition to common CRVs, several variant strain CRVs, such as influenza C virus and rhinovirus C, were detected (GenBank: OQ116581, OQ116582, OQ116583). Clinical testing found herpesviruses, including cytomegalovirus and human herpesvirus 6 in 35 samples (13%), whereas sequencing found herpesviruses in 49 samples (16%), with the greatest detection in clusters 3 and 4 (Dunn’s test P = 0.018 and P = 0.021 for clusters 3 and 4 relative to cluster 1). Sequencing also detected many viruses known to have respiratory transmission but not typically included on respiratory viral panels, including BK, WU and KI polyomaviruses, bocavirus, parvovirus B19, lymphocytic choriomeningitis virus and non-vaccine strain rubella across 26 BALs from 23 patients. These viruses were most common in clusters 3 and 4 and associated with 39% in-hospital mortality (n = 9 of 23). The ubiquitous bystander torquetenovirus and its variants were detected in 55 samples (20%), again higher in clusters 2, 3 and 4 relative to cluster 1 (Supplementary Table 6; P < 0.001).

Fig. 3: BAL pathogen detection.figure 3

a, Left: dot plots of common community-transmitted respiratory viruses (left), herpesviruses (middle) and all other viruses (right) detected in the cohort, plotted according to microbial mass (x axis) and microbiome dominance (y axis). Right: bar chart comparing viral detection across the four BAL clusters according to hospital tests and metagenomic sequencing. b, Left: all H. influenzae, S. aureus and S. pneumoniae detected in the cohort were plotted, with the dashed lines indicating the cutoffs of mass ≥10 pg and bacterial dominance ≥20%. Taxa above these cutoffs are shown in the upper-right quadrant (shaded in yellow) to indicate outliers within the cohort. Right: bar chart comparing potentially pathogenic bacteria detected across the four BAL clusters according to hospital tests and metagenomic sequencing. c, Left: all microorganisms detected in the BAL of three patients are shown, with the arrows indicating fungi present in high quantities. Right: bar chart comparing potentially pathogenic eukaryotes detected across the four BAL clusters according to hospital tests and metagenomic sequencing.

Bacteria

Clinically, most pathogenic respiratory bacteria are detected with selective culture media (blood, chocolate and MacConkey agar) optimized to grow certain pathogens above nonpathogenic flora, although PCR, serology and antigen tests may be used for certain organisms. In this study, clinical testing identified pathogenic bacteria in 51 samples, which were heavily overrepresented in the microbially rich cluster 2 (32 of 51 bacterial infections). In contrast, metagenomic sequencing is agnostic to organism pathogenicity and thus detects microorganisms broadly. As contamination is ubiquitous in low-biomass samples15, we used a strict approach to adjust for background taxa using internal spike-ins and batch-specific external controls (Methods). Still, many potentially pathogenic microorganisms were detected broadly; for example, Streptococcus pneumoniae, Moraxella catarrhalis, Haemophilus influenzae, Staphylococcus aureus and Pseudomonas aeruginosa were detected in 34%, 21%, 21%, 16% and 14% of samples (94, 58, 57, 44 and 39 samples), respectively. As some microorganisms could be present as commensals or pathogens depending on context and microbial burden, we then ranked bacteria according to RNA mass, dominance of the bacterial microbiome and intracohort z-score to parse the microorganisms most likely to be present in states of dysbiosis and thus potential infection (Fig. 3b). Using a conservative threshold of RNA mass of 10 pg or greater, bacterial dominance of 20% or greater and z-score of +2 or higher, we found potentially pathogenic bacteria in 76 samples, again with nearly half of these in cluster 2. In addition to new cases of common pathogens (for example, P. aeruginosa), many previously occult pathogens were identified above these thresholds, including Bacillus cereus, Citrobacter freundii, Chlamydia pneumoniae, Klebsiella aerogenes, Salmonella enterica and Ureaplasma parvum.

Eukaryotes

Using clinical assays, potentially pathogenic fungi were detected in 9% of samples (n = 25). As with bacteria, sequencing detected many potentially pathogenic fungi broadly in this cohort, for example, Candida, Aspergillus, Fusarium and Rhizopus were detected in 18%, 16%, 9% and 5% of samples (50, 44, 25 and 13), respectively. Applying a threshold of mass of 10 pg or greater and z-score of +2 or higher, potentially pathogenic fungi were detected in 30% of samples (83), with high detection across clusters 2, 3 and 4 (Fig. 3c). Several relevant fungi were detected exclusively using metagenomic sequencing, including Cryptococcus and Pneumocystis. No BAL parasites were detected through clinical assays, whereas metagenomic sequencing detected Toxoplasma in four patients and Acanthamoeba in three patients, with predominance in clusters 3 and 4 (Supplementary Data 4) and more than 50% mortality rate (n = 4/7).

Overall, clinical testing identified 173 pathogens in 116 of 278 samples (41.7%), while metagenomic sequencing using conservative thresholds identified 360 pathogens in 196 of 278 samples (70.5%, McNemar’s P < 0.001; Supplementary Table 7). Combined clinical testing and metagenomic sequencing identified 429 pathogens in 209 of 278 samples (75.2%; Supplementary Table 5). A total of 90 cases of idiopathic pneumonia syndrome were reclassified as lower respiratory tract infection. Whereas clinical testing identified pathogens in 22 of 45 nonsurvivors (49%), sequencing identified credible pathogens in 36 of 45 nonsurvivors (80%, P = 0.002; Supplementary Table 8). In-hospital mortality was highest for those with a pathogen detected by both clinical testing and metagenomics, and lower if a pathogen was detected by metagenomics alone or was not detected at all (27% versus 19% versus 13%; Supplementary Table 9 and Extended Data Fig. 3).

Impact of antimicrobial exposure

To investigate the impact of antimicrobial exposure on BAL microbiomes, we quantified patient-level antibacterial exposure in the week preceding BAL by weighting the cumulative antibiotic exposure days with an agent-specific broadness score to yield an antibiotic exposure score (AES) (Fig. 4a,b and Methods). AES varied across clusters (P = 0.005) and was lowest for the microbially rich cluster 2 and highest for the microbially depleted clusters 3 and 4. Greater AES was associated with reduced BAL microbial richness (Spearman rho = −0.14, P = 0.018); depletion of all the major bacterial phyla, including many oropharyngeal-resident taxa; and enrichment of the fungal phylum Ascomycota (false discovery rate (FDR) < 0.05; Fig. 4c and Supplementary Data 5). Consistent with expected bacterial depletion, greater preceding AES was associated with lower BAL expression of AMR genes (Poisson regression P < 0.001); however, higher preceding AES was associated with greater BAL expression of AMR genes when normalized to total BAL bacterial mass (Poisson regression P < 0.001). In addition, AES was significantly greater among nonsurvivors (median = 352, IQR = 210–507 versus 175, IQR = 75–336, Wilcoxon rank-sum test P < 0.001; Extended Data Fig. 4). Using causal mediation analysis based on linear structural equation modeling (Methods), the association between greater AES and mortality was statistically mediated by an antibiotic-induced depletion of key commensal pulmonary bacteria including Actinomyces, Fusobacterium, Gemella, Haemophilus, Neisseria, Rothia, Schaalia and Streptococcus (P < 0.001; Supplementary Data 6). However, evidence for mediation was significantly diminished after adjusting models for preceding oxygen support, ANC and ALC (Supplementary Data 7). Similar to above, anti-anaerobic exposure was higher in nonsurvivors (P = 0.011) and was associated with BAL depletion of many anaerobes including Prevotella, Gemella and Fusobacterium (Supplementary Data 8). Antifungal exposure was higher in the microbially depleted cluster 4, driven largely by higher exposure to echinocandins (P = 0.019); antiviral exposure was higher in clusters 3 and 4, driven largely by higher exposure to cidofovir (P = 0.045).

Fig. 4: Antibiotic exposure and impact on BAL microbiome.figure 4

a, Days of antimicrobial exposure are listed for antibacterials (black), antifungals (green) and antivirals (blue). Patients are listed in the columns and the shading indicates the number of days of exposure to each antibiotic in the week preceding BAL. b, AES was calculated before each BAL as the sum of antibiotic exposure days × a broadness weighting factor, summed for all therapies received in the week preceding BAL. AES varied across the clusters (n = 127, 74, 45 and 32 for clusters 1–4, respectively) and was highest for patients in cluster 4 (Kruskal–Wallis test, P = 0.005). For all box plots: the boxes indicate the median and IQR; the whiskers extend to the largest value above the 75th percentile (or the smallest value below the 25th percentile), that is, within 1.5 times the IQR. c, Negative binomial generalized linear models were used to test for BAL microorganisms associated with AES. Microorganisms are listed in the rows, with phyla shown on the left and bacterial genera shown on the right.

Impact of clinical immune status

The pulmonary microbiome exists in a state of reciprocal interaction with the lung epithelium, stroma and immune system. Analysis of patient immune laboratory tests showed that ANC was highest in the bacterially rich cluster 2 (P = 0.029; Supplementary Table 2) but was not associated with mortality overall (P = 0.810). In contrast, ALC did not vary across clusters (P = 0.997) but was lower in nonsurvivors (median = 273 cells per microliter, IQR = 125–650 versus 422, IQR = 179–1120, P = 0.028).

Pulmonary gene expression

We then compared BAL human gene expression across the four clusters and identified 18,158 differentially expressed genes (DEGs) (Fig. 5a and Supplementary Data 9). Select genes most differentially expressed in each cluster are displayed in Fig. 5b. Using REACTOME gene set enrichment scores (Supplementary Data 10), we showed that clusters were differentiated by high expression of pathways related to antigen-presenting cell activation (cluster 1); neutrophil and innate immune activation, bacterial processing and airway inflammation (cluster 2); collagen deposition and fibroproliferation (cluster 3); and antiviral and cellular injury genes (cluster 4; Fig. 5c). To replicate these findings orthogonally, we identified 1,253 genes differentially expressed between survivors and nonsurvivors (Supplementary Data 11). Consistent with the description of clusters 3 and 4, nonsurvivors showed broad downregulation of innate immune and antigen-presenting signals and a significant upregulation in collagen deposition, matrix metalloproteinases, alveolar epithelial hyperplasia and fibroproliferative genes (for example, COL1A1, COL3A1, CXCL5, IL13, MMP7, SFTPA1, SFTPC and TIMP3, each detected at an FDR < 0.05).

Fig. 5: BAL gene expression.figure 5

a, DEGs were identified using a four-way analysis of variance-like analysis with negative binomial generalized linear models. Mean normalized expression levels for significant genes are displayed for the four BAL clusters. b, Individual DEGs were identified across the four clusters (edgeR R package); variance-stabilized transformed gene counts for select genes highest in each of the four clusters were plotted (n = 127, 74, 45 and 32 for clusters 1–4, respectively). For all box plots: boxes indicate the median and IQR; the whiskers extend to the largest value above the 75th percentile (or smallest value below the 25th percentile), that is, within 1.5 times the IQR. c, Gene set enrichment scores to REACTOME pathways were calculated and example gene sets most enriched in each of the four clusters are shown.

BAL cell-type imputation

BAL contains an admixture of cell types in contact with the lumen of the lower respiratory tract; thus, varying cell proportions or activity levels may account for differential gene expression detected by bulk sequencing. Using cell-type deconvolution, we showed that clusters were differentiated by high fractions of monocytes and macrophages (cluster 1), neutrophils (cluster 2), CD4+ T cells (cluster 3) and CD8+ T cells (cluster 4) (Methods and Extended Data Fig. 5a,b). To assess differences in cell-type-specific gene expression, we next imputed monocyte-specific expression of the Gene Ontology Biological Process (GOBP) ‘Myeloid Leukocyte Activation’ gene set (including CSF1, IFNGR1, LDLR, TLR1 and TNF) and found the highest expression in clusters 2, 3 and 4 (Methods and Extended Data Fig. 5c). Although cluster 1 had a high monocyte and macrophage cell fraction, lineage-specific inflammatory gene activation was relatively low in this cluster. Similarly, lymphocyte-specific expression of the GOBP ‘Lymphocyte Activation’ gene set (including AKT1, BTK, CD4, DOCK8, JAK2 and IL7R) was highest in clusters 3 and 4 (Extended Data Fig. 5d). We then used ImReP to measure lymphocyte receptor repertoires across the clusters, which showed that most CDR3 alignments were for T cell receptor-α (TCRα), with many fewer alignments to β, γ and δ and BCR H, K, or L. Whereas the virally enriched cluster 4 showed the highest number of TCRα clonotypes and diversity, cluster 1 showed the lowest (Extended Data Fig. 6). Notably, BAL TCRαβ clonotype numbers and diversity were not correlated with blood ALC (P = 0.646), although BAL TCRγδ and BCR subtypes were higher in patients with higher blood ALC (P = 0.041 and P = 0.006, respectively).

Cluster transitions

We next assessed whether original cluster assignments were stable over time. After the first BAL, 34 patients underwent an additional 1 or more BALs separated by a median of 79 days (IQR = 21–243) due to worsening lung disease or concern for a new pulmonary process. Most patients who started in the low-risk cluster 1 moved out of cluster 1 (17 of 26) to a higher-risk cluster; patients who started outside cluster 1 rarely moved into cluster 1 (8 of 49), driving an overall change in the cluster burden over time (P < 0.001; Extended Data Fig. 7 and Supplementary Tables 10 and 11). This suggests that, for patients with recurrent or non-resolving symptoms, progression to an adverse BAL phenotype is common.

Classification model and external cluster validation

Finally, as cluster assignments cannot be directly applied to external cohorts, we used taxonomic and gene expression data to grow a random forest of 10,000 trees to be used as a cluster classifier. The out-of-bag area under the curve (AUC) was 0.923, indicating good cluster discrimination (Supplementary Table 12). Lung gene expression variables were significantly more important to cluster classification than taxonomic variables, with the 500 most important genes showing significant enrichment for immune processes (Supplementary Data 12). The random forest classifier was then applied to taxonomic and gene expression data from an independent cohort of n = 57 BALs obtained from pediatric recipients of HCT at the University Medical Center in Utrecht, the Netherlands, between 2005 and 2016 (clinical traits are described in Supplementary Table 13). Although this cohort differed in geography, underlying diseases, allograft characteristics and treatment protocols, 1-year non-relapse mortality was lowest among patients with BALs assigned to the low-risk cluster 1 (9.5%, 2 of 21), was higher for patients assigned to the bacterially rich cluster 2 (36%, 4 of 11), and was highest for patients in the high-risk clusters 3 or 4 (52%, 13 of 25, P = 0.009; Extended Data Fig. 8 and Supplementary Table 14), thus confirming the external validity and clinical significance of the BAL cluster profiles.

留言 (0)

沒有登入
gif