Strain-specific impacts of probiotics are a significant driver of gut microbiome development in very preterm infants

Cohort and study design

The present study included 123 preterm infants born at <32 weeks’ gestation without congenital anomaly, early onset sepsis, LOS, NEC, focal intestinal perforation or other intestinal pathology. These morbidities were excluded because they are most strongly associated with changes in the gut microbiome. No statistical methods were used to predetermine sample size, but our sample size is larger than those reported in previous publications. As an observational, retrospective study, no subject randomization was required. Data collection and analysis were not performed blind to the conditions of the experiments.

Infants were recruited to the Supporting Enhanced Research in Vulnerable Infants Study (SERVIS) with written parental consent covering data and sample collection. The study protocol was approved by Newcastle Hospitals NHS Foundation Trust, NRES Committee North East and N. Tyneside 2 10/H0908/39, and the research complies with all relevant ethical regulations. All work with clinical samples, including organoids, is covered within these ethical approvals.

All infants were cared for in the NICU of the Royal Victoria Infirmary, Newcastle, with standardized feeding and antibiotic and antifungal guidelines (prophylactic fluconazole). The earliest included infants were born in 2011 and probiotics were introduced into routine use in 2013. Between 2013 and 2016, infants received the probiotic Infloran (B. bifidum 1 × 109 c.f.u. and L. acidophilus 1 × 109 c.f.u.); then, due to lack of availability, after mid-2016 Labinic (B. bifidum 0.67 × 109 c.f.u., B. longum subsp. infantis 0.67 × 109 c.f.u. and L. acidophilus 0.67 × 109 c.f.u.) was used. Stool samples used in the analysis were collected longitudinally (n = 1,431) from day 0 until day 120, alongside extensive clinical metadata for each infant, including demographics and treatments such as feed exposures.

Variables that are fixed through time (for example, gestational age, birth mode, sex) are described on a per-infant basis and are thus constant for all samples from a given infant. Other variables were categorized to reflect exposure in relation to time (for example, antibiotics, receipt of MOM), and therefore are on a per-sample basis. Specifically, the clinical variables used were gestational age at birth (continuous; range 23–31), birthweight (continuous; range 500–2,000 g), birth mode (vaginal/caesarean), sex (male/female), season at birth (winter/spring/summer/autumn), intravenous antibiotics in the past 7 d (no/yes), day of full feed (continuous; range 6–39), MOM (never/during/after), BMF (never/before/during/after), formula (never/before/during/after), probiotics (no probiotic/Infloran/Labinic) and weight z-score difference between birth and discharge (continuous; range −5.4 to 1.1).

For the persistence analysis (described in more detail in Statistical analysis), all co-variates needed to be on a per-infant basis, restricting this analysis to gestational age, birthweight, birth mode, sex, season, total number of antibiotic courses, day of full feed, BMF ever (no/yes), formula ever (no/yes), probiotics and weight z-score change. MOM could not be included in this particular analysis because there was only one baby who did not receive MOM in this subset.

Metagenomic shotgun sequencing, taxonomic and functional profiling

DNA was extracted from ~0.1 g of stool using the DNeasy PowerSoil Kit (QIAGEN) following the manufacturer’s protocol, and sequencing was performed on the HiSeq X Ten (Illumina) with a read length of 150-bp paired-end reads. Taxonomic profiling of metagenomic samples was performed using MetaPhlAn v.2.0 (ref. 44) (bacterial, archaeal and fungal taxonomic classification) and VirMAP v.1.0 (ref. 45) (viral taxonomic classification) based on default settings. Functional profiling was performed using HUMAnN v.2.0 (ref. 46) based on default settings. Microbial enzymes (level-4 EC categories) were quantified by summing the abundances of individual gene families mapping to each EC number based on UniRef90-EC mapping from UniProt47.

Untargeted metabolomics on stool and serum samples

A subset of ten stool samples representative of each PGCT and matched serum were selected for untargeted liquid chromatography–mass spectrometry (LC–MS). As PGCTs were strongly associated with DOL at sampling, samples were primarily chosen to match for DOL between PGCTs to mitigate confounding by age at sampling. Other clinical variables were matched in addition, including gestational age, birthweight, birth mode and sex. Based on these criteria, no clinical variable was significantly different between PGCTs (all P > 0.05; Supplementary Table 5). Metabolites from these samples were extracted using a methanol solvent solution, supplemented with 1 µM MS internal standards (CAPS, CHAPS and PIPES) and 5 µM 2,6-di-tert-butyl-4-methylphenol. Serum samples were centrifuged at 800g for 5 min, supernatants were collected and the solvent solution was added at a 4:1 ratio. Samples were shaken for 1 h at 4 °C, centrifuged at 14,000g for 10 min and supernatants collected. Liquid from stool samples was evaporated using a Speedvac (Thermo Fisher Scientific) and a solvent solution was added at a ratio of 300 µl per 10 µg. Samples were shaken for 1 h at 4 °C, followed by centrifugation at 14,000g for 20 min, and supernatants were collected.

The LC–MS data were acquired on a Dionex Ultimate 3000 RS high-performance liquid chromatography system (Thermo Fisher Scientific) coupled with a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific). Chromatographic separation was performed on a ZIC-pHILIC column (5 µm, polymeric, 150 × 4.6 mm2, SeQuant, Merck). Mobile phase (A) was 20 mM ammonium carbonate and (B) acetonitrile. The gradient programme started at 80% (B) and reduced to 50% (B) over 15 min, then reduced to 5% (B) over 3 min, where washing occurred for 3 min; finally there was an 80% (B) re-equilibration for 8 min. The flow rate was 0.3 ml min−1 and the column compartment temperature was 40 °C. Total run time was 32 min with an injection sample volume of 10 µl. The mass spectrometer operated in positive and negative polarity, switching at 35,000 resolution and 200 m/z with detection range of 85–1,275 m/z in full-scan mode. An electrospray ionization source (HESI) was set to 3.5 kV voltage for positive mode and 4.0 kV for negative mode, sheath gas was set to 50 and aux gas to 20 arbitrary units, capillary temperature 300 °C and probe heater temperature 120 °C. Serum samples were analysed as a single batch, as were stool samples. Each sample set was randomized to account for system drift. Mixtures of pure authentic standards containing approximately 320 metabolites were acquired as separate injections and used to confirm retention times.

The raw LC–MS data of both serum and stool samples were independently processed as stated in the metabolome–lipidome–MS-DIAL pipeline (Code availability) using MS-DIAL v.4.8 (ref. 48). Metabolomic processing was conducted in positive and negative ion mode. Default parameters were applied unless otherwise stated. Peak detection parameters included a minimum peak amplitude of 100,000. Peaks were identified using the MassBank database v.2021.02 (ref. 49) with a retention time tolerance (RTT) of 0.1 min, accurate mass tolerance (AMT) of 0.002 Da and identification score cut-off of 80%. Peaks were aligned using an RTT of 0.3 min and AMT of 0.002 Da, with gap filling by compulsion. MS/MS was exported and further processed for secondary annotation using the Global Natural Products Social Molecular Networking feature-based molecular networking tool50.

Peak intensity tables were exported from MS-DIAL and the R package pmp v.1.6.0 (ref. 51) was used for the following QC and pre-processing steps. Peaks were filtered for intensities at least fivefold higher than LC–MS blanks, samples with >80% missing values, features with >20% missing values and peaks filtered based on the percentage of variation in the QC samples with a maximum relative s.d. of 25%. Based on this, one stool sample from PGCT-2 was excluded. Data were normalized using probabilistic quotient normalization, followed by Random Forest missing data imputation using the missForest R package v.1.4 (ref. 52) and subsequent generalized logarithmic (glog) transformation. MS1 data were further annotated using the human metabolome database (HMDB, v.4, July 2021)53, with an AMT of 0.002 Da. Any unannotated features were removed. The remaining dataset was subject to manual feature curation in MS-DIAL, where poor quality spectral features were removed.

Preterm intestinal organoid co-culture

A human intestinal organoid line was generated from preterm intestinal ileum tissue after surgical resection for NEC54. The infant was a boy born at 24 weeks’ gestation and had surgery on DOL 10.

Intestinal organoids (n = 3 technical replicates) were exposed to pooled faecal supernatants representing each PGCT and a control containing no faecal supernatant. Sterile faecal supernatants were prepared using a modified method described elsewhere55. Briefly, ~0.25 g of stool (n = 10) was pooled for each PGCT and diluted in 25% (w/v) sterile phosphate-buffered saline before being vortexed for 20 min with glass beads. Faecal slurries were centrifuged for 20 min at 1,600g and 4 °C, the supernatant was re-centrifuged for 10 min at 14,000g and 4 °C, and the resulting supernatant was serially filtered (0.45 µm and 0.22 µm). Faecal supernatant was stored at −80 °C until use.

Intestinal organoids were seeded as monolayers on 0.4 µm Transwells (Corning) and, after reaching confluence (~2 d), were differentiated for 4 d56.

Co-culture of preterm intestinal organoid monolayers with sterile faecal supernatants was performed for 24 h using the organoid anaerobe co-culture (OACC) model57. The sterile faecal supernatants were added apically, corresponding to the intestinal lumen. The OACC model was used to recapitulate the steep oxygen gradient across the epithelium and mimic the low oxygen gradient of the ileum. TER was measured at the end of the experiment to confirm that all monolayers remained intact and cells were contiguous.

RNA-seq

After 24 h of exposure, RNA was extracted from organoid monolayers using the RNeasy kit (QIAGEN) before undergoing RNA-sequencing (RNA-seq) at the Newcastle University Genomics Core Facility. One sample from the PGCT-5 exposure failed QC and was not included in the subsequent analysis. Briefly, stranded messenger RNA-seq libraries were prepared using the TruSeq Stranded mRNA kit (Illumina) and IDT for Illumina TruSeq RNA UD Index adapters following the manufacturer’s protocol. Libraries were quantified using a TapeStation 4200 (Agilent Technologies) and Qubit 4 (Thermo Fisher Scientific) and equimolar pooled. The pooled library was sequenced at ~50 million 100 bp single-reads per sample on a NovaSeq 6000 using an S2 100 cycle flow cell (Illumina). Data for individual samples were demultiplexed into separate FASTQ files using Illumina’s bcl2fastq software.

QC of raw reads was performed using fastq_quality_trimmer from the FASTX Toolkit v.0.0.14 before being mapped to the human transcriptome (GRCh38.p13) using Salmon v.0.13.1 (ref. 58) to estimate transcript abundance. Estimated count data were aggregated at the gene level by tximport59 for downstream analysis. DESeq2 v.1.32.0 (ref. 60) was used to normalize RNA-seq count data and identify DEGs between PGCT and control replicates. Genes were considered differentially expressed if they displayed an absolute positive or negative fold-change of ≥1.5 and a false discovery rate (FDR)-adjusted P < 0.05. A Venn diagram of DEGs was produced using the VennDiagram package v.1.7.1 (https://cran.r-project.org/web/packages/VennDiagram/index.html).

Statistical analysis

All statistical analyses were performed in R v.4.0.2 (https://www.r-project.org). Unless stated otherwise, all visualizations were plotted using the ggplot2 package v.3.3.2 (ref. 61). Where necessary, data were formally tested for normality and equal variances. Shannon diversity and richness were calculated for each sample using the vegan package v.2.5-7 (https://cran.r-project.org/web/packages/vegan/index.html). Both Shannon diversity and relative abundance data were modelled using LOESS (locally weighted estimated scatterplot smoothing) regression and plotted with 95% confidence intervals (CIs). All permutation tests (that is, PERMANOVA) were conducted with 10,000 permutations.

Determining PGCTs

DMM was used to cluster samples on the basis of microbial community structure62 and to determine the PGCTs for all samples. Five PGCTs were found to be optimal on the basis of the lowest Laplace approximation score. PGCTs were manually ordered from the youngest (PGCT-1) to the oldest (PGCT-5), based on the average DOL of samples within each PGCT.

The linear discriminant analysis effect size method63 was used to determine the bacterial species and EC numbers that discriminated each cluster using MicrobiomeAnalyst64,65.

PERMANOVA

To determine which co-variates were associated with metagenome profiles while accounting for repeated measures, multiple cross-sectional analyses using the ‘adonis’ function from the vegan package were performed. Data were split into nine specific time windows based on DOL, which were chosen to both maximize the number of samples within each window and reflect the progression of enteral feed independence as follows: establishing enteral feeds (0–9), reaching full feeds (10–14), independent of parenteral nutrition (15–19) and maturation on full enteral feeds (20–24, 25–29, 30–34, 35–39, 40–49, 50–69). For the univariate analysis, too few infants/samples were available beyond this timepoint. Only a single sample per infant, the earliest available, was included within each time window. The association of 12 clinical variables (defined in the cohort section) on the metagenome and functional profiles was tested, based on Bray–Curtis dissimilarity. Each test was performed in a stepwise manner and subsequent P values were adjusted for multiple comparisons using FDR adjustment (Benjamini–Hochberg procedure66).

To assess whether there was a statistically significant difference in serum and stool metabolite profiles based on PGCT assignment, PERMANOVA was performed in MetaboAnalyst v.5.0 (ref. 67).

Ordination

For metagenomic and RNA-seq analysis, ordinations were performed on all data using non-metric multidimensional scaling (NMDS). NMDS plots were based on Bray–Curtis dissimilarity matrices for both taxonomic and functional data and Euclidean distance on regularized logarithm (rlog), transformed, normalized RNA-seq count data, using the ‘metaMDS’ function from the vegan package. The mean centroid for each group was calculated and plotted.

For metabolite analysis of stool and serum samples, ordinations were performed on all data using partial least-squares discriminant analysis using MetaboAnalyst v.5.0 (ref. 67).

LMMs and GLMMs

Various linear mixed models (LMMs) and generalized LMMs (GLMMs) were fit to the data using the glmmTMB package v.1.0.2.1 (ref. 68) or, alternatively, the logistf package v.1.24 (https://cran.r-project.org/web/packages/logistf/index.html), which was used to fit logistic regressions using Firth’s bias-reduced penalized likelihood, when there was quasi-complete or complete separation. To detect separation and infinite maximum likelihood estimates in binomial logistic regression models, the ‘detect_separation’ and ‘check_infinite_estimates’ functions from the brglm2 package v.0.7.1 (https://cran.r-project.org/web/packages/brglm2/index.html) were used. Model validity was assessed using diagnostic residual plots, generated by the DHARMa package v.0.3.3.0 (https://cran.r-project.org/web/packages/DHARMa/index.html). Diagnostic residual plots were not generated for models fit by logistf. The general formula for each of the LMMs fitted was as follows:

$$Y\approx X_1 + X_2 + \ldots + X_n + (1|}).$$

To find out which co-variates were significantly associated with Shannon diversity, mixed-effects models were fit using the Gaussian distribution. All 12 co-variates included in the ‘adonis’ analysis plus DOL were included as fixed effects and subject ID was included as a random group intercept.

To determine which co-variates were significantly associated with the five PGCTs, individual mixed-effects binomial logistic regression models were fit, one for each cluster versus all other clusters. Each model contained the same 12 co-variates plus DOL as fixed effects and subject ID as a random group intercept. Mixed-effects binomial logistic regression models were also fit to assess the prevalence of probiotic species. DOL had an effect on the relative abundance of probiotic species, so the before, during and after probiotic groups are nested in time. To account for this, the control group of samples from infants who had taken no probiotic was subset into three distinct time bins. These specific time bins were based on the mean start DOL for probiotics (8 DOL) and the mean stop DOL for probiotics (44 DOL). Mixed-effects binomial logistic regressions were fit separately within groups (before, during and after probiotics) for each probiotic species. They were also fit separately between groups (Infloran and Labinic) for each species.

Where used, global P values for fixed effects from the final models were obtained by analysis of variance (type II Wald’s χ2 test) from the car package69 v.3.0-10. All post-hoc analysis was performed using either pairwise comparisons (Tukey’s highly significant difference (HSD) method) or treatment versus control comparisons (Dunnett’s test), both adjusting for multiple comparisons, using the emmeans package v.1.5.4 (https://cran.r-project.org/web/packages/emmeans/index.html).

Analysis of probiotic ‘persisters’ to determine significant co-variates

Persistence analysis included all infants receiving probiotics that had at least two samples at least 7 d after probiotics were stopped, and included an additional 22 samples taken after 120 DOL. Persistence of B. bifidum and L. acidophilus was assessed, because these two species were found in both Infloran and Labinic, allowing for a larger sample size (n = 52). Infants were classed as ‘non-persister’ for a species if there were two consecutive samples with a relative abundance of 0. This criterion was found to be optimal and babies could be separated quite clearly into ‘persisters’ and ‘non-persisters’. Binomial logistic regressions were fit for each probiotic species to determine which co-variates were significantly associated with persistence, using the logistf package as previously described. The models both included the subject-level co-variates as described.

MaAsLin analysis to determine significant taxa and EC numbers associated with each co-variate

The MaAsLin2 package v.1.2.0 (ref. 70) was used to determine significant taxa and EC numbers associated with co-variates, while adjusting for potential confounders. MaAslin2 was run on both genus- and species-level relative abundance data and EC number relative abundance data. All co-variates used in the ‘adonis’ analysis plus DOL were included as fixed effects in the analysis and subject ID was included as a random effect. The arcsin square root transformation was performed on relative abundance data and default MaAsLin2 parameters were used. All P values were adjusted by MaAsLin2 for multiple comparisons using FDR adjustment (Benjamini–Hochberg procedure) and the default q-value cut-off of 0.25 was used to identify significant results.

Analysis of B. infantis HMO gene clusters

B. infantis HMO genes (as previously described27) were quantified by first identifying the corresponding UniRef90 gene families and then utilizing B. longum-stratified gene quantifications (quantifying UniRef90 gene families) from HUMAnN v.2 (ref. 71). Samples with >90% of the genes in these six genomic loci (H1, H2, H3, H4, H5 and a urease gene cluster) were classed as having B. infantis.

Significance analysis of microarrays and metabolites, GO and enrichment analysis

Significance analysis of microarray (SAM) was used in MetaboAnalyst67 with a Delta threshold of 1.0 to identify specific metabolites discriminating PGCT-3 from PGCT-4/-5 and vice versa in both stool and serum.

GO and enrichment analysis

GO and enrichment analysis were performed using the gprofiler2 package v.0.2.1 (ref. 72), with default parameters and a customized genetic background. The top 25 most significant GO biological processes for PGCT-4 and PGCT-5 were reported.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

留言 (0)

沒有登入
gif