Identification of reproducible microbiota biomarkers for the diagnosis of cirrhosis and hepatocellular carcinoma

Clinical characteristics of the patients and healthy individuals

As shown in Table 2, all clinical characteristics of the participants in the Fuzhou cohort except gender and BMI differed significantly among disease states. In addition, age, TP, AST, and AFP also differed significantly between the Jilin and Fuzhou cohorts (Additional file 1: Table S2). These results indicated that the HCC diagnostic biomarkers derived from these data ought to be independent of clinical characteristics.

Table 2 Clinical characteristics of 82 samples collected in this study

Microbial diversity differences

First, we compared the microbial diversity of samples at various stages of liver disease with that of HC. The Shannon, Simpson, Chao1 and ACE indices of alpha diversity were calculated. Only the Shannon diversity in Fuzhou HCC samples was significantly higher than that in HC, whereas the Shannon, Chao1 and ACE diversity in Xiamen LC samples were significantly lower than those in HC (Kruskal–Wallis test, p < 0.05, Fig. 1a and Additional file 1: Tables S3, S4, S5, S6, S7). Notably, among the three datasets with multiple disease stages, only the Xiamen samples showed a significant decrease in microbial diversity with disease progression.
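The alpha diversity indices named above can be illustrated with a minimal sketch. It assumes a per-sample vector of taxon counts and uses common textbook definitions: Shannon with the natural logarithm, Simpson as 1 minus the sum of squared proportions, and the bias-corrected form of Chao1. The text does not state which variants were used, so these choices and all function names are assumptions; ACE is omitted for brevity.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity, here taken as 1 - sum(p_i^2) (an assumed variant)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical toy sample: four taxa with equal counts.
even_sample = [10, 10, 10, 10]
print(shannon(even_sample), simpson(even_sample), chao1(even_sample))
```

For a perfectly even sample, the Shannon index equals the logarithm of the number of taxa, which is a convenient sanity check before applying the functions to real count tables.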

Fig. 1 Microbial diversity differences between different groups. a Alpha diversity measured by the Shannon index, Simpson index, Chao1 index and ACE index. *: p < 0.05. b PCoA of beta diversity based on Bray–Curtis distance for five datasets

Beta diversity was calculated using Bray–Curtis distance, and PCoA showed that microbial community composition differed significantly among CHB, LC, HCC and HC in the Fuzhou, Jilin and Xiamen samples (Fig. 1b). The PERMANOVA results showed that disease stage (LC and HCC) exerted significant influences on the communities (Table 3), while CHB did not. Significant differences in beta diversity between CHB and HC were observed only in the Fuzhou and Jilin samples, not in the Xiamen and Shanghai samples. These results indicated that the composition of the microbial community changed greatly in LC and HCC.

Table 3 PERMANOVA test results of beta diversity based on Bray–Curtis distance
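As an illustration of the distance underlying the PCoA and PERMANOVA analyses, the following is a minimal sketch of the Bray–Curtis dissimilarity between abundance profiles. The sample names and abundance values are hypothetical, and the ordination and permutation test themselves are not reproduced here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def distance_matrix(samples):
    """Pairwise dissimilarities for a dict mapping sample name -> abundance vector."""
    names = list(samples)
    return {(a, b): bray_curtis(samples[a], samples[b]) for a in names for b in names}

# Hypothetical abundance vectors over the same three taxa.
toy = {"HC_1": [6, 4, 0], "LC_1": [4, 6, 0], "HCC_1": [0, 0, 10]}
dist = distance_matrix(toy)
print(dist[("HC_1", "LC_1")], dist[("HC_1", "HCC_1")])
```

The dissimilarity is 0 for identical profiles and 1 for profiles that share no taxa, so the resulting matrix can be passed directly to an ordination or permutation-based test.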

Moreover, all samples from five datasets were pooled together for PCoA analysis to evaluate the biological variations and technical differences in different datasets. As shown in Additional file 1: Fig. S1, samples tended to cluster together by different studies rather than by different disease states. These results indicated that the heterogeneity between datasets was greater than the difference between different disease states. Therefore, different datasets were analyzed separately in the subsequent analysis.

Alterations in microbial composition

To understand the specific changes of gut microbiota in different disease stages, we first analyzed the composition of gut microbiota at the phylum and genus levels. At the phylum level, Firmicutes and Bacteroidetes were the dominant bacteria in HC, CHB, LC and HCC, followed by Proteobacteria and Actinobacteria (Fig. 2a). The relative abundance of Firmicutes in LC and HCC was significantly lower than that in HC and decreased significantly as disease progressed, while the relative abundance of Bacteroidetes was significantly increased (Wilcoxon rank-sum test, p < 0.05, Fig. 2b, Additional file 1: Fig. S2a). Previous studies have shown that the ratio of Bacteroidetes/Firmicutes (B/F) is related to the development of inflammatory diseases, and that an increase in the ratio can promote the development of inflammation (Kabeerdoss et al. 2015, Stojanov et al. 2020, Walker et al. 2011). This result indicated that LC and HCC may be accompanied by stronger inflammatory responses. In addition, the relative abundance of Proteobacteria was also significantly increased in LC and HCC patients, suggesting that a high B/F ratio and a high abundance of Proteobacteria may jointly contribute to the progression of HBV-induced liver disease (Fig. 2b).

Fig. 2 Distribution of the predominant bacteria at the phylum and genus levels in five datasets. a Stacked bars of the microbial composition at the phylum level among HC, CHB, LC and HCC. b Bar chart of the relative abundance of predominant taxa at the phylum level in LC and HCC compared to HC. Wilcoxon rank-sum test was used to compare the difference. *: p < 0.05, **: p < 0.01, ***: p < 0.001. c Stacked bars of the microbial composition at the genus level among HC, CHB, LC and HCC

At the genus level, the predominant genera were Bacteroides, Faecalibacterium, Prevotella 9, Escherichia/Shigella, Erysipelotrichaceae UCG-003 and Lachnoclostridium (Fig. 2c). Compared with HC, 83, 142 and 60 differential genera were identified in LC samples from Fuzhou, Jilin and Xiamen, respectively (Wilcoxon rank-sum test, all p < 0.05, Fig. 3a), of which 14 genera were consistently dysregulated in at least two datasets and were denoted as reproducible LC-associated microbial markers. Among the 14 genera, three (Akkermansia, Barnesiella and Bacteroides) were significantly increased in LC, while 11 (Blautia, Fusicatenibacter, Howardella, Lachnospiraceae ND3007 Group, Lachnospiraceae UCG-008, Marvinbryantia, Butyricicoccus, CAG-352, Dialister, Eggerthella, Ruminococcaceae UCG-013) were significantly decreased (p < 0.05, Fig. 3b). Similarly, 299, 188 and 43 genera with significant differences were identified between HCC and HC samples in the Fuzhou, Jilin and Nanjing datasets, respectively (Wilcoxon rank-sum test, all p < 0.05, Fig. 3c), of which 10 genera were consistently dysregulated in at least two datasets and were denoted as reproducible HCC-associated microbial markers. Among the 10 differential genera, six (Fluviicola, Veillonella, Cryomorphaceae__uncultured, Flavobacteriaceae__uncultured, NS9 Marine group__uncultured bacterium, Spongiibacteraceae BD1-7 clade) were significantly increased in HCC, while four (Lachnospiraceae UCG-008, CAG-352, Ruminiclostridium 5, uncultured Erysipelotrichaceae bacterium) were significantly decreased (p < 0.05, Fig. 3d).
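The selection of reproducible markers described above can be sketched as a simple cross-dataset filter. This is an illustrative sketch, not the authors' code: it assumes that "consistently dysregulated in at least two datasets" means a genus is significant in at least two datasets with the same direction and in none with the opposite direction. The dataset labels, genus names and function name are hypothetical placeholders.

```python
from collections import defaultdict

def reproducible_markers(results, min_datasets=2):
    """Select genera that change in the same direction in >= min_datasets datasets.

    results maps dataset name -> {genus: direction}, listing only genera that
    were significant in that dataset; direction is +1 (increased in disease)
    or -1 (decreased in disease).
    """
    directions = defaultdict(list)
    for genera in results.values():
        for genus, direction in genera.items():
            directions[genus].append(direction)

    markers = {}
    for genus, dirs in directions.items():
        up, down = dirs.count(1), dirs.count(-1)
        # Assumption: a genus with conflicting directions is not reproducible.
        if up >= min_datasets and down == 0:
            markers[genus] = "increased"
        elif down >= min_datasets and up == 0:
            markers[genus] = "decreased"
    return markers

# Hypothetical per-dataset results of the differential abundance tests.
toy_results = {
    "dataset_1": {"GenusA": 1, "GenusB": -1, "GenusC": 1},
    "dataset_2": {"GenusA": 1, "GenusB": 1, "GenusD": -1},
    "dataset_3": {"GenusC": 1, "GenusB": -1, "GenusD": -1},
}
print(reproducible_markers(toy_results))
```

In this toy input, GenusB is significant in all three datasets but changes direction between them, so it is excluded under the stated assumption.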

Fig. 3 The significantly differential genera between LC or HCC and HC across datasets. a, b UpSet plot and bubble plot of the significantly differential genera between LC and HC across datasets. c, d UpSet plot and bubble plot of the significantly differential genera between HCC and HC across datasets. Red and green represent the direction of differential genera, and the shape size represents the significance level. NA, not detected genera

In addition, stepwise comparative analyses of CHB vs HC, LC vs CHB and HCC vs LC were also conducted. Compared with HC, 46, 130, 22 and 11 differential genera were identified in Fuzhou, Jilin, Xiamen and Nanjing CHB samples, respectively (Wilcoxon rank-sum test, all p < 0.05, Additional file 1: Fig. S2b). Among them, Bacteroides was significantly increased in the Fuzhou and Jilin datasets, while Phascolarctobacterium, Gordonibacter and DTU089 were significantly decreased in the Jilin and Nanjing datasets. Compared with CHB, there were 43, 92 and 48 differential genera in Fuzhou, Jilin and Xiamen LC samples, respectively (Wilcoxon rank-sum test, all p < 0.05, Additional file 1: Fig. S2c), of which 8 genera were consistently dysregulated in at least two datasets. Among them, Bacteroides was again significantly increased in two datasets, while 7 genera were significantly decreased. Compared with LC, 174 and 216 differential genera were identified in Fuzhou and Jilin HCC samples, respectively (Wilcoxon rank-sum test, all p < 0.05, Additional file 1: Fig. S2d). Only 5 genera (Ruminococcaceae UCG-014, Akkermansia, Flavobacteriaceae__uncultured, Blautia and Eggerthella) showed a consistent direction of dysregulation, of which Ruminococcaceae UCG-014 and Akkermansia were significantly decreased.

Construction of the diagnostic model for LC based on reproducible differential genera

The following analysis was performed at the genus level. An RF classification model based on the 14 LC-associated genera was constructed to discriminate LC patients from HC. The Fuzhou samples were used as the training data, and five-fold cross-validation was performed on an RF model with the optimal parameter combination of mtry = 4 and ntree = 650. The AUC of the RF classifier was 0.824 (95% CI 0.697–0.951, Fig. 4a) in Fuzhou samples. The RF model then achieved AUCs of 0.919 (95% CI 0.796–1.00, Fig. 4b) and 0.833 (95% CI 0.706–0.951, Fig. 4c) in Jilin and Xiamen samples, respectively. Moreover, the AST to platelet ratio index (APRI) and FIB-4 have been established as biomarkers for LC diagnosis in recent years, and were also applied to the Fuzhou dataset with the same thresholds as in previous studies (APRI: 1.5, FIB-4: 3.25) (Lurie et al. 2015; Xiao et al. 2015). The AUC values of APRI and FIB-4 for LC diagnosis were 0.72 and 0.51, respectively (Table 4), which were lower than that of the RF model based on the 14 LC-associated genera. Collectively, these 14 LC-associated genera could be used as potential microbial markers for LC diagnosis.
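For readers unfamiliar with the two conventional indices, the sketch below computes APRI and FIB-4 from their standard published formulas and applies the thresholds quoted above (1.5 and 3.25). The AST upper limit of normal of 40 U/L is an assumption, since the text does not state the value used, and the patient values are hypothetical.

```python
import math

def apri(ast, platelets, ast_uln=40.0):
    """APRI = (AST / upper limit of normal) / platelet count (10^9/L) * 100.

    ast_uln = 40 U/L is an assumed reference value, not taken from the text.
    """
    return (ast / ast_uln) * 100.0 / platelets

def fib4(age, ast, alt, platelets):
    """FIB-4 = (age [years] * AST [U/L]) / (platelets [10^9/L] * sqrt(ALT [U/L]))."""
    return (age * ast) / (platelets * math.sqrt(alt))

def flag_lc(age, ast, alt, platelets, apri_cutoff=1.5, fib4_cutoff=3.25):
    """Dichotomize both indices at the cut-offs quoted in the text."""
    return {
        "APRI_positive": apri(ast, platelets) > apri_cutoff,
        "FIB4_positive": fib4(age, ast, alt, platelets) > fib4_cutoff,
    }

# Hypothetical patient: 50 years old, AST 80 U/L, ALT 64 U/L, platelets 100 x 10^9/L.
print(apri(80, 100), fib4(50, 80, 64, 100), flag_lc(50, 80, 64, 100))
```

Both indices rise with higher AST and lower platelet count, which is why they are applied with fixed cut-offs rather than as fitted models.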

Fig. 4 The performances of two RF models based on 14 LC-associated genera or 10 HCC-associated genera. a–c ROC curve of the RF model based on 14 LC-associated genera in Fuzhou, Jilin and Xiamen samples. d The heatmap of the relationships between 14 LC-associated genera and 13 clinical indicators. e, f ROC curve of the RF model based on 10 HCC-associated genera in Fuzhou and Jilin samples. g The heatmap of the relationships between 10 HCC-associated microbial genera and 13 clinical indicators

Table 4 Performance of conventional diagnostic biomarkers

Correlation analysis between the above 14 common differential genera and 13 clinical factors in Fuzhou samples was performed. The results showed that 40 genus-factor pairs were significantly correlated, including 18 pairs with significantly positive correlation and 22 pairs with significantly negative correlation (Spearman, all p < 0.05, Fig. 4d). Among them, age, PT, AST, AKP, HDL and AFP were strongly correlated with the 14 LC-associated genera. In addition, Ruminococcaceae UCG-013 was significantly positively correlated with TG, LDL, HDL, TP and PC, and negatively correlated with age, PT, AKP, AFP and TB. Bacteroides was negatively correlated with TG, LDL, HDL and TP, and positively correlated with age, PT, AKP, AST, AFP, TB and ALT. Interestingly, Ruminococcaceae UCG-013 and Bacteroides showed opposite correlations with clinical factors. Further correlation analysis showed a marginally significant negative correlation between Ruminococcaceae UCG-013 and Bacteroides (Spearman, R = −0.2, p = 0.071).
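The Spearman coefficient reported for each genus-factor pair can be sketched as the Pearson correlation of average ranks. The example below is a minimal illustration with hypothetical values; it computes the coefficient only and omits the significance test.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical relative abundances of a genus and values of a clinical factor.
abundance = [0.12, 0.08, 0.05, 0.02, 0.01]
factor = [20.0, 35.0, 33.0, 60.0, 90.0]
print(spearman(abundance, factor))
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions typical of relative abundance data.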

To enhance the diagnostic efficacy for LC, clinical factors that were significantly correlated with the 14 microbial markers in Fuzhou samples and were also collected in Jilin samples were selected as candidate features, namely age, AST and AFP. Single or multiple clinical factors were added to the 14 LC-associated genera to reconstruct the classification model. The results showed that the classification accuracy of the reconstructed model was greatly improved (Additional file 1: Fig. S3a–f). Similar results were observed in the Jilin cohort, where the highest AUC was achieved by combining the genera with age and AST. These results suggest that clinical factors (age, AST and AFP) can greatly improve the discrimination efficiency of the 14 LC-associated genera.

Construction of the diagnostic model for HCC based on reproducible differential genera

Meanwhile, another RF classification model was constructed based on the 10 HCC-associated genera to discriminate HCC from HC, with the optimal parameter combination of mtry = 9 and ntree = 200 determined by five-fold cross-validation. The AUC in the Fuzhou training samples was 0.902 (95% CI 0.794–1.00, Fig. 4e). The model was then validated in Jilin samples and achieved an AUC of 0.897 (95% CI 0.805–0.989, Fig. 4f). Validation was not performed in the Nanjing samples because only 4 of the 10 microbial markers were detected. Moreover, AFP is currently the most widely used biomarker for HCC diagnosis (Trevisani et al. 2001). As shown in Table 4, with a cut-off value of 10 ng/mL, the AUC values of AFP in differentiating HCC from HC were 0.76 in the Fuzhou dataset and 0.89 in the Jilin dataset, which were lower than those of the RF model based on the 10 HCC-associated genera. These results indicated that the classification efficiency of the 10 HCC-associated genera was better than that of the conventional diagnostic biomarker, and that these genera could be used as potential microbial markers for HCC diagnosis.

Correlation analysis between the above 10 genera and 13 clinical factors showed that 8 genus-clinical factor pairs were significantly positively correlated and 12 pairs were significantly negatively correlated (Spearman, all p < 0.05, Fig. 4g). Among them, Veillonella was significantly positively correlated with age, PT, AST and AKP, and negatively correlated with TP, PC and TG. Ruminiclostridium 5 was negatively correlated with age, PT and AKP, and positively correlated with PC and TG. The two genera thus showed opposite correlations with clinical factors. Correlation analysis also demonstrated that the relative abundance of Veillonella was significantly negatively correlated with that of Ruminiclostridium 5 (Spearman, R = −0.33, p = 0.0022).

Single or multiple clinical factors, including age, AST and AFP, were then combined with the 10 HCC-associated genera to reconstruct the model. The results showed that the classification accuracy was also greatly improved by the reconstructed model, ranging from 0.921 to 0.990 (Additional file 1: Fig. S4a–f). The 10 microbial markers combined with AST and AFP achieved the highest AUCs in the two datasets (Additional file 1: Fig. S4f). These results indicated that clinical variables (age, AST and AFP) can greatly improve the ability of microbial markers to distinguish HCC patients.

Identification of the microbial markers for early diagnosis of HCC

A multi-stage comparative analysis was performed on the 14 LC-associated genera and the 10 HCC-associated genera. In Fuzhou and Jilin samples, eight genera (Ruminococcaceae__CAG-352, Howardella, Lachnospiraceae UCG-008, Akkermansia, Eggerthella, Flavobacteriaceae__uncultured, NS9 Marine group__uncultured bacterium, uncultured Erysipelotrichaceae bacterium) were significantly different among multiple disease stages (Kruskal–Wallis test, p < 0.05). Among them, Ruminococcaceae__CAG-352 and Lachnospiraceae UCG-008 were shared by the LC-associated genera and the HCC-associated genera. In Fuzhou samples, the relative abundance of Ruminococcaceae__CAG-352 decreased sharply from HC to CHB, LC and HCC, and the relative abundance of Lachnospiraceae UCG-008 decreased gradually with the progression of disease (Fig. 5a). Howardella, Akkermansia and Eggerthella were unique to the LC-associated genera. The relative abundance of Akkermansia increased gradually in the precancerous stage of LC but decreased sharply in HCC, while the relative abundance of Eggerthella decreased gradually with the progression from HC to CHB and LC but increased significantly in HCC (Fig. 5b). Moreover, Flavobacteriaceae__uncultured, NS9 Marine group__uncultured bacterium and uncultured Erysipelotrichaceae bacterium were unique to the HCC-associated genera. The relative abundances of Flavobacteriaceae__uncultured and NS9 Marine group__uncultured bacterium were very low in the precancerous samples, but increased sharply in Fuzhou HCC samples. The relative abundance of uncultured Erysipelotrichaceae bacterium was higher in HC, but significantly decreased or even disappeared in CHB, LC and HCC (Fig. 5c). Similar results were also observed in Jilin samples (Fig. 5d–f). These results suggested that the eight genera might play important roles in the progression from LC to HCC and could be potential microbial markers for the early diagnosis of HCC.

Based on the above eight genera, a random forest classification model was constructed to distinguish HCC from LC by pooling Fuzhou and Jilin samples together, with the optimal parameter combination of mtry = 6 and ntree = 2000 determined by five-fold cross-validation. The model achieved an average AUC of 0.899 (95% CI 0.826–0.972, Fig. 5g), showing good classification efficiency for HCC and LC.

Fig. 5 The alterations of microbial markers during disease progression. a Alterations of genera overlapped in the LC-associated genera and the HCC-associated genera in Fuzhou samples. b Alterations of LC-associated genera in Fuzhou samples. c Alterations of HCC-associated genera in Fuzhou samples. d Alterations of genera overlapped in the LC-associated genera and the HCC-associated genera in Jilin samples. e Alterations of LC-associated genera in Jilin samples. f Alterations of HCC-associated genera in Jilin samples. g ROC curve of the RF model based on eight genera for discriminating HCC and LC in the combined dataset of Fuzhou and Jilin samples
