Identification of hub genes and establishment of a diagnostic model in tuberculosis infection

Differential expression analysis of TB

TB microarray data (GSE 83456, control: 61, TB: 92) were downloaded from the GEO database. The limma package was utilized to perform differential expression analysis on these 153 samples, and 159 DEGs were ascertained (|logFC|> 1 and FDR < 0.05) (Fig. 1). Among these DEGs, 149 were significantly downregulated, and 10 were significantly upregulated.

Fig. 1figure 1

Volcano plot of mRNA differential expression analysis between the normal and TB groups. Red represents significantly upregulated genes, and green represents significantly downregulated genes

GO and KEGG enrichment analyses of TB DEGs

Analyses were performed on DEGs with p-values < 0.05, with GO results showing that in the BP module, DEGs were basically enriched in response to virus, regulation of viral process, and defense response to symbiont GO terms. In CC module, they were basically enriched in endocytic vesicle and specific granule GO terms. In MF module, they were basically enriched in GTP binding and guanyl nucleotide binding GO terms, mainly involving virus defense and symbiont defense-related functions (Fig. 2A). KEGG analysis revealed that the majority of these genes exhibited significant enrichment in key pathways such as the NOD-like receptor signaling pathway, innate immune response, and diseases associated with coronaviruses (Fig. 2B).

Fig. 2figure 2

GO and KEGG enrichment analyses of TB DEGs. A GO enrichment analysis results; B KEGG enrichment analysis results. The size of the bubble represents the number of enriched genes, and the color represents the p-value. The darker the color, the more significant the p-value

Identification of hub genes in TB

STRING database was utilized to build a PPI network of the 159 DEGs, and 139 nodes and 436 edges were obtained with a confidence score > 0.7 (Fig. 3A). Based on the PPI network, the importance coefficients of the DEGs were calculated using five algorithms (EPC, MCC, MNC, Radiality, and Stress) in the Cytoscape software with the cytoHubba plugin, and the top 20 hub genes were selected. Venn diagram was utilized to identify 13 hub genes that were commonly included in all five algorithms (DDX58, IFIT2, IFIH1, RSAD2, IFI44L, OAS2, OAS1, OASL, IFIT1, IFIT3, MX1, STAT1, and ISG15) (Fig. 3B).

Fig. 3figure 3

Identification of hub genes in TB. A PPI network of the DEGs in the STRING database. Each node represents a protein, and the spiral structure inside the node represents the known three-dimensional structure of the protein. The color of the node represents the score value of the interaction. The lines between nodes represent the interactions between two proteins, and different colors correspond to different interaction types. B Upset plot of the hub genes identified using the five algorithms. The x-axis represents the five algorithms, and the y-axis represents the number of genes commonly identified by the five algorithms

Co-expression network and related functions of the hub genes

GeneMANIA database was utilized to dissect the co-expression network and related functions of the 13 hub genes (Fig. 4). These genes exhibited a complex co-expression network with a co-expression rate of 81.55%, a physical interaction rate of 12.27%, a co-localization rate of 2.10%, and a predicted rate of 2.24%. They were mainly concentrated in functions related to viral regulation and immune modulation, such as regulation of viral life cycle, regulation of viral process, regulation of symbiotic process, viral life cycle, cellular response to interferon-gamma, response to interferon-gamma, and regulation of type I interferon production.

Fig. 4figure 4

Co-expression network of the 13 hub genes in TB constructed using the GeneMANIA database. The inner layer represents the hub genes, and the outer layer represents the co-expressed genes. The size of the node represents the strength of the correlation, and the color of the line represents the type of interaction

Clustering and immune infiltration analysis of hub genes

Consensus clustering was manipulated on TB patients based on the expression data of the 13 hub genes. The optimal number of clusters was determined to be 2 grounding on the consensus CDF plot and delta area plot, and a K-means clustering plot was generated (Fig. 5A–C). The immune infiltration levels of each TB patient sample were evaluated using the ssGSEA method. Differences in immune infiltration levels between patients in different subgroups were analyzed, and it was found that the immune cell infiltration levels of aDCs, macrophages, and Th1-cells were higher in subgroup 2 than in subgroup 1 (P < 0.05). The immune function scores of APC_coinhibition, HLA, inflammation-promoting, parainflammation, T_cell_co-inhibition, Type_I_IFN_Reponse, and Type_II_IFN_Reponse were tellingly higher in subgroup 2 (P < 0.05) (Fig. 5D, E). These results indicated that TB patients in subgroup 2 have a better response to immune therapy.

Fig. 5figure 5

Clustering and immune infiltration analysis of hub genes. A Consensus cumulative distribution function (CDF) plot. The different colored curves represent the CDF for different values of k, and the CDF reaches an approximate maximum when k is 2. B Relative change in the area under the CDF curve. Each point represents the total area under the CDF curve for a specific value of k, and the appropriate value of k is 2. C K-means clustering plot. The rows and columns of the matrix represent the samples, and the consistency matrix is arranged according to the dendrogram above the heatmap, with the bars between the dendrogram and the heatmap indicating the clusters. D Analysis of differences in immune cell components in the TB infection subgroups. E Analysis of differences in immune function components in the TB infection subgroups

Screening of diagnostic biomarkers and model validation

LASSO analysis was performed on the 13 hub genes to screen for 8 key genes, which were IFIT1, IFIT2, IFIT3, IFIH1, RSAD2, OAS1, OAS2, and STAT1. A diagnostic model was constructed from these genes, with the index calculated as follows: index = 0.567 × IFIT3 + 0.0182 × RSAD2-0.1405 × OAS2-1.1373 × IFIT1 + 2.4298 × STAT1 + 0.9271 × OAS1 + 0.0542 × IFIT2 + 18721 × IFIH1.

To validate the accuracy of the model, we depicted an ROC curve using data from training set GSE83456 and calculated the AUC value of the diagnostic model, which was found to be 0.973 (Fig. 6C). The diagnostic ability of the key genes was further verified in the validation set GSE19444, and the diagnostic efficiency of the test set was ascertained to be 0.901 (Fig. 6A–D). The findings strongly indicated that the diagnostic model, constructed using immune-related key genes for TB, exhibited a robust diagnostic performance.

Fig. 6figure 6

Screening of diagnostic biomarkers and model validation. A Coefficient distribution plot generated for the logarithmic sequence (λ) in the LASSO model. B LASSO coefficient spectrum of the LASSO Cox analysis. C ROC curve analysis of the training set GSE83456. D ROC curve analysis of the validation set GSE19444

留言 (0)

沒有登入
gif