To explore the changes in cellular composition in IPF lung tissue, we collected and analyzed single-cell sequencing data from normal and IPF lung tissues. After quality control and batch effect correction, we obtained 66,500 qualified cells expressing 25,765 genes. Through unsupervised dimensionality reduction and clustering algorithms, we annotated and integrated cell types. Ultimately, we identified 16 cell clusters and visualized them using t-SNE (Fig. 1C), including monocytes (CD14 and CD16), mast cells (MS4A2 and CPA3), endothelial cells (VWF), macrophages (MACRO and MRC1), NK cells (KLRD1 and CD3E), ciliated cells (FOXJ1), fibroblasts (COL1A1), myofibroblasts (COL1A1, ACTA2, and PDGFRA), type II alveolar epithelial cells (SFTPB and SFTPC), dendritic cells (MHC II), AT-1 cells (AGER), mesenchymal stem cells (CD44 and ENG), goblet cells (MUC5B), plasma cells (CD79A), T cells (CD8 and CD4), and club cells (CYP2F2). Next, we investigated the changes in cellular composition in IPF lung tissue compared to normal. We found that monocytes and macrophages increased, while fibroblasts, myofibroblasts, and AT1 were significantly reduced (Fig. 1E, Supplementary Fig. 1). Furthermore, we analyzed differentially expressed genes for each cell type (Fig. 1D), with the top 3 upregulated and downregulated genes in macrophages being C1QA/C1QB/AP0C1 and MGP/SCGB3A1/SFPTC, respectively.
Fig. 1Changes in the Proportion of Macrophage Subtypes in Idiopathic Pulmonary Fibrosis (IPF) Lung. A, B t-SNE images showing cell distribution in normal and idiopathic fibrotic lung tissue. C Unbiased clustering divides cells into 16 cell clusters, and each cluster is distinguished by different coloring. D LogFC visualization of the top three genes for each cell type in normal and idiopathic fibrosis lung tissue after differential analysis. E Proportion of each cell type in normal and idiopathic fibrotic lung tissue
Single-cell analysis reveals macrophage heterogeneity in IPF lung tissuesTo further explore specific changes in macrophages, we extracted macrophages and further classified them into 13 subclusters based on macrophage-specific highly variable genes (Fig. 2A, B). We observed that Clusters 0, 2, 7, 8, 10, and 12 were increased in IPF lung tissues compared to normal lung tissues, with Cluster 0 identified as IPF-related macrophages (IPF-MΦ). In contrast, Clusters 1, 3, 4, 5, 9, and 13 were decreased, and interestingly, Cluster 11 was exclusively present in IPF lung tissues. We conducted a separate analysis of Cluster 11 (Supplementary Fig. 2). In this cluster, we found high expression of the ATP5 gene family, including ATP5F1E, ATP5MG, ATP5MC3, ATP5MC2, SMIM25, ATP5MPL, ATP5MD, ATP5MF, ATP5F1D, etc. (Supplementary Table 1). We confirmed through the HUGO Gene Nomenclature Committee (HGNC, https://www.genenames.org/) database that these genes belong to the ATP synthase subunit genome [36]. Based on the expression profile of this genome, Cluster 11 was named ATP5- MΦ.We analyzed the differentially expressed genes in ATP5-MΦ and generated a volcano plot (Supplementary Fig. 2A). Next, we performed enrichment analysis on these differentially expressed genes. GO enrichment analysis revealed (refer to Supplementary Fig. 2B) that these differentially expressed genes were mainly involved in biological processes such as proton transmembrane transport, proton-transporting two-sector ATPase complex, and proton transmembrane transporter activity. GSEA analysis showed enrichment of upregulated oxidative phosphorylation and energy production gene sets in the entire gene set (Supplementary Fig. 2C), while the IL-17 signaling pathway gene set was downregulated.
Fig. 2Single-cell analysis reveals the heterogeneity of macrophages in idiopathic pulmonary fibrosis (IPF). A t-SNE plot displaying the distribution of 13 subtypes of macrophages in normal and IPF lung tissues. B Proportions of macrophage subtypes in IPF compared to normal tissues. C Left panel: Dynamic patterns of representative differentially expressed genes (DEGs) in each macrophage cluster are illustrated in this series of graphs. Middle panel: Heatmap depicting the representative DEGs between each cell cluster. Right panel: Representative enriched gene ontology (GO) terms for each cluster. D Heatmap displaying the gene changes in each macrophage subtype. E Representative gene set enrichment analysis (GSEA) pathways in IPF-associated macrophages, including pathways related to cofactor biosynthesis, p53 signaling, oxidative phosphorylation, and tyrosine metabolism
To explore the gene expression data of each macrophage subtype at the single-cell level, we calculated the relative expression levels of each gene in each individual cell and assigned a score to each gene. Subsequently, we performed unsupervised clustering of these genes, forming different gene clusters (Fig. 2C). We grouped genes with similar expression trends, obtaining a consensus of 13 different expression trends related to various biological functions, as shown in the clustering results in Fig. 2C. We observed that genes in Cluster 1 of IPF-MΦ exhibited significantly high expression levels, and these genes were highly enriched in biological functions such as Th1 and Th2 cell differentiation, PPAR signaling pathway, pertussis, complement, and coagulation cascades, renin-angiotensin system, and regulation of fat breakdown in adipocytes, among others. The differential gene heatmap of the 13 macrophage subclusters is displayed in Fig. 2D, where FOXM1 and MYC were highly expressed in IPF-MΦ, and the expression levels of Cluster 8, 9, 10, and 11 differential genes were similar. Through GSEA analysis (Fig. 2E), we found an upregulated enrichment of accessory factor synthesis and p53 signaling pathway gene sets in IPF-MΦ, while oxidative phosphorylation and tyrosine metabolism gene sets showed a downregulated enrichment.
Pseudo-time trajectory analysis and cell communication analysis of IPF-associated macrophagesWe delved into the developmental trajectory of macrophages and the communication network between macrophages and other cell types in IPF through pseudo-time trajectory analysis and CellChat analysis. Pseudo-time trajectory analysis revealed the unique position of IPF macrophages on the developmental trajectory (Fig. 3A). The formation of IPF macrophage cluster 10 was observed in the initial stages of the trajectory. IPF-MΦ was discovered at trajectory branches 3, 4, and 5, suggesting a specific developmental process for IPF macrophages. Subsequently, CellChat analysis was employed to understand further the intricate communication network between IPF-MΦ and surrounding cell clusters. The results indicated that IPF-MΦ exhibited outstanding communication capabilities with dense interactions with various cell types (Fig. 3B). Particularly noteworthy was the intense interaction of IPF-MΦ with other macrophages through various ligand receptors, such as GRN-SORT1 and MIF-(CD74 + CD44) (Fig. 3C). The strength of both incoming and outgoing interactions of IPF-MΦ was higher compared to other macrophages (Fig. 3D). Signal pathways like MIF, ANNEXIN, GALECTIN, and VISFATIN showed similar patterns in both incoming and outgoing signals, representing the utilization of similar signaling transduction pathways for incoming or outgoing signals. The pathways with the highest incoming and outgoing signal strengths for IPF-MΦ were MK and cxcl signaling pathways.
Fig. 3Pseudo-temporal trajectory and intercellular communication analysis of IPF-MΦ. A Pseudo-temporal trajectory analysis illustrates branching points and directions of macrophage differentiation in idiopathic pulmonary fibrosis (IPF). B CellChat analysis-based visualization of communication patterns among all cell types, including the number and strength of interactions. C Probability of ligand-receptor-mediated communication between macrophage subtypes and other cell populations. D Heatmap displaying differences in the strength of incoming and outgoing interactions between each cell type. E Signaling network based on differential analysis, with arrow direction indicating the direction of cell communication. F Bar chart quantifying Inositol phosphate metabolism and Nitrogen metabolism in macrophages. G t-SNE visualization depicting the levels of glycolysis and gluconeogenesis in macrophages
Further differential analysis revealed that, compared to other macrophages, IPF-MΦ exhibited significantly enhanced communication with myofibroblasts, ciliated cells, AT2, epithelial cells, and fibroblasts (Fig. 3E). This enhanced communication might be closely related to the emergence of the ANXA1-FPR2 and NAMPT-(ITGA5 + ITGB1) signaling pathways in IPF-associated macrophages (Fig. 3C). Activating these signaling pathways might play a crucial role in the pathogenesis of IPF, influencing interactions and communication between different cell types thereby affecting inflammation, immune responses, and potential fibrotic processes. Additionally, we analyzed the metabolic levels of macrophage subtypes (Fig. 3F, G). Compared to other macrophage subtypes, Cluster 9 and 11 showed lower levels of nitrogen and inositol phosphate metabolism, while Cluster 10 exhibited lower levels of glycolysis and gluconeogenesis. These differences in metabolic levels may be related to the functional distinctions among macrophage subtypes in IPF.
hdWGCNA reveals module hub genes in IPF-MΦTo unravel the potential functions of the IPF macrophage subtype, we employed high-dimensional weighted gene co-expression network analysis (hdWGCNA). We chose a power of 8 to construct a scale-free network, resulting in 12 gene modules (Supplementary Fig. 3A, Fig. 4A-C). Interestingly, the purple, green, and yellow modules exhibited substantial expression in cluster 2 and cluster 5 macrophages, while the blue, magenta, and yellow-green modules were highly expressed in IPF-MΦ and cluster 2 macrophages. We visualized each module's protein–protein interaction (PPI) networks based on the top 10 hub genes within the 12 gene modules (Supplementary Fig. 3B, Fig. 4D).
Fig. 4hdWGCNA Identifies Module Hub Genes for IPF-MΦ. A The average connectivity plot displays the scale-free topology fitting index and soft-thresholding ability (optimal soft-threshold value is 8). B Estimation of module activity in different macrophage clusters using the hdWGCNA algorithm. C t-SNE plot illustrating the expression distribution of each module hub gene across 13 macrophage clusters. D For each module detected by WGCNA, a protein–protein interaction (PPI) network was constructed by extracting the top 10 genes from each module. E Gene overlaps within different modules. F Presentation of the module-to-module relationship matrix based on the correlation of module hub genes. G Brown, red, and tan modules are gene modules for IPF-MΦ, and the top 10 hub genes are presented according to the hdWGCNA process. * p < 0.05, ** p < 0.01, *** p < 0.001
Further exploration of the inter-module relationships (Fig. 4F) revealed a robust positive correlation between the red module and the blue and green-yellow modules. Moreover, the brown, red, and tan modules exhibited a preference for expression in IPF-MΦ and displayed significant correlations (Figs. 4B and 5E). We identified the genes within the brown, red, and tan modules as characteristic genes of IPF-MΦ and constructed PPI networks using the hub genes from each module (Supplementary Fig. 3C).
Fig. 5Various machine learning methods identify key genes associated with macrophages and fibrosis progression. A,B LASSO regression was used to screen and identify 26 key genes. C Random forest ranks all genes to determine their importance in the model. D Venn diagram filtering reveals overlapping genes from the three algorithms, including FAM174B, PMP22, ATF4, DLD, ELOB, CTDP1, SV2B, USP10, and PHACTR1. E The support vector machine recursive feature elimination (SVM-RFE) method was used to evaluate all genes and the genes were ranked based on the average ranking
Various machine learning algorithms reveal signature genes for IPF-MΦWe constructed predictive models using the hub genes from the brown, red, and tan modules to further explore the relationship between hub genes of IPF-MΦ gene modules and the onset and progression of IPF. Employing various machine learning methods to reduce the false-positive rate of screening results, we conducted a detailed analysis of the RNA dataset (GSE32537). Initially, through the LASSO regression algorithm, we successfully identified 26 key genes closely associated with the prognosis of IPF patients (Fig. 5A, B). Subsequently, these genes were ranked using random forest analysis (Fig. 5C), and the top 30 genes in importance were extracted. Using SVM-RFE methodology, we evaluated all genes through tenfold cross-validation and obtained the top 30 important genes based on their average rankings (Fig. 5E). Finally, we generated a Venn diagram to identify overlapping genes from the three machine learning methods (Fig. 5D), resulting in nine significant genes, including FAM174B, PMP22, ATF4, DLD, ELOB, CTDP1, SV2B, USP10, and PHACTR1, all closely associated with the prognosis of IPF patients.
Relationship between IRMG and the immune microenvironmentTo elucidate the relationship between the gene characteristics of IPF-MΦ and the immune microenvironment, we employed CDF curve analysis to determine the effectiveness of this algorithm in patient grouping (Fig. 6A). The apparent inflection points on the curve at different values suggested optimal algorithm performance when patients were divided into two groups. Subsequently, for a more comprehensive understanding of the biological characteristics of these two subtypes and their roles in the development of IPF, we utilized the differentially expressed IPF-MΦ characteristic genes (IRMG) and the NMF algorithm to cluster IPF patients into two subtypes (see Fig. 6B). We then assessed IRMG scores and the Diffusion Capacity of the Lungs for Carbon Monoxide (DLCO) index for the two subtypes (see Fig. 6C, D). It was found that patients with type 2 had significantly higher IRMG scores and significantly lower DLCO indices compared to patients with type 1, which is a measure of lung function, so patients with type 2 had poorer lung function. Additionally, we analyzed the immune microenvironment of the two subtypes (see Fig. 6E-G). In comparison to Type 1 IPF patients, Type 2 IPF patients showed a significant increase in the proportions of macrophages, γδ T cells, plasmacytoid dendritic cells, central memory CD8 T cells, type 17 T cells, effector memory CD4 T cells, and monocytes. Conversely, the proportions of activated B cells, activated CD4 T cells, and activated CD8 T cells were significantly elevated. These findings suggest distinct immune response patterns between Type 1 and Type 2 IPF patients, with Type 2 IPF patients potentially exhibiting a more activated and inflammatory immune state.
Fig. 6Immunoinfiltration analysis. A Cumulative Distribution Function (CDF) curve assesses the metric indicating differences or improvements in performance between two models or methods. B The unsupervised NMF algorithm divides IPF patients into two groups. C Violin plots display the differences in IPF-MΦ characteristic gene (IFMG) scores between the two groups, as well as (D) the Diffusion Capacity of the Lungs for Carbon Monoxide (%DLCO). F Stacked bar charts depict the composition of immune cells in different subtypes. E Heatmap and (G) bar charts show the relative proportions of different immune cell types in high-risk and low-risk patients. H Heatmap displays the correlation between IFMG and immune cells. I Kaplan–Meier curves for high-risk and low-risk patients. * p < 0.05, ** p < 0.01, *** p < 0.001
Furthermore, these nine genes exhibited a strong correlation with immune cells. Notably, different genes showed varying correlations with immune cells, and even opposite results were observed. For example, FAM174B exhibited a significant negative correlation with most immune cells, while UPP1 showed a significant positive correlation with most immune cells. We posit that this may be related to these genes’ different roles in various immune regulatory pathways.
Machine learning to build a predictive modelBased on the identified IFMG through machine learning, we constructed a predictive model for predicting the onset and progression of IPF in patients. We employed seven machine learning algorithms from the mlr3 package to find the most optimal algorithm. When evaluating the performance of these models, using the Area Under the Curve (AUC) as the metric, the Support Vector Machine (SVM) machine learning algorithm exhibited the highest accuracy, sensitivity, and specificity (Fig. 7A, B). Subsequently, we validated the performance of the predictive model using the GSE110147 and GSE70866 datasets, yielding AUC values of 0.893 (Fig. 7C) and 0.851 (Fig. 7D), indicating that the model can effectively distinguish between high-risk and low-risk IPF patients. This series of work has yielded more effective clinical diagnostic markers and provided important clues for future diagnostic and therapeutic research.
Fig. 7Machine learning to build a predictive model. A Construction of predictive models using seven machine learning algorithms. B Average AUC values from 10 repetitions of 5-fold cross-validation for each model. C ROC curves of the model in the independent external validation set GSE110147. D ROC curves of the model in the independent external validation set GSE70866
FAM174B, PHACTR1, DLD and ATF4 identified as genes influencing the prognosis of IPF patientsTo confirm the association of these nine genes with the prognosis of IPF patients, we subjected these significant genes to univariate Cox regression analysis and Kaplan–Meier (KM) survival analysis. The Cox regression analysis results (Fig. 8A) indicated that FAM174B, PMP22, ATF4, DLD, ELOB, SV2B, USP10, and PHACTR1 were associated with a higher risk of prognosis, while CTDP1 was linked to a lower risk. In KM survival analysis, we observed that DLD, PHACTR1, PMP22, ATF4, FAM174B, and USP10 were correlated with patient prognosis (Fig. 8B-G). Finally, ROC curves illustrated (Fig. 8H-K) that PHACTR1 (AUC = 0.9021, P = 0.0009), ATF4 (AUC = 0.7622, P = 0.0298), FAM174B (AUC = 0.7762, P = 0.0221) and DLD (AUC = 0.8741, P = 0.0019),might serve as valuable biomarkers.
Fig. 8Further validation of genes influencing the prognosis of IPF patients. A Screening genes for inclusion in prognostic analysis by univariate COX regression. B-G Kaplan–Meier survival analysis was used to evaluate the relationship between the expression of DLD, PHACTR1, PMP22, ATF4, FAM174B, and USP10 genes and patient survival time. ROC curves for (H) PHACTR1, (I) ATF4, (J) FAM174B, and (K) DLD external data set validation
留言 (0)