The metagenomic sequencing data yielded an average of 1.83 × 108 high-quality reads per sample, of which an average proportion 91.6% reads are from human host (average ~ 8.4x coverage of human genome) and subject to SNP calling (Fig. S1a). A set of 5,471,650 high quality SNPs were generated. Two of the 135 samples were outliers in principal component analysis and excluded from downstream analyses (Fig. S1b), likely due to their markedly higher missing rate of SNPs compared with the remaining samples (see Methods, Fig. S1c). Density plot showed an even distribution of the SNPs across 22 autosomes with no clear regional preferences (Fig. S1d). When plotting the number of SNPs as a function of the number of individuals sharing a SNP29, we observed a rapid decreasing trend of the number of SNPs with the increase of individuals sharing a SNP (Fig. S2), indicating a reasonable genetic variability across individuals in our cohort. The common variants (minor allele frequency > 0.05) were associated with lung function measurement (FEV1/FVC) both among all individuals (n = 133) and within COPD patients (n = 97) using a linear mixed model. Among all individuals, 60 SNPs were identified in association with FEV1/FVC involving 10 genomic loci mapped to 6 genes (P < 1 × 10−5, as suggestive significance, Table S2). Within COPD individuals, 73 SNPs were identified in association with FEV1/FVC involving 14 gene loci mapped to 5 genes (Fig. 2a, Table S2). Annotation of these 5 genes in existing functional databases (Open Targets Genetics, GWAS Catalog, and GWASATLAS) revealed two genes (AGPS and PTGDR) as genetically associated with lung function [29].
Fig. 2The relationships between host genetics and airway microbial diversity. a Manhattan plots of host genetic variants associated with lung function in COPD patients. The red line represents the P-value of 1.0 × 10−5. Two loci with the significance of genetic association above this threshold and previously identified to be COPD-associated were marked by their gene names (AGPS and PTGDR). b Barplots for the associations of the host genetics and other demographic and clinical variables with the airway microbiome in all the samples and within COPD or healthy individuals as assessed by PERMANOVA. The significant associations (P < 0.05) are highlighted in red. c, d Correlation of the first principal component of host genetic data (x-axis) with (c) microbial alpha diversity, and (d) the first principal coordinate of the airway microbiome beta diversity (y-axis). The green and red bars are histograms showing the distribution of the X and Y axis
We performed additional genetic association analyses using COPD/healthy and FEV1% predicted as traits. No SNPs were identified as associated with the binary trait of COPD/healthy (P < 1.0 × 10−5). Among all individuals, 292 SNPs were identified in association with FEV1% predicted involving 22 genomic loci mapped to 18 genes (Table S2, P < 1.0 × 10−5), with SNP rs16836069 mapped to gene CSMD2 reaching genome-wide significance (P < 5.0 × 10−8). Within COPD individuals, 51 SNPs were identified in association with FEV1% predicted involving 11 genomic loci mapped to 6 genes (P < 1.0 × 10−5). No SNPs reached genome-wide significance.
We also performed genetic association analyses with key demographic and clinical parameters, including inflammatory endotype features (sputum neutrophil, eosinophil percentages), severity (Global Initiative for Chronic Obstructive Lung Disease [GOLD] status), and inhaled corticosteroid (ICS) usage and smoking status (Table S2). For neutrophil, 1308 SNPs involving 85 loci mapped to 43 genes were identified (P < 1 × 10−5), where rs74555247 (mapped to SFMBT1) and rs1003669 (mapped to OXR1) reached genome-wide significance (P < 5 × 10−8). For eosinophil, 1550 SNPs involving 114 loci mapped to 70 genes were identified (P < 1 × 10−5), where rs75059289, rs115876665, rs2277122 (mapped to ABCC10), rs356041 (mapped to PITPNM3), rs75177701 (mapped to PIEZO2) reached genome-wide significance (P < 5 × 10−8). For GOLD status, 131 SNPs involving 15 loci mapped to 8 genes were identified (P < 1 × 10−5). For ICS usage, 215 SNPs involving 16 genomic loci mapped to 18 genes were identified (P < 1 × 10−5), For smoking, 149 genes involving 16 genomic loci mapped to 14 genes were identified (P < 1 × 10−5). None of the SNPs associated with GOLD, ICS and smoking reached genome-wide significance (P < 5 × 10−8).
Host genetic variations are associated with the airway microbial diversityPermutational multivariate analysis of variance (PERMANOVA) revealed a significant association between host genetic variations and the microbiome composition among all individuals (P = 0.001), and within COPD or healthy individuals, respectively (Fig. 2b). When plotting the top 20 principal components (PCs) for host genetics, we found that the slope of the curve generally leveled off after the first PC, suggest the first PC (PC1) could be relatively informative (Fig. S3). Among all individuals, host genetic PC1 was ranked the third among all demographic and clinical features tested in association with the microbiome following site and disease status (COPD or health), accounting for 3.46% of the microbiome variation. When considering the top 5 host genetic PCs, 12.11% of the microbiome variance can be explained. Within COPD or healthy individuals, host genetic PC1 exhibited the greatest association with the microbiome among all features except for clinical site (Fig. 2b, P = 0.005 and P = 0.008). A significant correlation was observed between host genetic PC1 and both microbial alpha (richness, R = 0.18, P = 0.040, Fig. 2c) and beta diversity (the first principal coordinate using Bray-Curtis dissimilarity matrix, R = 0.25, P = 0.0035, Fig. 2d), indicating a close association between host genetics and the airway microbial diversity.
We also performed an additional PERMANOVA by using microbiome PC1 (explaining 23.3% of the taxonomic diversity) and demographic and clinical variables to associate with the host genetic profiles. Among all 133 individuals, clinical site was most significantly associated with host genetic profile (R2 = 0.009, P = 0.001), followed by microbiome PC1 (R2 = 0.008, P = 0.001) and disease status (COPD/healthy, R2 = 0.008, P = 0.032, Fig. S4). Microbiome PC1 was ranked the third and second in association with host genetics in COPD and healthy individuals, respectively. These results further support the close association between the airway microbiome and host genetic variation.
Host genetic variations are associated with COPD microbiome featuresGiven the association between host genetic backgrounds and the airway microbial diversity, we sought to assess the relationships between individual SNPs and the airway microbiome features in COPD individuals. We chose to focus on a subset of microbiome species and functional modules that were reasonably abundant and important in the disease context (significantly different between COPD and controls). For each of the selected microbiome species and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules (a total of 158 microbiome features, including 52 species and 106 KEGG modules, see Methods), we performed an association analysis with the concurrent host genetic variants using a general linear mixed model. A total of 12,198 candidate SNPs were identified with associations with all 158 microbiome features (P < 1.0 × 10−5, as suggestive significance), involving 3188 loci mapped to 2131 genes (Table S3). Of them, 276 SNPs involving 45 loci mapped to 30 genes reached genome-wide statistical significance (P < 5.0 × 10−8, Table S3). Of the 2131 genes, 113 genes were known candidate genes reported in previous GWAS studies in association with lung function or COPD (P < 5.0 × 10−8, Table S4), and were selected for further investigation to facilitate interpretation. The SNPs from these genes collectively exhibited 171 associations with 29 microbiome species and 48 KEGG modules (Table S4). Of them, the most significant association was found between Streptococcus salivarius and rs6917641 most proximal to TBC1D32 (P = 9.54 × 10−8). This was followed by associations between Xanthomonas euvesicatoria and two SNPs, rs563696052 and rs368423146 in the intronic region of ERC2 (P = 1.02 × 10−7) and SLC1A2 (P = 1.60 × 10−7), respectively, and the association between Moraxella catarrhalis and rs74066259 in the intronic region of SMIM2 (P = 2.28 × 10−7).
For KEGG modules, the strongest association was found between the assimilatory nitrate reduction (M00537) and rs7166844 in the intronic region of SLC27A2 (P = 1.20 × 10−7), followed by associations involving ribose transport system (M00212, associated with rs6461680 mapped to CARD11), two modules related to secondary metabolism (M00418 and M00022, associated with rs7166844 mapped to SLC27A2 and rs5803203 to ENOX1, respectively), and two modules related to bacterial two-component system (M00511) and transport system (M00592), respectively. Among host genes, SLC27A2 exhibited significant associations with the greatest number of 6 microbiome features, followed by ENOX1 associated with 4 microbiome features. Genetic association analysis using an expanded set of 517 species (relative abundance> 0.0001) and all 461 functional modules revealed a total of 164,236 SNPs associated with 6126 genes, among which 486 genes were candidate genes previously reported as genetically associated with COPD (P < 5.0 × 10−8, Table S5). Of them, 402 associations between 269 microbiome species, 55 functional modules, and 267 host genes reached genome-wide significance (P < 5.0 × 10−8, Table S6).
We also assessed association between airway microbiome and SNPs previously found to be associated with lung function or COPD with genome-wide significance (P < 5.0 × 10−8). In this regard, we have comprehensively searched public literatures and databases (GWAS catalog, Open Targets Genetics, GWAS Atlas) regarding GWAS studies for COPD and lung function. A total of 1427 SNPs from 19 datasets in GWAS catalog database derived from 10 studies associated with lung function measurements using genome-wide significance P-value 5 × 10−8 as cutoff were incorporated for further analysis [5, 29,30,31,32,33,34,35,36,37] (Table S7). We then associated these SNPs with microbiome taxa and functional modules in our dataset. Among the selected 52 microbiome taxa and 106 functional modules, 4 associations were identified involving 3 SNPs (mapped to FOLH1B, WDR18 and TMEM163) and 4 microbiome features (P < 1.0 × 10−5, Table S7). Among all 517 microbiome taxa (identified using relative abundance cutoff 0.0001) and 461 functional modules, 50 associations were identified involving 21 SNPs and 40 microbiome features (P < 1 × 10−5, Table S7). Four associations involving SNP rs11666499 mapped to LIMASI and Acinetobacter species reached genome-wide significance (P < 5 × 10−8, Table S7).
Lactobacillus salivarius and Lactobacillus oris were identified as potential beneficial microorganisms mechanistically involved in COPD in our previous study based on the same cohort [13]. For Lactobacillus salivarius, a total of 76 SNPs involving 14 loci mapped to 11 genes were identified (Table S8), with rs7913363 (mapped to BBP1, PDCD4, and SHOC2) being the most significant (P = 2.57 × 10−7). For Lactobacillus oris, a total of 168 SNPs involving 17 loci mapped to 9 genes were identified (Table S8), with rs6996846 (mapped to AF131215.5 and AKSMO) being the most significant (P = 2.24 × 10−6). None of these genes, however, were previously reported to be genetically associated with COPD.
Among SNPs associated with lung function, none were found to be significantly associated with microbiome features in our dataset (Tables S2, S4). In addition, none of these SNPs or genes associated with inflammatory endotype or clinical traits overlapped with those associated with the microbiome features (Tables S2, S4). On the other hand, at the transcriptomic level, among all 113 host genes involved in microbiome-host genetic associations, 46 genes were found to be associated with sputum neutrophil, 3 genes associated with sputum eosinophil, 25 genes associated with GOLD status, 8 genes associated with ICS, and 1 gene associated with smoking (FDR < 0.05, Table S4). These results suggest a possibly greater impact of these clinical factors on the microbiome-host gene associations at the transcriptomic than the genetic level. Collectively, these results suggest the genetic associations between COPD-associated human genes and the airway microbiome taxonomic and functional features that could imply possible microbiome-host interactions.
The airway microbiome-host genetic associations were transcriptionally linkedWe next analyzed concurrent airway host transcriptomic data to assess any genetically associated microbiome features and host genes that were further correlated at the transcriptional level. Among the 122 microbiome-host genetic associations, 27 significant correlations were identified between the transcriptional level of 24 host genes and 12 microbiome species and 10 KEGG modules (Spearman correlation, P < 0.05, Table 1). The most significant correlations were found between Veillonella parvula and NUDT1 (rho = 0.48, FDR = 1.26 × 10−4) and MAD1L1 (rho = 0.40, FDR = 2.72 × 10−3), both proximal to rs62442525 that exhibited genetic association with V. parvula (Fig. 3a-c). This was followed by correlations of Stenotrophomonas maltophilia-TTLL9 (Fig. 4a, Fig. S5, rho = 0.40, FDR = 3.08 × 10−3, rs9967912), Treponema denticola-RWDD1 (Fig. 4b, rho = 0.37, FDR = 7.76 × 10−3, rs6568956), Rothia mucilaginosa-MAST2 (Fig. 4c, rho = 0.33, FDR = 8.35 × 10−3, rs79257400), Prevotella intermedia-BACR1 (rho = 0.34, FDR = 5.73 × 10−2, rs7204848), and Haemophilus influenzae-LTA4H (rho = 0.30, FDR = 6.21 × 10−2, rs56396137). Of note, a non-significant positive correlation was also found between S. salivarius and TBC1D32 (rho = 0.23, FDR = 8.60 × 10−1, rs6917641, Table 1, Table S4) that exhibited the strongest genetic association. Among microbial functional modules, the most significant correlation was between ERRFI1 and EHEC/EPEC pathogenicity signature (Fig. 4d, Fig. S5, M00542, rho = − 0.29, FDR = 2.03 × 10−2), followed by KLHL42 and microbial tyrosine degradation (Fig. 4e, M00044, rho = 0.29, FDR = 2.65 × 10−2), and PARK7 and EHEC/EPEC pathogenicity signature (Fig. 4f, M00542, rho = − 0.28, FDR = 1.95 × 10−2). These results suggest that the airway microbiome-host genetic associations can be further correlated at the transcriptional level, providing a possible explanation for host genetic variations influencing the microbiome.
Table 1 Correlations between the host gene expression and the genetically associated airway microbiome taxa and modules in COPD patientsFig. 3The correlation between the airway microbiome taxa and its genetically associated host genes at the transcriptomic level in COPD patients. a Regional manhattan plot showing the associations between Veillonella parvula and the host genetic variants in MAD1L1 and NUDT1. The mapped genes are marked in red and the top lead SNP is colored in purple. The red line represents the P-value of 1.0 × 10−5. b, c Scatterplots for the significant correlation between the normalized abundance of Veillonella parvula (x-axis) and the expression of NUDT1 and MAD1L1 (y-axis) in COPD patients. The green and red bars are histograms showing the distribution of the X and Y axis
Fig. 4Additional correlations between the airway microbiome features and their genetically associated host genes at the transcriptomic level in COPD patients. a-c Scatterplots for the significant correlations of three microbiome species-level taxa Stenotrophomonas maltophilia, Treponema denticola, Rothia mucilaginosa with the expression of TTLL9, RWDD1, and MAST2, respectively, in COPD patients. d-f Significant correlations of two microbiome functional modules tyrosine degradation (M00044), and EHEC/EPEC pathogenicity signature (M00542), with the expression of KLHL42, ERRFI1 and PARK7. The green and red bars are histograms showing the distribution of the X and Y axis
To further explore any potential causal associations between the host gene expression and microbiome features, we integrated the lung gene expression quantitative trait loci (eQTL) and gene splicing quantitative trait loci (sQTL) data from Genotype-Tissue Expression (GTEx) database, and performed a summary-data-based Mendelian randomization (SMR) analysis using the summary data sets of the above significant microbiome-host gene pairs, with host genes as exposure and microbiome features as outcome. The eQTL analysis revealed significant associations between microbial type III secretion system (M00332), enteropathogenic and enterohemorrhagic Escherichia coli (EHEC/EPEC) pathogenicity signature (M00542) and PARK7 (Fig. 5a-b, Fig. S6, PSMR = 1.72 × 10−3 and PSMR = 1.63 × 10−3, Table S9), and between hydroxypropionate-hydroxybutylate cycle (M00375) and NARS2 (Fig. 5c, Fig. S6, PSMR = 2.90 × 10−3, Table S9). The sQTL analysis identified significant associations between PARK7 and M00332 (Fig. S7, PSMR = 3.71 × 10−3 and PSMR = 3.69 × 10−3) and M00542 (PSMR = 3.01 × 10−3 and PSMR = 2.99 × 10−3, Table S9). These results support the possibility that host genetic variation could influence the airway microbial functions through transcription or splicing activities.
Fig. 5The correlation between the airway microbiome functional modules and its genetically associated host genes at the transcriptomic level in COPD patients. Regional manhattan plots on the left show the associations between three KEGG modules (M00332, M00542, M00375, a-c with the host genetic variants in PARK7 and NARS2. The mapped genes are marked in red and the top lead SNP is colored in purple. The red line represents the P-value of 1.0 × 10−5. Scatterplots in the middle show the significant correlation of the three KEGG modules (x-axis) and the expression of host genes PARK7 and NARS2 (y-axis). The plots on the right show the corresponding genetic loci from SMR analysis. The gray dots in the top mannattan plots show P-values for SNPs associated with KEGG modules. The bottom plots represent the eQTL P-values of SNPs from the GTEx study for probe ENSG00000116288 tagging PARK7 and ENSG00000137513 tagging NARS2. The genes (PARK7 and NARS2) that passed SMR and HEIDI tests are highlighted in red
Mendelian randomization revealed potential microbiome-COPD associationsTo further explore any genetically mediated causal relationships between COPD and the airway microbiome, we performed a two-sample bidirectional Mendelian randomization (MR) analysis. We selected 13 qualified SNPs as instrumental variables (IVs) (P < 5.0 × 10−8, r2 < 0.02, clumping window = 5000 kb) from the study of Ishigaki et al. that includes an eastern Asian population of 3315 cases and 201,592 controls [38]. Eleven of these 13 qualified SNPs were implemented in our MR analysis after extracting the IVs from outcome GWAS summary data and removing the palindromic SNPs. The F-statistics of IVs are all greater than 10 (range: 31.3–71.2), indicating no evidence of instrument bias (Table S10). MR analyses were performed using COPD as exposure and the 158 microbiome features (52 species and 106 modules) as outcomes.
A causal association was found between COPD and increased relative abundance of Streptococcus intermedius (Fig. 6a, Table 2, OR (95 CI): 2.53 (1.38, 4.70); P = 0.0028) using the inverse-variance weighted (IVW) method. This finding was also supported by the weighted median and weighted mode methods (Table 2, OR (95 CI): 3.42 (1.54, 7.64); P = 0.0029; OR (95% CI): 4.31 (1.17, 15.83); P = 0.048). Forest plots of causal effects using a single SNP showed that none of them was extremely significant for association between exposure and outcome (Fig. 6b). The leave-one-out sensitivity analysis demonstrated that the associations were not driven by any specific SNPs (Fig. S8). No horizontal pleiotropy and no heterogeneity was found between the individual SNPs (Table S11). When analyzing in the opposite direction, no association was found between COPD and airway microbiome features using either of the MR methods (Table 2). These findings suggest a potential causal association between COPD and increased airway S. intermedius mediated through host genetics.
Fig. 6Mendelian randomization for a potential causal relationship between COPD and Streptococcus intermedius. a The scatterplot showing the SNP effects on COPD versus the relative abundance of S. intermedius, with the slope of each line corresponding to the estimated MR effect using each method. b Forest plot showing the MR-estimated effect sizes for COPD on S. intermedius for individual SNPs and their combinations
Table 2 Bidirectional MR results for the relationship between COPD and the relative abundance of Streptococcus intermedius
留言 (0)