Smoking-associated gene expression alterations in nasal epithelium reveal immune impairment linked to lung cancer risk

Study subjects

We recruited 487 subjects: 114 healthy volunteers from the Cambridge Bioresource (https://www.cambridgebioresource.group.cam.ac.uk/) and 373 patients referred to the out-patient clinic at Royal Papworth Hospital (Cambridge, UK) or Peterborough City Hospital (Peterborough, UK) with symptoms or imaging suspicious for lung cancer (clinic group). Healthy volunteers were defined as individuals without any prior history or current suspicion of lung cancer who had not undergone any imaging investigations. Within the clinic group, 301 patients were diagnosed with cancer, and 72 patients, although initially presenting with symptoms and/or imaging suspicious for lung cancer, had a final diagnosis of a benign condition, mostly due to infection or inflammation (Fig. 1, Additional file 2: Table S1). From these donors, we collected a total of 649 samples: 413 nasal epithelial samples by mini-curette from 114 healthy donors and 299 clinic patients, and 236 bronchial brushings from clinic patients (Fig. 1; see the ‘Methods’ section). For 162 clinic patients, both nasal and bronchial samples were collected (Additional file 2: Table S2). Samples from healthy volunteers and clinic patients were collected and processed by the same staff using identical experimental protocols.

Smoking history was obtained for all subjects, confirmed by cotinine testing, and recorded as never smokers (NV, n = 45), current smokers (CS, n = 153) or former smokers (FS, n = 289). Former smokers were stratified into 3 categories by time since smoking cessation: less than 1 month (n = 10), 1 to 12 months (n = 45) or more than 1 year (n = 234, median = 168 months) before sample collection (Fig. 1; see the ‘Methods’ section). Cumulative smoke exposure was measured in pack-years and stratified into 4 categories: none, 0–10, 11–30 and > 31 pack-years. In addition to smoking status, we recorded sex, age, lung cancer subtype and stage, and the presence of chronic obstructive pulmonary disease (COPD), defined according to the GOLD criteria [37] (Additional file 2: Table S2). While most clinic patients with cancer were diagnosed with non-small cell lung cancer (NSCLC; n = 245), 56 subjects presented with metastatic disease from an extra-thoracic primary (n = 8), small-cell lung cancer (SCLC, n = 31) or a rare pulmonary cancer, e.g. carcinoid (n = 17). Given the different underlying biology of NSCLC and other tumour types, these subjects (with cancer status marked as Ineligible in Additional file 2: Table S2) were included in all analyses investigating the smoke injury response but were excluded from lung cancer risk prediction. Clinic patients with a final diagnosis of a benign condition were followed up for a minimum of 1 year to confirm the absence of cancer.

Airway samples underwent RNA sequencing using standard protocols [38]. Blood samples were taken from 467 subjects for germline genotyping on the Illumina Infinium Oncoarray platform, which tags 450K germline variants [38]. Total gene expression was quantified as variance-stabilized counts and corrected for batch effects in all downstream analyses (see the ‘Methods’ section).

Healthy volunteers and clinic patients show widespread differences in gene expression

To investigate overall gene expression patterns, we first tested for gene expression differences between all clinic patients (benign and cancer diagnoses) and healthy volunteers using nasal epithelium samples from both current and former smokers, correcting for smoking status, pack-years, sex and age. We found extensive differences in gene expression between the healthy volunteer and clinic groups, with 5359 genes differentially expressed (FDR < .05; see the ‘Methods’ section). Genes with increased expression in clinic patients were enriched for cilium assembly and organization, while genes with reduced expression were enriched for oxidative phosphorylation and several immune-related pathways, such as neutrophil activation, antigen processing and presentation, and response to interferon-gamma (Additional file 2: Table S3). Performing the same comparison in current smokers only yielded similar enrichment among genes with increased and reduced expression. In former smokers who had quit more than 1 year earlier, genes related to ciliary function showed no increased expression compared with healthy volunteers, but genes related to immune pathways such as inflammatory response, neutrophil activation and response to interferon-gamma showed reduced expression. These analyses demonstrate widespread expression differences between healthy volunteers and clinic patients that are not solely attributable to differences in smoke exposure, and suggest that an immunosuppressed state can be detected in the nasal epithelium of subjects in the clinic group during active smoking and for years after smoking cessation.
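The FDR threshold above is stated without naming the multiple-testing procedure, which is described in the ‘Methods’ section. As a minimal sketch, assuming the widely used Benjamini-Hochberg procedure and per-gene p-values already obtained from a model adjusted for smoking status, pack-years, sex and age, the adjustment and the FDR < .05 cut could look like this (the function names are hypothetical):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_pvalues, fdr=0.05):
    """Names of genes whose adjusted p-value is below the FDR threshold."""
    genes = list(gene_pvalues)
    adjusted = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < fdr]
```

For example, `significant_genes({"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.20})` keeps only gene A, because the adjusted values of B and C (about 0.053) exceed 0.05.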

In contrast, comparing gene expression between patients with and without cancer in the clinic group, accounting for the same confounders (current and former smokers analysed together), yielded only 28 significantly altered genes in the bronchus (Padj < .05; see the ‘Methods’ section) and no significantly differentially expressed genes in the nose. Among the 28 differentially expressed genes in the bronchus, 3 were up-regulated in patients with cancer: MMP13, a metalloproteinase known to increase lung cancer invasion and metastasis [39]; EDA2R, a member of the tumour necrosis factor (TNF) receptor superfamily, whose members modulate the immune response in the tumour microenvironment [40]; and CTSL, a lysosomal cysteine protease involved in epithelial-mesenchymal transition [41]. The 25 genes down-regulated in cancer patients were enriched for immune-related GO terms, in particular neutrophil-mediated immunity (Additional file 2: Table S4), consistent with our finding in the comparison between clinic patients and healthy volunteers in nasal tissue.

In summary, we observe major gene expression differences in nasal epithelium between healthy volunteers and clinic patients. However, we find no significant signal when comparing patients with lung cancer with those who had a final benign diagnosis (despite initially being suspected of having lung cancer). This result contrasts with that of the AEGIS study [16], which reported a notable difference in nasal gene expression between clinic-referred cancer and benign patients. Nevertheless, we found a significant overlap between the genes differentially expressed between cancer and no-cancer patients in AEGIS and those differentially expressed between our clinic and healthy groups (P = 1.44 × 10⁻⁵). These results may be explained by differences in the nature of the benign (non-cancer) diagnoses between the two studies. In our study, the majority of patients in the clinic group had clinical symptoms/imaging highly suspicious for lung cancer, and final benign diagnoses were predominantly due to significant typical bacterial infection/inflammation (pneumonia). In the AEGIS cohorts, by contrast, many of the benign diagnoses, where known, were due to sarcoidosis, fibrosis, benign tumours or atypical infections (fungal and mycobacterial). Therefore, in our cohort, the pre-test probability of malignancy in the benign group was higher than in the AEGIS benign group.
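An overlap P value of this kind can be illustrated with a one-sided hypergeometric test. This is a sketch under assumptions: the text does not name the test used, and the universe and set sizes passed in below are placeholders rather than values from the study.

```python
from math import comb


def overlap_pvalue(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) for two gene sets.

    n_universe: number of genes that could appear in either set (assumed shared)
    n_set_a, n_set_b: sizes of the two gene sets
    n_overlap: number of genes found in both sets
    """
    total = comb(n_universe, n_set_b)
    upper = min(n_set_a, n_set_b)
    # Exact integer arithmetic; only the final division produces a float
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

With toy numbers, `overlap_pvalue(10, 5, 5, 5)` equals 1/252, the probability that two random 5-gene sets drawn from 10 genes coincide completely.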

Gene expression response to smoke injury differs between healthy volunteers and clinic patients

Intrigued by these overall expression differences between healthy volunteers and clinic patients, we investigated the post-cessation dynamics of individual genes using a population-based approach. We first employed a Bayesian linear regression model to predict nasal gene expression in healthy volunteers as a function of smoking status, accounting for sex and age (see the ‘Methods’ section). This model classified genes as either unaffected by smoking (US), rapidly reversible (RR; no difference between former and never smokers), slowly reversible (SR; intermediate expression levels in former smokers compared to never and current) or irreversible (IR; no difference between former and current smokers). Additionally, genes were classified as cessation-associated (CA) if no difference was present between current and never smokers, but elevated or reduced expression was observed in former smokers (see Additional file 1: Fig. S1 for a schematic).
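The classification rule can be written down explicitly. The sketch below assumes that the Bayesian model has already produced, for each gene, group means and a yes/no call on whether each pairwise contrast (current vs never, former vs never, former vs current) differs; how those calls are made is described in the ‘Methods’ section and is not reproduced here. The function name and the "unclassified" fallback are hypothetical.

```python
def classify_reversibility(mean_never, mean_former, mean_current,
                           current_vs_never, former_vs_never, former_vs_current):
    """Assign a post-cessation class from three pairwise 'differs' calls.

    Each *_vs_* argument is True when the model finds a difference between
    the two smoking groups. Returns US, RR, SR, IR, CA or "unclassified".
    """
    if not current_vs_never:
        # No effect in current smokers: either unaffected, or a change
        # that appears only after cessation
        return "CA" if former_vs_never else "US"
    if not former_vs_never:
        return "RR"  # former smokers back at never-smoker levels
    if not former_vs_current:
        return "IR"  # former smokers still at current-smoker levels
    low, high = sorted((mean_never, mean_current))
    if low < mean_former < high:
        return "SR"  # former smokers intermediate between never and current
    return "unclassified"  # hypothetical fallback, not a class in the text
```

For instance, a gene with means 0.0 (never), 0.5 (former) and 1.0 (current) and all three contrasts called different is classified as slowly reversible.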

In healthy volunteers, 5755 genes were found to be affected by smoking status, of which 513 showed a strong effect (effect size > 0.4 for rapidly reversible, slowly reversible and irreversible genes; > 0.25 for cessation-associated genes; see the ‘Methods’ section, Additional file 2: Table S5). Most of these genes (485/513) were rapidly reversible, in line with previous findings in bronchial tissue [9]. GO pathway analysis of these genes revealed up-regulation of cellular detoxification, response to oxidative stress (e.g. CYP1A1, CYP1B1, AHRR, NQO1, GPX2, ALDH3A1) and keratinization (e.g. KRT6A, KRT13, KRT17, SPRR1A, SPRR1B, CSTA) pathways, and down-regulation of cilium organization (e.g. FOXJ1, DNAH6, IFT81, CEP290, UBXN10), extracellular matrix organization (e.g. FN1, COL3A1, COL5A1, COL9A2) and interferon signalling (e.g. IFI6, IFIT1, IFI44, RSAD2) in current compared with never smokers. Genes involved in the inflammatory response were found among both the up-regulated (IL36A, IL36G, S100A8, S100A9, CLU) and down-regulated (SAA1, SAA2, IL33) genes. Principal components analysis using the rapidly reversible genes clearly separated current smokers from all other subjects. In contrast, slowly reversible and irreversible genes placed subjects on a trajectory from never smokers to current smokers, as expected (Additional file 1: Fig. S2a).
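A principal components analysis restricted to one reversibility class can be sketched with a plain singular value decomposition. Assumptions: rows are subjects, columns are vst-normalized, batch-corrected expression values for the genes of one class, and genes are centred but not scaled; the study's exact PCA settings are not given in this section.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project subjects onto the top principal components.

    expr: array of shape (n_subjects, n_genes) for one gene class.
    Returns the component scores and the fraction of variance explained.
    """
    centered = expr - expr.mean(axis=0)  # centre each gene
    # Rows of vt are the principal axes; singular values come sorted descending
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # equals centered @ vt.T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores`, coloured by smoking status, gives the kind of view described for rapidly reversible versus slowly reversible and irreversible genes.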

We next repeated the above analysis in the clinic subgroup. In the absence of never smokers in the clinic group, and since no technical or biological covariate could explain the observed overall expression differences between the groups (see the ‘Methods’ section), we used the healthy volunteer never smokers as a bona fide reference group for this analysis. We found 4112 genes with smoking-dependent expression changes, 584 of which showed a strong effect (same effect size thresholds as above; see the ‘Methods’ section and Additional file 2: Table S5). We evaluated this classification with a principal components analysis of the clinic subjects, as for healthy volunteers, and found that patients clustered according to their smoking status, as expected (Additional file 1: Fig. S2b). Of the 584 genes identified as dysregulated by smoke in the clinic patients, 233 were also found in the healthy volunteer analysis (P < .001, chi-squared test, Additional file 1: Fig. S3). However, while most of these genes (227/233) were rapidly reversible in the healthy volunteers, only 113 were also classified as rapidly reversible in the clinic group (Fig. 2a). Of the remaining 120 genes, 2 (BPIFA2 and CLU) were classified as irreversible and 24 as slowly reversible, including CYP1B1, a well-known detoxification gene, and BMP7, a gene previously shown to play a role in immunoregulation [42] (Fig. 2b). The remaining 94 genes were classified as rapidly reversible in the healthy volunteer group and as cessation-associated in the clinic group (e.g. UBXN10, Fig. 2b) and were strongly enriched for cilia structure and function (Additional file 2: Table S6). While cilia-associated genes were down-regulated in current smokers in both groups (consistent with cigarette smoke damaging airway cilia), the same genes showed increased expression in current and former smokers in the clinic group compared with healthy volunteers.
This observation in the clinic group might be linked to the decreased expression of interferon-gamma-related genes in the clinic group, as it has been shown that interferon-gamma suppresses ciliogenesis and ciliary movement [43].
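The chi-squared test for the overlap between the clinic (584) and healthy volunteer (513) strong-effect gene sets, with 233 genes shared, amounts to a 2 × 2 contingency table over a common gene universe. The universe size is not reported in this section, so the 20,000 genes used in the example below is an assumption for illustration only.

```python
from scipy.stats import chi2_contingency


def overlap_chi2(n_universe, n_a, n_b, n_both):
    """Chi-squared test of independence for membership in two gene sets."""
    table = [
        [n_both, n_a - n_both],                           # in set A: in B / not in B
        [n_b - n_both, n_universe - n_a - n_b + n_both],  # not in A: in B / not in B
    ]
    # For 2 x 2 tables SciPy applies Yates' continuity correction by default
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p
```

Under this assumed universe, `overlap_chi2(20000, 584, 513, 233)` gives a P value far below .001, because only about 15 shared genes would be expected by chance.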

Fig. 2

Smoke injury dynamics. a Plot showing the change in reversibility dynamics for the 749 response genes in the healthy volunteer (left) and clinic (right) donor groups (genes classified as unaffected by smoking in both donor groups were removed). Color bars represent the number of genes in each reversibility class (blue = rapidly reversible, yellow = slowly reversible, red = irreversible, green = cessation-associated, grey = unaffected by smoking). b Normalized gene expression over smoking status for 4 exemplar response genes with different post-cessation dynamics in the clinic and healthy groups, with linetype and shape representing donor status (plain line = clinic group, dashed line = healthy volunteer) and colors representing the genes’ assigned reversibility classes (same color code as panel a). See also Additional file 1: Fig. S1 for schematic examples.

Lastly, the 351 genes that showed smoking-dependent expression changes in the clinic group but not in the healthy volunteers (Fig. 2a) were strongly enriched in extracellular matrix organization and immune-related genes (including response to interferon-gamma, neutrophil activation, chemotaxis and inflammation). For example, GBP6 showed down-regulation and slow reversibility in the clinic group (Fig. 2b) and is known to be associated with reduced overall survival in squamous cell carcinoma of the head and neck [44].

Overall, we observe striking differences in smoke-dependent gene expression between clinic patients and healthy volunteers that could not be explained by comorbidities or other covariates, with generally slower post-cessation reversibility in the clinic group. We hypothesize that some of the 749 genes with differences in smoke-dependent expression might reflect individual responses to smoke injury, and we therefore refer to them as response genes.

Response gene expression levels predict disease status and may improve risk stratification for population screening

We postulated that the smoke injury response genes we identified might reflect a personalized smoke injury response and be candidates for a molecular biomarker of lung cancer risk. In the clinic group, where patients already show evidence of lung disease, such a biomarker would help identify patients with the highest need for further investigation. In the general population of current and former smokers, it could be added to existing risk stratification methods to better identify individuals who would benefit most from lung cancer screening, thereby sparing those at lowest risk, who have the least to gain from screening.

Therefore, we trained two independent classifiers: a ‘clinic classifier’ that predicts the cancer status of each sample (cancer vs clinic benign and healthy volunteers; potentially of use in the clinic) and a ‘population classifier’ that predicts the donor group from which each sample was taken (clinic benign or clinic cancer vs healthy volunteers; potentially of use in risk stratification for population screening). For both classifiers, we used gene expression data from the 749 response genes together with clinical information (sex, age, smoking status and pack-years; see the ‘Methods’ section) in a lasso-penalized multivariate logistic regression, and derived a log-odds score from each classifier. In line with the strong expression differences observed between healthy volunteers and clinic patients, the ‘population’ score clearly separates healthy volunteers from clinic subjects (Fig. 3a). Interestingly, the ‘clinic’ score (Fig. 3b) additionally distinguishes benign and cancer patients within the clinic group, placing benign subjects between healthy volunteers and cancer subjects. As expected, the two scores are highly correlated (Pearson correlation = 0.8, P < .001, Additional file 1: Fig. S4a). Both scores yielded high area under the curve (AUC) values for both precision-recall (clinic score: mean AUC-PR = 0.83; population score: mean AUC-PR = 0.85; 10-fold cross-validation; Fig. 3c, d) and receiver operating characteristic (clinic score: mean AUC-ROC = 0.84; population score: mean AUC-ROC = 0.92; 10-fold cross-validation; see also the ‘Methods’ section) curves, and performed significantly better than a model using the same number of randomly selected genes (Additional file 1: Fig. S5).
In practice, reaching a sensitivity of 95% with the population score would require a score threshold of 2.69, resulting in an average false-positive rate of 42.8%, whereas reaching a similar sensitivity using clinical data alone would result in a false-positive rate of 74.5%. For the clinic score, a score threshold of −1.46 gives a sensitivity of 95% and a false-positive rate of 62.1%, whereas a similar sensitivity with clinical data alone would result in a false-positive rate of 67.8% (Fig. 3c, d). These results indicate that models incorporating gene expression data from the response genes defined above performed significantly better than models built on clinical covariates alone (see also the insets of Fig. 3c, d, which compare models based on gene expression data alone, clinical covariates alone, or a combination of both). In addition, both scores retained their ability to separate the patient groups after regressing out all potential confounders, confirming that gene expression data improve classification compared with clinical covariates alone (Additional file 1: Fig. S4b, c).
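The modelling and thresholding steps described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's pipeline: features are standardized, the penalty strength `C` is fixed rather than tuned, out-of-fold log-odds come from a single stratified 10-fold split, and AUCs are computed on the pooled out-of-fold scores (the text reports means across cross-validation rounds). The function names are hypothetical.

```python
import math

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_log_odds_scores(X, y, C=0.1, n_splits=10, seed=0):
    """Out-of-fold log-odds from an L1 (lasso) penalized logistic regression.

    X: (n_samples, n_features) response-gene expression plus clinical covariates.
    y: 1 for the positive class (e.g. cancer or clinic), 0 otherwise.
    """
    model = make_pipeline(
        StandardScaler(),  # put genes and covariates on a common scale
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # decision_function returns the linear predictor, i.e. the log-odds
    scores = cross_val_predict(model, X, y, cv=folds, method="decision_function")
    return scores, roc_auc_score(y, scores), average_precision_score(y, scores)


def threshold_for_sensitivity(scores, labels, target=0.95):
    """Highest threshold (score >= threshold is called positive) that reaches
    the target sensitivity, with the sensitivity and false-positive rate."""
    positives = sorted((s for s, l in zip(scores, labels) if l == 1), reverse=True)
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    # Number of positives that must score at or above the threshold
    needed = math.ceil(target * len(positives) - 1e-9)
    threshold = positives[needed - 1]
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    return threshold, sensitivity, fpr
```

Passing the out-of-fold `scores` and labels to `threshold_for_sensitivity(..., target=0.95)` yields the same kind of threshold and false-positive trade-off quoted above; the actual values depend entirely on the data.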

Fig. 3

Disease status prediction based on response genes. a, b Risk score distribution for the population test (a) and the clinic test (b) predicted from the clinical variables and the expression of the response genes using a penalized regression (see the ‘Methods’ section). The risk distributions are presented separately for healthy volunteers (green), clinic patients without cancer (orange) and clinic patients with cancer (purple). c, d ROC curves for the population (c) and clinic (d) scores. For each case, we present the ROC curve for the model trained on clinical data (triangles) or on gene expression and clinical data (squares). Each curve is an average obtained across 100 cross-validation (CV) experiments, and the grey area surrounding the curve gives the standard error. The color of the curve represents the test threshold corresponding to the represented sensitivity/false-positive rate compromise. (Inset) Area under the ROC curve, in 100 CV rounds, for a clinical-only model (red), a model constructed on the response genes (blue) and a model constructed on a combination of clinical information and response genes (green) for the population (c) and clinic (d) classifiers. P values given above each box are computed using a 2-sample t-test. e The population and clinic classifiers applied to nasal samples from the AEGIS cohort.

We also assessed the performance of the trained population and clinic risk score models separately in current and former smokers. The population risk score is equally applicable to current and former smokers: a significant difference between the risk scores of healthy volunteers and clinic subjects is observed, even after regressing out clinical covariates and confounders (Additional file 1: Fig. S6). While the clinic risk score performs well in both groups, the added value of gene expression data appears smaller for the clinic score, in particular in former smokers (Additional file 1: Fig. S6). Our classifiers also separated subjects efficiently regardless of cancer stage, cancer type (squamous carcinoma or adenocarcinoma) and COPD status (Additional file 1: Fig. S7), and captured differences in risk that persist for more than 10 years after smoking cessation (Additional file 1: Fig. S8). Because COPD is a known risk factor for lung cancer, we also compared the additional contribution of COPD data and of gene expression data, singly and in combination, to risk classifiers based solely on clinical data (Additional file 1: Fig. S9). COPD data added little to the performance of either the clinic or the population classifier.
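This section does not specify how clinical covariates and confounders were regressed out of the risk scores. A minimal sketch, assuming ordinary least squares residualization on a numeric covariate matrix (categorical covariates would need dummy coding first) followed by a two-sample t-test between donor groups, is shown below; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def residualize(score, covariates):
    """Remove the linear effect of covariates (plus an intercept) from a score."""
    design = np.column_stack([np.ones(len(score)), covariates])
    beta, *_ = np.linalg.lstsq(design, score, rcond=None)
    return score - design @ beta


def group_difference_after_adjustment(score, covariates, group):
    """t-test of covariate-adjusted scores between group == 1 and group == 0."""
    resid = residualize(score, covariates)
    return ttest_ind(resid[group == 1], resid[group == 0])
```

If group membership is itself correlated with the covariates, residualizing also removes part of the group difference, so a difference that survives this adjustment is a conservative signal.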

Finally, we validated our classifiers by applying them to an independent cohort. No publicly available cohort matches the composition of our cohort, in particular because none includes a healthy group of current and former smokers distinct from the clinic-referred patient group. However, the AEGIS cohort [45] includes nasal samples from clinic-referred patients with pulmonary nodules and a diagnosis of lung cancer or benign disease. We applied our two classifiers to this cohort and found good separation between subjects with and without cancer, despite differences in gene expression quantification technology and in the patients’ populations of origin (Fig. 3e, Additional file 1: Fig. S10). The AEGIS nasal classifier from Perez-Rogers et al. (2017) [16] achieved a stronger separation between patients with and without cancer on the AEGIS data (Additional file 1: Fig. S10a). However, when applied to our data, the AEGIS classifier [16] mostly differentiates healthy volunteers from clinic patients, while the difference between the scores of cancer and no-cancer patients is only modest (Additional file 1: Fig. S10b). These results confirm the ability of our classifiers to stratify patients, even when applied to patients from a different clinical context.

Overall, our results demonstrate that classifiers based on nasal gene expression have the potential to improve risk stratification of current and ex-smokers in both a population screening context and a clinic context.

Alterations in immune pathways underlie the lung cancer risk classification

To gain insight into the mechanisms of risk, we asked which genes contributed most robustly to the classifiers by identifying genes selected in more than 80% of the cross-validation (CV) rounds (Additional file 1: Fig. S11). Among the 46 genes selected most often in either risk prediction model, we found genes previously identified as important players in lung cancer development, e.g. SAA2 [18], HAS2 [46,47,48] or TGM3 [49,50,51,52].
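The selection-frequency criterion can be sketched as follows. The sketch assumes that, for each CV round, the list of genes with non-zero lasso coefficients has already been collected (for a fitted scikit-learn model, those whose entries in `coef_` are non-zero); the gene labels in the example are placeholders.

```python
from collections import Counter


def frequently_selected(selected_per_round, min_fraction=0.8):
    """Genes selected (non-zero coefficient) in more than min_fraction of CV rounds.

    selected_per_round: one list of selected gene names per CV round.
    """
    n_rounds = len(selected_per_round)
    # set() guards against a gene being listed twice within one round
    counts = Counter(g for genes in selected_per_round for g in set(genes))
    return sorted(g for g, c in counts.items() if c / n_rounds > min_fraction)
```

With five rounds, a gene selected in four of them (exactly 80%) is excluded, because the criterion is strictly more than 80%.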

However, the genes used as predictors of risk in our models reflect a wide variety of smoking-associated alterations. To gain mechanistic insight, we investigated risk contribution at the pathway level. First, we performed GO enrichment analysis on the list of smoke injury genes (including those identified in the healthy volunteers and in the clinic group) to identify the main pathways affected by smoke. The smoke injury genes are mainly involved in xenobiotic metabolism and response to oxidative stress, extracellular matrix organization, keratinization, ciliary structure and motility, and immune response (Additional file 2: Table S7). We then chose 8 GO terms as representative of these alterations: Keratinization, Extracellular matrix organization, Xenobiotic metabolism, Cilium organization, Inflammatory response, Neutrophil mediated immunity, Response to interferon gamma and Antigen processing and presentation. We calculated geneset metascores for each of these GO terms (Fig. 4a and Additional file 1: Fig. S12). For some of these pathways, such as Keratinization, we observed a similar, rapidly reversible dynamic in healthy volunteers and clinic patients (Additional file 1: Fig. S12a). For most pathways, however, the dynamics differed between the two donor groups. Cilium organization appeared rapidly reversible in healthy volunteers, while in clinic patients it showed increased expression in former smokers, with no difference between current and never smokers. Xenobiotic metabolism showed slower reversibility in clinic patients than in healthy volunteers (Additional file 1: Fig. S12a). For all immune-related pathways, we observed reduced expression in current smokers and a slowly reversible dynamic uniquely in clinic patients (Fig. 4a); their activity did not revert to the healthy never-smoker level even long after smoking cessation (Additional file 1: Fig. S12b).
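The Fig. 4 caption defines a geneset metascore as the average vst-normalized expression of the genes in a set. A minimal sketch of that per-sample average follows; the matrix layout (samples as rows) and the choice to ignore set members missing from the matrix are assumptions.

```python
import numpy as np


def geneset_metascore(vst, gene_names, geneset):
    """Per-sample mean of vst-normalized expression over the genes in a set.

    vst: array (n_samples, n_genes); gene_names: column labels of vst.
    Genes of the set that are absent from the matrix are ignored.
    """
    columns = [i for i, g in enumerate(gene_names) if g in geneset]
    if not columns:
        raise ValueError("no genes from the gene set are present")
    return vst[:, columns].mean(axis=1)
```

Computing this score for each of the 8 GO terms gives one value per subject and pathway, which can then be summarized by smoking status and donor group.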

Fig. 4

Pathway analysis and contribution to risk. a Comparison of geneset metascore (average vst-normalized gene expression; see the ‘Methods’ section) over smoking status for 4 immune-related GO terms in healthy (dashed line, triangles) and clinic subjects (plain line, circles). b Correlation between the population or clinic risk score and geneset metascore for the 8 gene sets representing biological functions altered by smoking, shown separately for current and former smokers (> 12 months). Spearman correlation values are reported (blue = positive correlation, red = negative correlation), together with the associated p-values (*P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001)

To identify which of these pathways contributed most to increased risk, we then calculated the correlation between each subject’s geneset metascores and their risk scores from the population and clinic classifiers. We calculated these correlations separately for current and former smokers (> 12 months) to identify differences in geneset contribution to risk between the two groups that might reflect differences between the acute smoke injury response and the long-term consequences of past smoke exposure (Fig. 4b). In current smokers, Keratinization and Extracellular matrix organization did not significantly correlate with either risk score, whereas the remaining genesets tested showed moderate but significant correlations with both risk scores, pointing to alterations of xenobiotic detoxification pathways, ciliary function and immune response as major contributors to patient-specific differences in risk. In former smokers, the population risk score correlated with the same GO terms, indicating that detoxification pathways, ciliary function and immune response are the main contributors to the overall risk of lung disease. In contrast, only pathways related to immune alterations (Response to interferon gamma, Neutrophil-mediated immunity, Antigen processing and presentation) correlated with the clinic risk score in former smokers; no correlation was observed with Xenobiotic metabolism, and only a very weak correlation with Cilium organization (Fig. 4b). These results indicate that immune alterations are significant contributors to the risk of cancer in both current and former smokers in the clinic group.
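The stratified correlation analysis can be sketched with SciPy's Spearman correlation. Assumptions: metascores and risk scores are aligned per subject, and smoking status is a label array whose categories (for example current smokers and former smokers quit > 12 months) have already been defined; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def stratified_spearman(metascore, risk_score, smoking_status):
    """Spearman correlation between a geneset metascore and a risk score,
    computed separately within each smoking-status stratum."""
    metascore = np.asarray(metascore, dtype=float)
    risk_score = np.asarray(risk_score, dtype=float)
    smoking_status = np.asarray(smoking_status)
    results = {}
    for status in np.unique(smoking_status):
        mask = smoking_status == status
        rho, p = spearmanr(metascore[mask], risk_score[mask])
        results[str(status)] = (rho, p)
    return results
```

Because Spearman correlation is rank-based, any monotonic relationship between a pathway metascore and the log-odds risk score is captured, whether or not it is linear.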

Patient-specific genetic background modulates the smoke injury response

Germline genetic variation may influence individual differences in the response to airway smoke injury and, hence, the risk of smoking-related lung cancer. To investigate this, we first conducted eQTL analyses on nasal and bronchial epithelium, separately and jointly, to identify variants that affect the expression of neighbouring genes (see the ‘Methods’ section). We obtained 990 (bronchial), 1316 (nasal) and 1695 (combined) eQTL effect genes (e-genes) at 1% FDR. The nasal and bronchial e-genes overlapped significantly (Additional file 1: Fig. S13a), with 574 genes in common (58% and 44% of the bronchial and nasal e-genes, respectively; Fisher’s exact test P < .001). Similarly, the adjusted p-values of the lead variants were correlated between the two sets (correlation = 0.56; Additional file 1: Fig. S13b), confirming shared cis-regulation between the nasal and bronchial epithelium.
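The overlap percentages and the Fisher’s exact test above follow from a 2 × 2 table of e-gene membership. The number of genes tested is not given in this section, so the 15,000 used in the example is a hypothetical value, and a one-sided (enrichment) alternative is assumed.

```python
from scipy.stats import fisher_exact


def egene_overlap(n_tested, n_bronchial, n_nasal, n_shared):
    """Overlap fractions and one-sided Fisher's exact test for two e-gene sets."""
    frac_bronchial = n_shared / n_bronchial  # share of bronchial e-genes also nasal
    frac_nasal = n_shared / n_nasal          # share of nasal e-genes also bronchial
    table = [
        [n_shared, n_bronchial - n_shared],
        [n_nasal - n_shared, n_tested - n_bronchial - n_nasal + n_shared],
    ]
    odds_ratio, p = fisher_exact(table, alternative="greater")
    return frac_bronchial, frac_nasal, p
```

`egene_overlap(15000, 990, 1316, 574)` reproduces the 58% and 44% figures; the P value depends on the assumed number of tested genes.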

To further study the interaction between subject-specific genetic background and environmental factors, we next used this eQTL catalogue to search for genetic variants within the 749 response genes that might modulate gene expression differently depending on subjects’ smoke exposure. We identified 78/749 genes with at least one lead eQTL variant with genome-wide significance at 10% FDR (Additional file 2: Table S8). We then tested all 78 lead eQTL variants for an interaction effect between smoking status and genotype on gene expression. We identified 11 genes (CH25H, LHX6, WNT5A, DRAM1, SULF1, LGALS7B, HAPLN4, FXYD5, EFCAB2, TOX and SPRR1A; see Additional file 1: Fig. S14) whose expression changes in response to smoke are modulated by the presence of genetic variants (nominal P < .1, Additional file 2: Table S8), suggesting that these variants might modulate the response to smoke injury and, in turn, lung cancer risk. For example, up-regulation of FXYD5 has been shown to correlate with tumour size [53] and poor survival [54] in NSCLC, and FXYD5 is implicated in many cancer types because it enhances NF-κB transcriptional activity, promotes angiogenesis and increases the migration and invasion abilities of tumour cells [55]. FXYD5 also promotes inflammation in epithelial cells, notably in lung tissues [56]. In our cohort, subjects with a homozygous reference genotype at the 19:35660670:G:A locus have similar levels of FXYD5 expression in never, former and current smokers (Fig. 5a). In contrast, subjects with a heterozygous or homozygous alternative genotype show higher expression of this gene in response to smoke (Fig. 5a), which might increase their lung cancer risk. We observe similar trends for the 10 other response genes listed above (Additional file 1: Fig. S14, Additional file 2: Table S8).
This finding demonstrates how subjects’ specific genetic background can influence their reaction to cigarette smoke and in turn might affect their risk of developing lung cancer.
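A genotype × smoking status interaction test of this kind can be sketched as a nested-model F-test. This is an illustration under assumptions, not the study's model (see the ‘Methods’ section): smoking status is treated as categorical (0 never, 1 former, 2 current), genotype as an additive allele dosage (0/1/2), and no other covariates are included; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def interaction_pvalue(expr, smoking, genotype):
    """F-test comparing expression models with and without smoking x genotype terms."""
    expr = np.asarray(expr, dtype=float)
    smoking = np.asarray(smoking)
    genotype = np.asarray(genotype, dtype=float)
    n = len(expr)
    levels = np.unique(smoking)
    # One indicator column per non-reference smoking level
    dummies = np.column_stack([(smoking == lv).astype(float) for lv in levels[1:]])
    base = np.column_stack([np.ones(n), dummies, genotype])  # main effects only
    full = np.column_stack([base, dummies * genotype[:, None]])  # + interaction

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
        resid = expr - design @ beta
        return resid @ resid

    rss_base, rss_full = rss(base), rss(full)
    df_num = full.shape[1] - base.shape[1]
    df_den = n - full.shape[1]
    f_stat = ((rss_base - rss_full) / df_num) / (rss_full / df_den)
    return f_dist.sf(f_stat, df_num, df_den)
```

A pattern like the one described for FXYD5, flat expression across smoking groups in Ref/Ref subjects but higher expression in smokers carrying the alternative allele, shows up as a small interaction P value in this model.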

Fig. 5

Genotype background influences lung cancer risk. a Combined environmental and genetic effect on the expression of the FXYD5 gene in nasal tissues. For each nasal sample, we present the expression level of the gene FXYD5 separately for never (pink), former (green) and current (blue) smokers. Samples are further stratified by the genotype of the subject at the 19:35660670:G:A locus (Ref/Ref: homozygous reference; Ref/Alt: heterozygous; Alt/Alt: homozygous alternative). The p-value gives the significance level of an interaction effect of the smoking status and the genotype at 19:35660670:G:A on the expression of the FXYD5 gene (see the ‘Methods’ section). b GWAS enrichment analysis: network representation of the 4 bronchial regulons enriched in GWAS genes. The 4 TFs are shown as squares and their target genes in the bronchial network as circles. The colour of the nodes indicates whether the gene/TF is a smoke injury risk gene (blue), a gene that co-localizes with a GWAS hit (i.e. no threshold on eQTL significance) (red) or both (green). The level of overrepresentation for genes in the network of those TFs can be found in Table 1. c Activity level of each of the 4 TFs in nasal tissue, depending on the disease status of the patient (green, healthy volunteer; orange, clinic patient without cancer; purple, clinic patient with cancer). Stars represent the significance of a two-sample t-test (ns, p > 0.05; *p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001)

Common germline variants regulate interferon-gamma genes and are linked to known lung cancer risk loci

We next identified GWAS hits in strong linkage disequilibrium (in the UK population) with SNPs that we found to regulate the expression of nearby genes in our eQTL analyses (see the ‘Methods’ section). Among the 1261 GWAS lung cancer risk loci, our analysis identified 63 GWAS risk loci from 13 different studies with variants that significantly affect the expression of a nearby gene at a 5% FWER threshold (Additional file 2: Table S10). These 63 eQTL/GWAS variants were linked to the expression of 41 genes, notably including 10 genes implicated in the interferon-gamma signalling pathway. Pathway enrichment confirmed a strong enrichment for genes involved in response to interferon-gamma (hypergeometric test, Padj = 7 × 10⁻¹³), as well as for other immune-related functions (e.g. innate immune response, antigen processing and presentation of exogenous peptide antigen, regulation of immune response, T cell receptor signalling pathway; see Additional file 2: Table S11 for the full list of enriche
