A prospective study of smoking-related white blood cell DNA methylation markers and risk of bladder cancer

Study population and methylation data

Of the 1638 participants initially included in our study, 40 had more than 30% of the methylation data missing (14 from ATBC and 26 from PLCO) and were excluded from the analyses, leaving a study population of 1598 participants. These included 766 bladder cancer cases (288 from PLCO and 478 from ATBC) and 832 controls (308 and 524 in PLCO and ATBC, respectively). Their characteristics are summarised in Table 1 and show that participants were between 49 and 74 years old at recruitment. All 1002 ATBC participants were male and current smokers at recruitment. In the full population, more than 90% of participants were male, and more than 65% were current smokers. As expected, none of the frequency-matching criteria (age at recruitment and gender) differed between cases and controls, and differences were observed only for smoking exposure variables. Filtering on missing values excluded 25,393 CpG sites (including one smoking-related CpG site), leaving 2670 smoking-related CpG sites and 460,119 CpG sites for our analyses.

Table 1 Study population description by gender, age, age at recruitment and smoking exposure variables

Investigating the effect of smoking on bladder cancer

Of the 2670 CpG sites assayed in our data that were previously reported to be smoking-related, 200 were differentially methylated in relation to smoking status at a Bonferroni-corrected significance level. Principal component analysis of the methylation M-values at these CpG sites suggested that 71 components were necessary to explain more than 80% of the total variance; the first component (PC1) alone explained more than 19.3% of the variance, and the first 10 PCs jointly explained 47%. Hypothesizing that PC1 provided a reasonable summary of the 200 smoking-related CpG sites, we used its score as a proxy for these sites, entering it as an adjustment variable in subsequent smoking-adjusted analyses. All smoking metrics were associated with risk of bladder cancer, with ORs ranging from 1.8 to 3.75 (Table 2; p-value < \(4.1\times ^\)). After adjusting the model for the methylation level at cg05575921 (AHRR), the CpG site exhibiting the strongest association with smoking status in our data (\(\beta \) = −1.943, p < \(^\)), results were attenuated, with ORs for the questionnaire-based smoking metrics ranging from 1.36 to 2.51 and corresponding p-values from \(6.9\times ^\) to \(1.4\times ^\) (Table 2). After adjusting our model for PC1, results were further attenuated (ORs ranging from 1.32 to 2.47 and p-values from \(9.1\times ^\) to \(1.1\times ^\)). Conversely, the risk of bladder cancer by quartiles of cg05575921 (AHRR) showed ORs ranging from 1.41 to 2.58 with p-values from \(1.3\times ^\) to \(5.7\times ^\), and by quartiles of PC1 showed ORs ranging from 1.25 to 2.48 with p-values from \(1.0\times ^\) to \(1.6\times ^\). When adjusting for smoking metrics, the ORs were attenuated for both AHRR and PC1; the attenuation was most pronounced for smoking duration (from 2.58 to 1.47 and from 2.48 to 1.63 when comparing Q4 versus Q1 for AHRR and PC1, respectively) (Table 3).

Table 2 Bladder cancer risk (Odds Ratio [OR] and 95% confidence intervals) by quartiles of smoking metrics

Table 3 Bladder cancer risk (Odds Ratio [OR] and 95% confidence intervals) by quartiles of methylation level at cg05575921 (AHRR) and the first principal component (PC1) summarising the methylation levels at the 200 smoking-related CpG sites

Investigating 2670 previously identified smoking-related CpGs and bladder cancer

Linear mixed models identified 28 smoking-related CpG sites that were differentially methylated in relation to bladder cancer case–control status at a Bonferroni-corrected significance level (\(p=\frac{0.05}{2670}=1.87\times 10^{-5}\)) (Fig. 1a). Of these, 27 were hypo-methylated in prospective cases, and only cg08035323 (YWHAQ) was hyper-methylated (\(\beta \) = 0.261, p-value = \(2.89\times 10^{-8}\)). Stratifying the analyses by study (Supplementary Fig. 1a), we found that 8 of these 28 associations were significant in PLCO only, none in ATBC only, 3 in both PLCO and ATBC separately, and 17 only in the pooled analysis. The sign of the effect size estimates was highly consistent between the two studies (Supplementary Fig. 1b). Similarly, analyses restricted to current smokers from both the PLCO and ATBC studies (N = 1100 participants) identified 5 differentially methylated sites at p < \(1.87\times 10^{-5}\), all of which were also identified in the full study population (Supplementary Fig. 2a). The strong consistency in the effect size estimates from the analysis stratified by study (Supplementary Fig. 2b) suggests that the signal attenuation we observe for smoking-associated CpGs between studies may, at least partially, be attributed to less contrast in tobacco use due to the lack of non-smokers in ATBC.
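The Bonferroni-corrected significance levels used in these analyses follow directly from dividing the family-wise error rate by the number of tests. The short Python sketch below reproduces that arithmetic for the two test counts given in the text (2670 smoking-related CpG sites and 460,119 CpG sites epigenome-wide); the function name `bonferroni_threshold` is hypothetical and is not taken from the study's code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level keeping the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Test counts taken from the text: 2670 smoking-related CpG sites and
# 460,119 CpG sites in the epigenome-wide analysis.
smoking_threshold = bonferroni_threshold(0.05, 2670)
epigenome_threshold = bonferroni_threshold(0.05, 460_119)

print(f"{smoking_threshold:.2e}")    # 1.87e-05
print(f"{epigenome_threshold:.2e}")  # 1.09e-07
```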

Fig. 1

Results from the univariate analysis relating the methylation M-value at each of the 2670 smoking-related CpG sites to bladder cancer case/control status. The volcano plot (a) represents, for each of the 2670 CpG sites separately, the effect size estimate (β; X-axis), i.e. the estimated methylation difference (on the logit scale) between cases and controls, and the p-value (Y-axis) for the null hypothesis of no association (\(H_0: \beta =0\)) on the log10 scale. The horizontal red dashed line represents the Bonferroni-corrected significance level ensuring an FWER < 0.05 (N = 28). CpG sites found differentially methylated at an FDR level of 0.05 (N = 191) are presented in yellow. The associations between the 2670 smoking-related CpG sites and smoking status in our data are summarised in panel b by their p-values, which are plotted against the p-values for the association with bladder cancer status. The CpG sites associated with smoking status (N = 200) are above the horizontal dashed line, which represents the Bonferroni-corrected significance level ensuring an FWER < 0.05. The CpG sites associated with both smoking and bladder cancer status (N = 24) are presented in dark red; those exclusively associated with bladder cancer (N = 4) or smoking (N = 176) are plotted in light red and orange, respectively. The marginal histograms along the axes summarise the number of CpG sites associated with bladder cancer (histogram along the Y-axis) or with smoking (histogram along the X-axis) in a given range of p-values for smoking (Y-axis) and bladder cancer (X-axis), respectively. Panel c represents the 37 CpG sites with bladder cancer p-values ranging from \(^\) to \(^\) and smoking p-values between \(^\) and \(^\). Among these, 17 are associated with both smoking and bladder cancer status, 4 are associated with bladder cancer but not smoking, and 9 are associated with smoking but not bladder cancer. Panel d represents the 92 CpG sites with bladder cancer p-values between \(^\) and \(^\) and smoking p-values between \(^\) and \(^\). Among these, 3 are associated with both smoking and bladder cancer and 82 are associated with smoking but not bladder cancer

Figure 1b represents smoking p-values (Y-axis) as a function of bladder cancer p-values (X-axis) for the 2670 smoking-related CpG sites. The 6 most highly significant smoking CpGs were also the most highly associated with bladder cancer (smoking p-value < \(^\), bladder cancer p-value < \(^\)). As indicated in Fig. 1b and c, there were only 4 smoking-related CpG sites that were associated with bladder cancer but not with smoking status in our data (p-values ranging from \(2.19\times ^\) to \(1.47 \times ^\)): cg11314684 (AKT3), cg19583819 (NRG2), cg13038618 (IRF2BPL), and cg14074174 (SNAPC2). We observed an additional 9 CpG sites with smoking p-values ranging from \(1.7\times ^\) to \(1.74\times ^\) that were borderline significantly associated with bladder cancer case–control status (Fig. 1c; p-values ranging from \(1.97\times ^\) to \(8.17\times ^\)): cg18146737 (GFI1), cg10255761 (KLHDC8B), cg19859270 (GPR15), cg03991871 (AHRR), cg11902777 (AHRR), cg01901332 (ARRB1), cg01513913 (MIR4539), cg00310412 (SEMA7A), and cg01127300 (TMEM184B). This represents a very small proportion (< 0.2%) of the CpG sites with smoking p-values > \(^\). That proportion increases dramatically for CpG sites with stronger associations with smoking. In particular, while 7/136 (5.15%) of the CpG sites with smoking p-values < \(10^{-10}\) were associated with bladder cancer, 4/12 (33%) of the CpG sites in the \([^, ^]\) smoking p-value bracket were associated with bladder cancer status, and all CpG sites with smoking p-values below \(^\) (N = 7) were associated with bladder cancer (Fig. 1b). Similarly, all CpG sites with bladder cancer p-values below \(^\) (N = 11) were also associated with smoking status. Among the 36 CpG sites with bladder cancer p-values ranging from \(3.16\times ^\) to \(^\), 22 were associated with smoking, and 4 were borderline significantly associated with smoking: cg04517079 (FOXP4), cg04263702 (FBXL18), cg15187398 (MOB3A), and cg11436113 (SLC24A3) (Fig. 1d).

Methylation M-values of the 28 bladder-related CpG sites were recoded into quartiles, from which odds ratios were calculated (Supplementary Table 1). For each CpG site a clear risk gradient (p-trend < 0.001) across methylation quartiles was observed. ORs for the highest methylation quartile range from 1.58 to 2.63 for the 27 CpG sites found hypomethylated in cases, and OR = 2.17 for cg08035323 (YWHAQ) (Fig. 2a).
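To make the quartile recoding concrete, the Python sketch below computes crude (unadjusted) odds ratios for each quartile relative to the lowest one. It is only an illustration under stated assumptions: the ORs reported here come from fitted models, whereas this sketch uses raw 2×2 counts, and the function name and the toy data are hypothetical.

```python
import statistics


def quartile_odds_ratios(values, is_case):
    """Crude OR of being a case in each quartile, relative to the lowest quartile (Q1).

    Assumes every quartile contains at least one control and Q1 at least one case.
    """
    # Three cut points splitting the values into four groups.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    cases = [0, 0, 0, 0]
    controls = [0, 0, 0, 0]
    for value, case in zip(values, is_case):
        # Number of cut points exceeded gives the quartile index (0 to 3).
        quartile = (value > q1) + (value > q2) + (value > q3)
        if case:
            cases[quartile] += 1
        else:
            controls[quartile] += 1
    reference_odds = cases[0] / controls[0]
    return [(cases[q] / controls[q]) / reference_odds for q in range(4)]


# Hypothetical toy data: 16 methylation values and case (1) / control (0) status.
values = list(range(1, 17))
is_case = [1, 0, 0, 0,  1, 0, 1, 0,  0, 1, 0, 1,  1, 1, 0, 1]

print([round(x, 2) for x in quartile_odds_ratios(values, is_case)])
# [1.0, 3.0, 3.0, 9.0]
```

In this toy example the odds of being a case rise from 1/3 in Q1 to 3 in Q4, giving a Q4 versus Q1 odds ratio of 9.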

Fig. 2

Odds ratios (ORs) calculated from the methylation M-value at the 28 bladder-related CpG sites, which was recoded into quartiles (a). The loading coefficients of the first component of the principal component analysis of the 28 methylation levels are presented in panel b. Using the same quartile discretisation for the scores of the first 13 components (jointly explaining 80.69% of the total variance), we calculated the OR for each component (panel c). The ORs derived from the score of the first component were further adjusted for smoking duration, cumulative smoking exposure (in packyears), and smoking intensity (panel d). For all calculated ORs, a linear model was used to test for a trend in the OR across methylation quartiles. For readability, the corresponding p-values were coded as * for p-values in [0.01, 0.05], ** for p-values in [0.001, 0.01], and *** for p-values < 0.001. To ensure comparability across OR estimates, these were calculated setting the lowest quartile as reference, and the OR was derived from the absolute value of the effect size estimate. As such, for CpG sites (or PC scores) found inversely associated with bladder cancer risk (marked in blue), the reported OR represents the risk change per unit loss in methylation (or score), and for CpG sites found directly associated with disease risk (marked in red), the OR represents the risk change per unit increase in methylation level (or score)

We conducted a principal component analysis (PCA) on the methylation M-values at the 28 bladder-related CpG sites. Loading coefficients of the first component (explaining more than 37% of the original variance) were positive for all CpG sites except cg08035323 (YWHAQ) (Fig. 2b). We observed the same trend in ORs across the quartiles of the scores of the first PC as with individual CpGs (Q4 versus Q1 OR = 3.01), whereas ORs by quartiles for the other PCs were weaker and did not exhibit any significant trend (except for PC5 and PC7) (Fig. 2c). ORs from the scores of the first component were further adjusted for smoking duration, cumulative smoking exposure (in packyears) and smoking intensity (Fig. 2d). All showed a similar pattern across quartiles and were slightly attenuated, in particular after adjusting for smoking duration. Analyses restricted to current smokers (N = 1100) showed similar results (Supplementary Fig. 3), but the attenuation upon adjustment for smoking duration was even smaller (Supplementary Fig. 3d; the OR for the last quartile of PC1 scores adjusted for duration was 2.93, compared with 3.01 in the full population).

As a sensitivity analysis, we calculated the OR for each of the 28 bladder-related CpG sites (Supplementary Fig. 4), adjusting for smoking duration, cumulative smoking exposure, and smoking intensity. Results suggest that ORs are attenuated for all 28 CpG sites upon adjustment for smoking exposure, and that the attenuation is stronger when adjusting for smoking duration, irrespective of the CpG site.

Epigenome-wide analyses of bladder cancer

We compared bladder cancer cases with controls using the same univariate linear mixed model on the full set of CpGs and identified 11 differentially methylated CpG sites at a Bonferroni-corrected significance level (\(p=\frac{0.05}{460119}=1.09\times 10^{-7}\)), and 18 differentially methylated CpG sites while controlling the false discovery rate at 0.05 (Fig. 3). Of these 18 CpG sites, 15 were among those identified in our smoking-related analyses, while the remaining 3, cg09317508 (MIR4689), cg18826637 (ZEB2), and cg05845217 (LOC101929153), have not been systematically reported as smoking-related in the literature. Epigenome-wide analyses restricted to current smokers (Supplementary Fig. 5) did not identify any differentially methylated CpGs (irrespective of the multiple testing correction used). However, the CpG sites identified in the full population were among the strongest associations in current smokers, with effect estimates consistent with those in the full population and p-values ranging from \(^\) to \(1.24\times ^\) (Supplementary Table 2). Further adjustment for blood cell composition did not affect our conclusions (results not shown).
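The gap between the 11 Bonferroni-significant sites and the 18 sites retained at an FDR of 0.05 reflects how the two corrections work. The Python sketch below contrasts them on a handful of made-up p-values. It assumes the Benjamini-Hochberg step-up procedure for FDR control, which the text does not name explicitly; the function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests with p <= alpha / m (family-wise error rate control)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k / m * fdr.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every test ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten tests.
p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95, 0.99]

print(bonferroni(p_values))          # [0]
print(benjamini_hochberg(p_values))  # [0, 1, 2]
```

Bonferroni compares every p-value with the same strict threshold, whereas the step-up procedure relaxes the threshold with rank, so it typically retains a superset of the Bonferroni hits.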

Fig. 3

Manhattan plot summarising the full resolution association study relating the methylation M value at the 460,119 assayed CpG sites and bladder cancer case–control status. CpG sites that were found in the smoking-related analyses are represented by a triangle. Name and corresponding gene are only represented for the 11 differentially methylated CpG sites at a Bonferroni-corrected significance level and for the additional 7 differentially methylated sites with an FDR < 0.05

Investigating differentially methylated regions (DMRs)

Differentially methylated region analyses performed on the entire 450k CpG dataset identified 19 Differentially Methylated Regions containing 77 CpG sites at an FDR level below 0.05 (Fig. 4a).

Fig. 4

Description of the 19 identified Differentially Methylated Regions (DMRs) in relation to bladder cancer case–control status (a). For each of the 77 CpG sites included in the 19 DMRs, we report the p-value in relation to (i) smoking (inner circle) and (ii) bladder cancer (outer circle). CpG sites that are among the 2670 smoking-related CpG sites are coloured in dark red, CpG sites that are one order away from smoking are coloured in orange, and those two orders away from smoking are coloured in blue. As depicted in panel b, of the 77 CpG sites included in the 19 identified DMRs, 37 are related to smoking, 36 are one order away from smoking (i.e. correlated with at least one smoking-related CpG site but not with smoking directly), and 4 are correlated with at least one ‘order 1’ CpG site (second order). For clarity, in panel b we represent all CpG sites that are not within the identified DMRs but are correlated with any CpG site in the identified DMRs as a single node (large nodes)

The DMRs included between 2 and 9 CpG sites each, ranged in length from 18 to 2125 base pairs, and were located on chromosomes 1, 2, 5, 6, 7, 11, 14, 15, 16, 19, 21, 22 and X. Of these 19 DMRs, 5 included at least one of the 11 differentially methylated CpG sites identified in our univariate analyses; altogether, 9 of the 11 genome-wide differentially methylated CpG sites were located within the 19 DMRs.

The 19 DMRs included 77 CpG sites, of which 68 were not identified in our univariate analyses. Among these 77 CpG sites, 37 were among the 2670 smoking-related CpG sites, and the p-values for their association with smoking status in our data ranged from \(5.17\times ^\) to \(1.34\times ^\). For example, DMRs 4 and 6 contain cg21566642 near ALPPL2 and cg05575921 in AHRR, the two most highly significant CpGs for smoking. The remaining 40 CpG sites (located in DMRs 3–5, 10, 11, 13, 15–19) were not directly related to smoking; of these, 35 ‘first order’ CpG sites were significantly correlated with at least one of the 2670 smoking-related CpG sites, and 5 ‘second order’ CpG sites were correlated with at least one ‘first order’ CpG site but not directly with any smoking-related CpG site. Twelve DMRs include at least one smoking-related CpG site, and these sites may drive the association of those DMRs with bladder cancer. However, for the other 7 DMRs (i.e., numbers 5 (Chr 2), 11 (Chr 11), 13 (Chr 14), 15 (Chr 16), 17 (Chr 21), 18 (Chr 22) and 19 (Chr X)), the distance to the smoking-bladder cancer-related CpGs within the DMR is equal to or more than 2 orders away from smoking, suggesting more distal, potentially non-tobacco-associated processes related to bladder cancer (Fig. 4b).
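The notion of ‘order’ used here can be read as a shortest-path distance in a correlation network: smoking-related sites are at order 0, sites correlated with one of them are first order, and so on. The Python sketch below illustrates that reading with a breadth-first search. It assumes the set of significant pairwise correlations has already been computed and is supplied as a symmetric adjacency mapping; the function name and the site identifiers are hypothetical.

```python
from collections import deque


def smoking_order(correlated, smoking_related):
    """Breadth-first distance of each CpG site from the set of smoking-related sites.

    `correlated` maps each site to the set of sites it is significantly
    correlated with (assumed symmetric). Sites that cannot be reached from
    any smoking-related site are absent from the result.
    """
    order = {site: 0 for site in smoking_related}
    queue = deque(smoking_related)
    while queue:
        site = queue.popleft()
        for neighbour in correlated.get(site, ()):
            if neighbour not in order:
                order[neighbour] = order[site] + 1
                queue.append(neighbour)
    return order


# Hypothetical toy network: A is smoking-related, B correlates with A,
# C correlates with B only, and D is not correlated with anything.
correlated = {
    "cpg_A": {"cpg_B"},
    "cpg_B": {"cpg_A", "cpg_C"},
    "cpg_C": {"cpg_B"},
    "cpg_D": set(),
}

print(smoking_order(correlated, ["cpg_A"]))
# {'cpg_A': 0, 'cpg_B': 1, 'cpg_C': 2}
```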

Prediction of bladder cancer

ROC analyses (Fig. 5) showed that, irrespective of the smoking metric, cg05575921 (AHRR) alone (AUC 0.60) yielded predictive performance similar to that of the classical questionnaire-based smoking metrics (AUC 0.62, 0.59, and 0.62 for duration, intensity, and packyears, respectively). A model including the PC1 from the 200 CpG sites differentially methylated in relation to smoking status in our study outperformed all other models (AUC 0.62). The best prediction was achieved by including PC1 and the smoking metrics in the model, resulting in an AUC slightly higher than that with PC1 only (AUC range 0.63 to 0.65).
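For readers less familiar with the AUC, it equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The Python sketch below computes it that way from a list of scores and labels. It covers only the evaluation step: the model fitting and train/test split used in the study are not reproduced, and the function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for three cases (1) and three controls (0).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

print(round(auc(scores, labels), 3))  # 0.778
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported values of 0.59 to 0.65 in context.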

Fig. 5

Receiver operating characteristic (ROC) analyses summarising the logistic model for smoking duration (a), smoking intensity (b), and pack-years (c). ROC curves are presented for the model including (i) the smoking metric alone (green), (ii) the scores of the first principal component of the 28 CpG sites found differentially methylated in relation to smoking status (PC1 explaining 37.7% of the total variance, in blue), (iii) methylation levels at cg05575921 (AHRR), the CpG site exhibiting the strongest association with smoking status in our data (orange), (iv) methylation levels at cg05575921 and the smoking exposure metric (brown), and (v) PC1 scores and the smoking exposure metric (dark red). We report the area under the curve from the testing set (20% of the total population) for each of the models investigated
