Figure 1 shows the analysis procedure of this MR study. The GWAS summary statistics for (1) smoking-related traits, (2) DNA methylation quantitative trait loci (mQTLs), and (3) CVDs were downloaded from publicly accessible databases. Supplementary Table S1 provides comprehensive information on each phenotype along with the corresponding download locations. This study first assessed the causal effects of smoking behavior and smoking-related DNA methylation on CVDs through two-sample MR analysis. Subsequently, several key DNA methylation CpG sites were identified through colocalization analysis. Finally, enrichment analysis was used to explore the potential mechanisms of the key CpG sites. This MR study was conducted using publicly available data, which had been obtained with approval from the respective ethics committees of the original studies. Therefore, no additional ethics approval or informed consent was required for this study.
Fig. 1 General flow of this MR study
GWAS summary statistics and IV selection for smoking traits
The smoking-related traits used were derived from a recent large-scale GWAS meta-analysis [15]. A total of three smoking-related traits were included in the present MR study: (1) age of initiation of regular smoking (n = 341,427); (2) cigarettes per day (n = 337,334); and (3) smoking initiation (n = 1,232,091).
The age of initiation of regular smoking is a continuous phenotype that represents the age at which participants started smoking regularly. It could be measured in various ways, such as asking participants (i) “At what age did you begin smoking regularly?” or (ii) “How long have you smoked?” combined with “What is your current age?”.
Cigarettes per day is defined as the average number of cigarettes smoked per day, including both current and former smokers, regardless of whether the cigarettes are self-rolled or manufactured. The quantity of cigarettes per day is typically measured with a single question, such as “How many cigarettes do you smoke per day?” or “How many cigarettes did you smoke per day?” For studies collecting quantitative measures of cigarettes per day, responses were categorized into bins: 1 = 1–5 cigarettes per day, 2 = 6–15 cigarettes per day, 3 = 16–25 cigarettes per day, 4 = 26–35 cigarettes per day, and 5 = 36 or more cigarettes per day.
Smoking initiation is a binary phenotype, where any participant reporting regular smoking behavior (either currently or in the past) is considered a case. This phenotype was measured through various methods, including asking participants the following questions: (1) Have you smoked more than 100 cigarettes in your lifetime? (2) Have you ever smoked every day for at least one month consecutively? and (3) Do you smoke regularly?
IVs proxying for smoking-related traits were screened based on the following criteria: SNPs with significant associations at the genome-wide level (P < 5e-8) for three smoking-related traits were obtained from the supplementary material of the original study. Subsequently, SNPs in linkage disequilibrium (r2 > 0.001 within 10,000 kb) were further eliminated to ensure the independence between each IV. The strength of the IVs was evaluated using the Cragg–Donald F-statistic [16]. The F-statistic for a single IV is calculated using the formula: F-statistic = (n − 2)*R2/(1 − R2), where n represents the sample size for the corresponding IV, and R2 is the proportion of the exposure variance explained by the IV. This value of R2 can be obtained using the ‘add_rsq’ function from the ‘TwoSampleMR’ R package. IVs with an F-statistic greater than 10 were considered strong enough to avoid weak instrument bias and were included in the final MR analysis. As a result, 9 IVs representing the age of initiation of regular smoking, 38 IVs representing cigarettes per day, and 202 IVs representing smoking initiation were identified (Supplementary Table S2).
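The instrument-strength filter described above can be sketched in a few lines. This is a minimal illustration of the stated formula, not the authors' code: the function names, SNP labels, and R2 values are hypothetical, and in the study R2 is obtained from the 'TwoSampleMR' package rather than supplied by hand.

```python
def f_statistic(r2: float, n: int) -> float:
    """Single-instrument F-statistic, F = (n - 2) * R2 / (1 - R2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R2 must lie in [0, 1)")
    if n <= 2:
        raise ValueError("n must exceed 2")
    return (n - 2) * r2 / (1.0 - r2)


def keep_strong_instruments(instruments, threshold=10.0):
    """Return the IDs of instruments whose F-statistic exceeds the threshold.

    `instruments` is a list of (snp_id, r2, n) tuples.
    """
    return [snp for snp, r2, n in instruments if f_statistic(r2, n) > threshold]


# Hypothetical instruments; the R2 values are made up for illustration.
example = [
    ("snp_a", 1e-4, 341_427),  # F is roughly 34, so it is retained
    ("snp_b", 2e-5, 341_427),  # F is roughly 7, so it is dropped
]
print(keep_strong_instruments(example))
```

With a sample size of this magnitude, even an instrument explaining 0.01% of the exposure variance clears the F > 10 threshold, which is why the filter removes few SNPs in large GWAS.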
GWAS summary statistics and IV selection for smoking-related DNA methylation
The effects of smoking on DNA methylation were comprehensively assessed in an EWAS meta-analysis including 15,907 participants from 16 cohorts [17]. All included cohorts utilized the Infinium HumanMethylation 450 BeadChip, which contains 485,512 CpG sites, for methylation analysis of whole blood, CD4+ T cells, or monocytes. The study excluded CpG sites available in fewer than three cohorts and subsequently conducted a random-effects meta-analysis on the remaining 485,381 CpG sites. The association between smoking and CpG site DNA methylation levels was calculated after adjusting for covariates including gender, age, blood cell counts, and technical variables. Among the included participants, current smokers were defined as those who had smoked at least one cigarette per day in the 12 months prior to blood sampling, whereas never smokers reported that they had never smoked. Eventually, this EWAS identified a total of 2623 CpGs with significantly different DNA methylation levels between current cigarette smokers and never smokers [based on the Bonferroni threshold P < 1e-7 (≈ 0.05/485,381)] (Supplementary Table S3).
Next, we aimed to identify cis-mQTLs that were significantly related to these 2623 smoking-related CpG sites, since these cis-mQTLs could serve as IVs proxying each smoking-related CpG site in the MR analysis. Specifically, data on the 2623 smoking-related CpG sites were extracted from the summary statistics of the DNA methylation quantitative trait loci (mQTL) analysis conducted by the Genetics of DNA Methylation Consortium (GoDMC), which included 32,851 European participants [18]. Covariate adjustments were made for gender, measured age, batch variables, smoking, and recorded cell counts to reduce confounding effects and residual variation [18].
The criteria for selecting IVs (cis-mQTLs) that proxy the methylation levels of smoking-related CpG sites were as follows: summary statistics of cis-mQTLs (P < 1e-8, ± 1,000 from the corresponding CpG site) were extracted from the multiplicative random effects meta-analysis, and linkage disequilibrium pruning (r2 > 0.01) was performed. Finally, summary statistics were extracted for 1933 smoking-related CpG sites (including 5723 mQTLs), which were preliminarily included as IVs for subsequent MR analysis (Supplementary Table S4).
GWAS summary statistics for CVDs
GWAS summary statistics for 9 types of CVDs were obtained for MR analysis. To increase the reliability of the analysis, GWAS summary statistics from two different cohorts were obtained for each CVD. For aortic aneurysm, data were collected from the UK Biobank, including 1374 cases and 400,595 controls, as well as from FinnGen, including 8125 cases and 381,977 controls. For atrial fibrillation, we obtained data from the investigation by Nielsen et al., which comprised 60,620 cases and 970,216 controls, and from FinnGen, which included 50,743 cases and 210,652 controls. For coronary atherosclerosis, data from the UK Biobank, with 20,023 cases and 377,103 controls, and from FinnGen, with 51,589 cases and 343,079 controls, were utilized. For coronary heart disease, we used data from the investigation by van der Harst et al., including 122,733 cases and 424,528 controls, as well as data from FinnGen, including 46,959 cases and 365,222 controls. For heart failure, data were obtained from the HERMES consortium, with 47,309 cases and 930,014 controls, and from FinnGen, with 29,672 cases and 382,509 controls. For intracerebral hemorrhage, we utilized data from the UK Biobank, with 700 cases and 399,017 controls, and from FinnGen, with 4056 cases and 371,717 controls. For ischemic stroke, data from the GIGASTROKE consortium, including 62,100 cases and 1,234,808 controls, and from FinnGen, including 16,857 cases and 283,057 controls, were included. For myocardial infarction, we included data from the investigation by Hartiala et al., with 61,505 cases and 577,716 controls, as well as data from FinnGen, with 26,060 cases and 343,079 controls. For subarachnoid hemorrhage, data were obtained from the UK Biobank, with 812 cases and 399,017 controls, and from FinnGen, with 3532 cases and 371,753 controls. Supplementary Table S1 shows the details and download addresses of the GWAS summary statistics of each cohort.
Assessing the causal effect of smoking traits on CVDs
The MR analysis in this study was performed with the “TwoSampleMR” and “MR-PRESSO” packages of R software. Supplementary Table S5 presents details of the IVs used in the MR analyses to assess the causal effects of the three smoking-related traits on the nine CVDs. Because a large number of IVs proxied the smoking-related traits, SNPs potentially associated with CVD outcomes (P < 0.05) were excluded from the IVs to reduce interference from horizontal pleiotropy. Inverse variance weighted (IVW) was the primary causal inference method, while MR-Egger and weighted median were used as supplementary methods [19]. Since CVDs were binary variables, MR results were presented as odds ratios (ORs) with 95% confidence intervals (CIs). For each CVD, MR results from both cohorts were meta-analyzed with the “meta” package, with a P-value < 0.05 taken to indicate a significant causal estimate. For significant causal estimates, Cochran’s Q test was conducted to assess heterogeneity, along with the MR-Egger intercept test and MR-PRESSO global test to assess horizontal pleiotropy; a P-value > 0.05 indicated that the causal estimates were not influenced by heterogeneity or horizontal pleiotropy.
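As a rough illustration of the quantities involved, the sketch below computes an IVW estimate from per-SNP summary statistics, pools two cohort-level estimates, and converts the result to an OR with a 95% CI. It rests on assumptions the text does not state: a fixed-effect IVW model (the 'TwoSampleMR' implementation may rescale the standard error for heterogeneity), fixed-effect inverse-variance pooling across cohorts, and made-up input values. All function names are hypothetical.

```python
import math


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate from per-SNP exposure and outcome effects."""
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    numerator = sum(
        bx * by / (s * s) for bx, by, s in zip(beta_exp, beta_out, se_out)
    )
    total = sum(weights)
    return numerator / total, math.sqrt(1.0 / total)


def pool_fixed(estimates):
    """Inverse-variance pooling of (beta, se) pairs from separate cohorts."""
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate to (OR, lower 95% CI, upper 95% CI)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Made-up per-SNP effects for one exposure-outcome pair in one cohort.
beta, se = ivw_fixed([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])

# Made-up cohort-level estimates for the same pair, pooled and reported as OR.
pooled_beta, pooled_se = pool_fixed([(0.5, 0.1), (0.3, 0.1)])
print(odds_ratio(pooled_beta, pooled_se))
```

Reporting on the OR scale is only a change of presentation: the pooling is done on the log-odds scale and exponentiated at the end.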
Assessing the causal effect of smoking-related CpG site DNA methylation levels on CVDs
The mQTLs for smoking-associated CpG sites were subjected to allele harmonization and merged with the summary statistics for the 9 CVDs, and mQTLs strongly associated with CVDs (P < 1e-8) were excluded. Supplementary Table S6 presents detailed information on the IVs used for MR analysis to assess the causal effect of smoking-associated CpG sites on CVDs. For CpG sites with only one mQTL as an IV, causal estimation was performed using the Wald ratio approach, whereas for CpG sites with multiple IVs, causal estimation was performed using the IVW approach. For each smoking-related CpG site, the MR results of the two cohorts were meta-analyzed using the “meta” R package. After meta-analysis, multiple testing correction was performed within the MR results of each CVD using the false discovery rate (FDR) method. Smoking-related CpG sites with FDR < 0.05 were initially identified, and the results were visualized by generating Manhattan plots using the “manhattan” R package [20].
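The single-instrument estimate and the multiple-testing step can be sketched as follows. Two assumptions are made here that the text does not confirm: the Wald ratio standard error uses the first-order approximation se_out / |beta_exp|, and the FDR method is taken to be the Benjamini-Hochberg procedure. Function names and input values are hypothetical.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Wald ratio estimate for a CpG site proxied by a single mQTL.

    Returns (beta, se, two-sided P-value) under a normal approximation.
    """
    beta = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    z = abs(beta / se)
    p_value = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up P-values for four CpG sites tested against one CVD.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

Applying the correction separately within each CVD, as the text describes, means `benjamini_hochberg` would be called once per outcome on that outcome's set of CpG-level P-values.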
Because smoking is known to be harmful, we further screened for CpG sites whose effects were directionally consistent. These fall into two types: (1) CpG sites at which smoking is significantly positively associated with DNA methylation levels, and at which these methylation levels are also significantly positively associated with the risk of CVD; and (2) CpG sites at which smoking is significantly negatively associated with DNA methylation levels, and at which these methylation levels are also significantly negatively associated with the risk of CVD. Specifically, Venn diagrams were used to integrate the associations between smoking and CpG sites (from Joehanes et al. [17]) and between CpG sites and CVD (from the present MR study), resulting in the final identification of candidate smoking-related CpG sites.
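The screening logic amounts to two set intersections, which a short sketch makes explicit. The inputs are assumed to contain only CpG sites that already passed the significance thresholds in the EWAS and in the MR analysis; the CpG identifiers, effect values, and function name are hypothetical.

```python
def consistent_cpgs(ewas_effect, mr_beta):
    """Split CpG sites into the two directionally consistent types.

    ewas_effect: CpG ID -> effect of smoking on methylation (significant sites).
    mr_beta:     CpG ID -> MR effect of methylation on CVD (significant sites).
    Returns (positive_positive, negative_negative) as sets of CpG IDs.
    """
    ewas_up = {cpg for cpg, effect in ewas_effect.items() if effect > 0}
    ewas_down = {cpg for cpg, effect in ewas_effect.items() if effect < 0}
    mr_up = {cpg for cpg, beta in mr_beta.items() if beta > 0}
    mr_down = {cpg for cpg, beta in mr_beta.items() if beta < 0}
    return ewas_up & mr_up, ewas_down & mr_down


# Placeholder identifiers and effect sizes, for illustration only.
ewas = {"cg_a": 0.8, "cg_b": -0.5, "cg_c": 0.3, "cg_d": -0.2}
mr = {"cg_a": 0.10, "cg_b": -0.07, "cg_c": -0.04}

up_up, down_down = consistent_cpgs(ewas, mr)
candidates = up_up | down_down
print(sorted(candidates))
```

Sites such as the placeholder `cg_c`, where smoking raises methylation but higher methylation lowers risk, are discarded because their direction does not match a harmful effect of smoking.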
Identification of key smoking-related CpG sites by co-localization analysis
Co-localization analysis was performed on the candidate smoking-related CpG sites to identify key smoking-related CpG sites. First, we meta-analyzed the GWAS summary statistics from the two cohorts for each CVD using the METAL software with the SCHEME STDERR option (https://csg.sph.umich.edu/abecasis/Metal/) [21]. Bayesian co-localization analysis of each candidate smoking-related CpG site and the corresponding CVD was then performed using the “coloc” R package with the “coloc.abf” function (default parameters: p1 = 1e-04, p2 = 1e-04, p12 = 1e-05). Co-localization analysis generates posterior probabilities for five hypotheses: (1) H0: no association with either trait; (2) H1: association with trait 1, not with trait 2; (3) H2: association with trait 2, not with trait 1; (4) H3: association with trait 1 and trait 2 induced by two independent SNPs; and (5) H4: association with trait 1 and trait 2 induced by one shared SNP [22]. CpG sites with a posterior probability of H4 greater than 90% were identified as key smoking-related CpG sites of the corresponding CVD. Results of the co-localization analysis were visualized using the “locuscomparer” R package.
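To show how the five posterior probabilities relate to the priors p1, p2, and p12, the sketch below combines per-SNP log approximate Bayes factors (lABFs) for two traits into posteriors for H0 to H4. It illustrates the standard combination step only and is not a substitute for “coloc.abf”: the lABF values are made up, the per-SNP Bayes factor calculation is omitted, and the function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)), or -inf when the difference is not positive."""
    if a <= b:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP lABFs of two traits."""
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                  # H0
        math.log(p1) + s1,                                    # H1
        math.log(p2) + s2,                                    # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                  # H4
    ]
    denom = logsumexp(log_h)
    return {f"H{i}": math.exp(v - denom) for i, v in enumerate(log_h)}


# Made-up lABFs over three SNPs: the same SNP drives both traits.
shared = coloc_posteriors([20.0, 0.0, 0.0], [20.0, 0.0, 0.0])
# Made-up lABFs: a different SNP drives each trait.
distinct = coloc_posteriors([20.0, 0.0, 0.0], [0.0, 20.0, 0.0])
print(shared["H4"] > 0.9, distinct["H3"] > 0.9)
```

In the first toy region the evidence concentrates on a single SNP for both traits, so H4 dominates and the site would pass the 90% threshold; in the second, the signals sit on different SNPs and the mass shifts to H3.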
Exploring the potential role of REST in four CVDs by Reactome pathway enrichment analysis
Co-localization analysis identified cg25313468 (located in the TSS1500 region of REST) as being highly associated with the risk of four CVDs simultaneously (atrial fibrillation, coronary atherosclerosis, coronary heart disease, and myocardial infarction). To explore the potential mechanisms, a protein–protein interaction (PPI) network of the top 500 genes most strongly interacting with REST was first constructed using the STRING database, which identifies interactions between genes at the protein level based on several sources: automated text mining of the scientific literature, computational predictions, databases of interaction experiments, and curated sources [23]. Subsequently, the disease-associated genes of the four CVDs were obtained separately from the DisGeNET database [24], and their intersection with the top 500 PPI genes was identified using Venn diagrams. Finally, Reactome pathway enrichment analysis of the intersecting genes was performed via the DAVID online platform [25]. Terms with FDR < 0.05 were identified as significantly enriched pathways, and the top 10 significantly enriched pathways for each CVD were visualized by generating dot plots using the “ggplot2” R package.