Association between inflammatory bowel disease and cancer risk: evidence triangulation from genetic correlation, Mendelian randomization, and colocalization analyses across East Asian and European populations

Study design

Our study was based on summary-level GWAS data available for East Asian and European populations to explore the potential causal associations between IBD (including UC and CD) and the risk of cancers. In the East Asian population, eight site-specific cancers were included in the analysis: colorectal, esophageal, stomach, liver cell, cervical, prostate, lung, and breast cancers. In the European population, 27 site-specific cancers were selected, including oropharynx, esophageal, stomach, small bowel, colorectal, anus, liver, bile duct, liver cell, pancreatic, Hodgkin lymphoma, non-Hodgkin lymphoma, leukemia, multiple myeloma, skin melanoma, nonmelanoma skin, squamous cell, kidney, bladder, prostate, cervical, corpus uteri, ovarian, lung, breast, thyroid, and brain cancers. Genome-wide LDSC was used to assess the genetic association between IBD and cancer. A standard two-sample MR analysis was performed to clarify the causal relationship between IBD and cancer. Colocalization analysis was used to investigate the local genetic structure shared between IBD and cancer and to assess whether the causal association was due to chance. Figure 1 provides an overview of the design and process of our analysis.

Fig. 1

An overview of the study design and process. LD, linkage disequilibrium; MAF, minor allele frequency; MR, Mendelian randomization

Our study followed the STROBE-MR guidelines [22]. The STROBE-MR checklist is available in Additional file 1: Table S1.

Data sources

GWAS data for IBD

The GWAS summary-level data for IBD, UC, and CD in East Asian and European populations were obtained from previously published studies [23, 32]. The meta-GWAS for the East Asian population included 14,393 patients with IBD and 15,456 controls, 7372 patients with CD and 14,946 controls, and 6862 patients with UC and 15,456 controls [23]. The meta-GWAS for the European population included 25,042 patients with IBD and 34,915 healthy controls, 12,194 patients with CD and 28,072 healthy controls, and 12,366 patients with UC and 33,609 healthy controls [32]. More detailed information can be found in Additional file 1: Table S2.

GWAS data for cancers

GWAS summary statistics for the eight cancers in the East Asian population were obtained from a large-scale GWAS conducted by Kazuyoshi Ishigaki et al. [33] using Biobank Japan data. The samples were collected from 12 medical institutions across Japan and included approximately 200,000 participants. The sample size of the GWAS data for cancers ranged from 90,336 to 212,453, and the number of cases ranged from 605 to 7062. More details of the GWAS data for cancers are listed in Additional file 1: Table S2.

For the European population, we used the most recent available GWAS with the largest total sample size or the largest number of cases for each outcome under investigation. The GWAS summary statistics for 27 cancers were obtained mainly from the following sources: (i) Rashkin SR et al., who conducted GWAS across 18 types of cancer within two population-based cohorts, the UK Biobank and the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging cohort [34]; (ii) Jiang L et al., who applied fastGWA-GLMM to the UK Biobank data and obtained full summary statistics [35]; (iii) Kimberley Burrows et al., who reported detailed GWAS of pan-cancer and site-specific cancers among UK Biobank participants [36]; (iv) McKay JD et al., who performed a meta-analysis with the Transdisciplinary Research of Cancer in Lung of the International Lung Cancer Consortium and the Lung Cancer Cohort Consortium [37]; and (v) Seviiri M et al., who performed a multitrait genetic analysis of more than 300,000 participants from Europe, Australia, and the USA [38]. The sample size of the GWAS data ranged from 85,716 to 456,348, and the number of cases ranged from 104 to 29,266. More details of the GWAS data for cancers are listed in Additional file 1: Table S2.

Data analysis

Linkage disequilibrium score regression

Genome-wide LDSC [30] was used to assess the genetic correlation between IBD and cancer (https://github.com/bulik/ldsc). LDSC estimates genetic correlation using all single nucleotide polymorphisms (SNPs), including those that do not reach genome-wide significance. We removed SNPs that could not be matched to HapMap3 SNPs and those with a minor allele frequency less than 0.01. The findings are presented as genetic correlation (rg) with standard error (SE). LDSC results were unavailable when one or both traits had heritability too low for estimation [39, 40].

P values less than 0.05 were considered suggestive of evidence for a potential genetic correlation. Statistical analysis was performed using LDSC v1.0.1.
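The SNP filter applied before LDSC can be illustrated with a minimal Python sketch. This is only an illustration of the two stated criteria, not the LDSC software itself: the function name `filter_snps`, the row keys `snp` and `eaf`, and the idea of passing HapMap3 identifiers as a set are all assumptions made for the example.

```python
def filter_snps(sumstats, hapmap3_ids, min_maf=0.01):
    """Keep SNPs that are in the HapMap3 set and have MAF >= min_maf.

    sumstats:    list of dicts with hypothetical keys "snp" (rsID) and
                 "eaf" (effect allele frequency).
    hapmap3_ids: set of rsIDs assumed to come from a HapMap3 SNP list.
    """
    kept = []
    for row in sumstats:
        # Minor allele frequency is the smaller of the two allele frequencies.
        maf = min(row["eaf"], 1.0 - row["eaf"])
        if row["snp"] in hapmap3_ids and maf >= min_maf:
            kept.append(row)
    return kept
```

For example, a SNP with an effect allele frequency of 0.995 has a minor allele frequency of about 0.005 and would be dropped, as would any SNP absent from the HapMap3 set.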

Mendelian randomization analysis

MR analysis is an instrumental variable analysis that uses genetic variants as instrumental variables (IVs) to study causality. Additional file 2: Fig. S1 provides an overview of our MR design. MR analysis is based on three main assumptions: (i) genetic instruments are associated with exposure, (ii) genetic instruments are independent of any confounder, and (iii) genetic instruments affect outcome only through exposure.

Variants strongly (P < 5 × 10⁻⁸) and independently (linkage disequilibrium [LD] r² < 0.001, window size = 10,000 kb) associated with IBD/UC/CD were extracted as IVs. The LD proxies were defined using the 1000 Genomes East Asian and European reference samples. We calculated the overall R² and F-statistics by summing the per-SNP estimates, where R² = 2 × EAF × (1 − EAF) × beta² and F = beta²/se² (EAF, effect allele frequency; beta, SNP-exposure effect estimate; se, standard error of beta). The F-statistics for all traits under consideration exceeded 10 [41], indicating that weak instrument bias was unlikely. In addition, the mRnd website tool (https://shiny.cnsgenomics.com/mRnd/) was used to calculate the statistical power of the MR analysis.
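As a rough illustration of the instrument-strength formulas above, the following Python sketch computes the per-SNP R² and F-statistic and flags instruments at or below the conventional threshold of 10. The `Instrument` class and function names are hypothetical and introduced only for this example; the sketch reports the summed R² and the per-SNP F-statistics and is not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class Instrument:
    eaf: float   # effect allele frequency
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta

def snp_r2(iv):
    # R2 = 2 x EAF x (1 - EAF) x beta^2
    return 2.0 * iv.eaf * (1.0 - iv.eaf) * iv.beta ** 2

def snp_f(iv):
    # F = beta^2 / se^2
    return (iv.beta / iv.se) ** 2

def instrument_strength(ivs, f_threshold=10.0):
    """Return (summed R2, per-SNP F list, indices of SNPs with F <= threshold)."""
    f_stats = [snp_f(iv) for iv in ivs]
    total_r2 = sum(snp_r2(iv) for iv in ivs)
    weak = [i for i, f in enumerate(f_stats) if f <= f_threshold]
    return total_r2, f_stats, weak
```

An instrument with EAF = 0.5, beta = 0.1, and se = 0.01 contributes R² of about 0.005 and has F of about 100, well above the threshold.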

We selected the random-effects inverse-variance weighted (IVW) method as the primary analysis. Sensitivity analyses, including the weighted median (WM), penalized weighted median (PWM), MR-Egger, MR pleiotropy residual sum and outlier (PRESSO), and MR-robust adjusted profile score (RAPS) analyses, were performed to further explore the stability of the results. In addition, the MR-Egger intercept and the MR-PRESSO global test were used to evaluate pleiotropy. When the P value of the MR-PRESSO global test was less than 0.05, the reported MR-PRESSO estimates are those obtained after outlier removal. We conducted a Steiger directionality test to rule out potential reverse causality.
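To make the primary analysis concrete, here is a minimal Python sketch of an IVW estimate with multiplicative random effects, computed from per-SNP Wald ratios with first-order weights. It is an assumption-laden illustration of one common formulation, not the TwoSampleMR implementation; the function name and argument names are hypothetical.

```python
import math

def ivw_random_effects(beta_exp, beta_out, se_out):
    """IVW estimate with multiplicative random effects (illustrative sketch).

    beta_exp: SNP-exposure effects; beta_out: SNP-outcome effects;
    se_out:   standard errors of the SNP-outcome effects.
    Returns (estimate, standard error, two-sided normal P value).
    """
    if not beta_exp or not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Wald ratio per SNP and its first-order inverse-variance weight.
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_fixed = math.sqrt(1.0 / total_w)
    # Cochran's Q measures heterogeneity between the per-SNP ratios.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    n = len(ratios)
    phi = q / (n - 1) if n > 1 else 1.0
    # Inflate the standard error only when there is over-dispersion (phi > 1).
    se = se_fixed * math.sqrt(max(1.0, phi))
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p_value
```

The point estimate is identical to the fixed-effect IVW estimate; only the standard error changes, and it is never allowed to fall below the fixed-effect value.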

In addition, to further reduce potential pleiotropy, we searched PhenoScanner (on February 6th, 2024; http://www.phenoscanner.medschl.cam.ac.uk) to identify traits associated with the instrumental variables (R² ≥ 0.8, P values ≤ 5 × 10⁻⁸), and repeated the MR analysis after removing SNPs associated with confounding factors (body mass index, waist circumference, hip circumference, waist-hip ratio, percentage of body fat, smoking, alcohol consumption, insomnia, depression, and physical activity). For significant MR findings, we also conducted multivariable MR (MVMR) analysis to obtain estimates independent of these confounding factors.

Pleiotropy poses a challenge to interpreting MR results. Therefore, we reported the primary IVW results together with the results of methods that detect and correct for pleiotropy, to account for pleiotropic bias as far as possible.

P values less than 0.05 were considered suggestive of evidence for a potential causal association. The IVW method, sensitivity analyses (excluding MR-RAPS), Steiger directionality test, and MVMR were implemented using the “TwoSampleMR” (version 0.5.7) package in R version 4.3.1. The MR-RAPS analysis was performed using the “mr.raps” (version 0.2) package in R version 4.3.1.

Bayesian colocalization analysis

Bayesian colocalization analysis was used to assess whether two associated traits were consistent with shared causal variant(s) at the loci of the included IVs. Five mutually exclusive hypotheses were tested: (1) there is no causal genetic variant for either trait (H0); (2) there is one causal genetic variant for trait 1 only (H1); (3) there is one causal genetic variant for trait 2 only (H2); (4) there are two distinct causal genetic variants, one for each trait (H3); and (5) there is a causal genetic variant shared by both traits (H4). The posterior probability (PP) quantifies the support for each hypothesis and is expressed as PPH0, PPH1, PPH2, PPH3, and PPH4 [31]. For each instrumental variable used in MR, we analyzed the region spanning 500 kb upstream and downstream of the variant, and the average value of PPH4 across all regions was taken as the final colocalization result.

A PPH4 level greater than 75% was considered suggestive of evidence for a causal genetic variant for both traits. These PPs were calculated using the “coloc” (version 5.2.3) package in R version 4.3.1.
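The way the five posterior probabilities arise can be sketched in Python using Wakefield approximate Bayes factors under a single-causal-variant assumption. This is a simplified illustration, not the coloc package: the function names are hypothetical, the prior standard deviation of 0.2 and the priors p1 = p2 = 1e-4 and p12 = 1e-5 are assumed values chosen for the example, and each trait is given as a list of (beta, se) pairs for the same SNPs in one region.

```python
import math

def log_sum(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff(x, y):
    # log(exp(x) - exp(y)); -inf when the difference is not positive.
    m = max(x, y)
    d = math.exp(x - m) - math.exp(y - m)
    return m + math.log(d) if d > 0 else float("-inf")

def log_abf(beta, se, prior_sd=0.2):
    # Wakefield approximate Bayes factor on the log scale.
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PPH0, PPH1, PPH2, PPH3, PPH4] for one region."""
    l1 = [log_abf(b, s) for b, s in trait1]
    l2 = [log_abf(b, s) for b, s in trait2]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                    # H0
        math.log(p1) + s1,                                      # H1
        math.log(p2) + s2,                                      # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3
        math.log(p12) + s12,                                    # H4
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]

def mean_pph4(regions):
    # Average PPH4 over regions, each given as a (trait1, trait2) pair.
    return sum(coloc_posteriors(t1, t2)[4] for t1, t2 in regions) / len(regions)
```

When the same SNP carries the strongest signal for both traits, PPH4 dominates; when the strongest signals sit on different SNPs, the weight shifts toward PPH3.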

Possible results and explanations

As shown in Fig. 2, we summarized ten possible results and nine explanations by combining the results of the genetic correlation, MR, and colocalization analyses according to their statistical significance and direction of effect. The results of the genetic correlation and MR analyses were characterized by both statistical significance and direction of effect, whereas the colocalization results were characterized by statistical significance only. The MR result took into account both the primary IVW analysis and the exclusion of potential pleiotropic bias.

Fig. 2

Summary of possible results and explanations. P values less than 0.05 in the LDSC and MR analyses were considered suggestive of evidence for a potential association, and a PPH4 level greater than 75% was considered suggestive of evidence for a causal genetic variant for both traits. Co*, colocalization; LDSC, linkage disequilibrium score regression; MR, Mendelian randomization

The specific results and explanations were as follows: (Explanation i) when all three results were significant and the genetic correlation and MR results were in the same direction, this was interpreted as strong genetic evidence for a causal association; (Explanation ii) when all three results were significant but the genetic correlation and MR results were in opposite directions, the genetic evidence was considered controversial; (Explanation iii) when only the genetic correlation and MR results were significant and they were in the same direction, this was interpreted as a causal association without shared causal genetic variants; (Explanation iv) when only the genetic correlation and MR results were significant and they were in opposite directions, or when only the MR result was significant, the causal association might be a false positive; (Explanation v) when only the genetic correlation and colocalization results were significant, the absence of a causal association might be a false negative; (Explanation vi) when only the LDSC result was significant, this was interpreted as pleiotropy without shared causal genetic variants; (Explanation vii) when only the MR and colocalization results were significant, this was interpreted as weak genetic evidence for a causal association; (Explanation viii) when only the colocalization result was significant, this was interpreted as insufficient evidence for causality but with shared causal genetic variants; (Explanation ix) when none of the three results was significant, this was interpreted as no genetic evidence for a causal association.
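The mapping from the three analyses to the nine explanations can be written as a small decision function. The Python sketch below is one reading of the rules listed above, with hypothetical function names; it treats P < 0.05 as significant for LDSC and MR and PPH4 > 0.75 as significant for colocalization, and it compares the signs of the genetic correlation and the MR estimate to determine direction.

```python
def explanation(ldsc_sig, mr_sig, coloc_sig, same_direction=None):
    """Return the explanation label ("i" to "ix") for one IBD-cancer pair.

    same_direction is only consulted when both LDSC and MR are significant.
    """
    if ldsc_sig and mr_sig:
        if same_direction is None:
            raise ValueError("direction is required when LDSC and MR are both significant")
        if coloc_sig:
            return "i" if same_direction else "ii"
        return "iii" if same_direction else "iv"
    if mr_sig:
        # MR significant without a significant genetic correlation.
        return "vii" if coloc_sig else "iv"
    if ldsc_sig:
        # Genetic correlation significant without a significant MR result.
        return "v" if coloc_sig else "vi"
    return "viii" if coloc_sig else "ix"

def classify(rg, ldsc_p, mr_beta, mr_p, pph4, alpha=0.05, pph4_cutoff=0.75):
    """Apply the significance thresholds, then look up the explanation."""
    same_direction = (rg > 0) == (mr_beta > 0)
    return explanation(ldsc_p < alpha, mr_p < alpha, pph4 > pph4_cutoff, same_direction)
```

The function distinguishes ten input patterns (eight significance combinations, two of which split by direction) and returns nine labels, because two patterns share Explanation iv.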

Risk of bias assessment

To assess the quality of the MR analyses, we considered eight potential sources of bias: (1) weak instrument bias, (2) pleiotropy bias, (3) bias from sample overlap, (4) bias from population stratification, (5) bias from inconsistency with sensitivity analyses, (6) bias from lack of repeatability, (7) bias from inconsistency with evidence from other study designs, and (8) reporting bias. Each domain was judged as having a low, moderate, or high risk of bias; domains with no information were classified as moderate risk. The detailed risk of bias assessment criteria used for the Mendelian randomization analyses can be found in Additional file 1: Table S3.
