The overall research process is illustrated in Fig. 1A. This study adhered to the three core assumptions of MR and used a conventional two-sample design, with viral infections as exposures and CRC and its subtypes as outcomes, to assess the association between genetic susceptibility to viral infection and CRC risk. Reporting followed the MR standardized reporting guidelines (Supplementary STROBE-MR checklist).
Fig. 1A: Research flowcharts. B: Preliminary analysis of risk effects
Viral GWAS source
The studied viruses included herpes simplex virus, hepatitis virus, rubella virus, measles virus, poliovirus, Epstein-Barr virus (EBV), human immunodeficiency virus (HIV), human papillomavirus (HPV16, HPV18), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and mumps virus. Summary statistics for viral GWAS were obtained from 23andMe [14] and the r10 version of the FinnGen database [15].
The GWAS analysis from 23andMe was based on self-reported infection history questionnaires. Herpes simplex virus included 25,108 cases of cold sores and 63,332 controls from 23andMe [14], while the FinnGen dataset included 3,723 cases of herpes simplex virus infection and 396,378 controls, and 5,488 cases of herpes zoster infection and 396,378 controls [15]. Hepatitis infection data were obtained from FinnGen, including 2,320 cases and 409,861 controls. Poliovirus data were also from FinnGen, including 396 cases and 409,849 controls [15]. Rubella virus data were derived from 23andMe, with 12,000 cases and 71,597 controls [14], and from FinnGen, with 1,041 cases and 396,378 controls [15]. Measles virus data were obtained from 23andMe, with 38,219 cases and 47,279 controls [14], and from FinnGen, with 351 cases and 396,378 controls [15]. Mumps virus data were derived from 23andMe, with 31,227 cases and 68,446 controls [14], and from FinnGen, with 827 cases and 400,974 controls [15]. Infectious mononucleosis (EBV infection) data were obtained from 23andMe, with 17,457 cases and 68,446 controls [14], and from FinnGen, with 2,979 cases and 400,974 controls [15]. HPV infection data were sourced from the study by Shure et al., including 1,388 individuals of European ancestry [16]. COVID-19 data were obtained from the COVID-19 Host Genetics Initiative (HGI) r7 release (https://www.covid19hg.org/), including 13,769 cases and 1,072,442 controls [17]. Additionally, the FinnGen dataset included 2,856 confirmed COVID-19 cases and 405,232 controls [15].
Outcome GWAS sources
CRC GWAS data were obtained from the study by Huyghe et al., including 11,835 cases of European ancestry and 11,856 controls [18]. Additionally, subgroup analyses for rectal cancer and colon cancer were conducted using large GWAS summary statistics from the Pan-UK Biobank (https://pan.ukbb.broadinstitute.org/), which included 3,856 colon cancer cases and 390,596 controls, and 2,705 rectal cancer cases and 386,740 controls. The Pan-UK Biobank used a generalized mixed model association test framework for multi-ancestry analysis of 7,228 phenotypes across 16,131 GWAS, adjusting for age, sex, age*sex, age^2, age^2*sex, and the first 10 principal components [19].
Genetic factors that may influence infection and outcome risk
To further investigate the genetic factors that may increase the risk of viral infections and cancer, we analyzed 1,400 blood metabolites (GWAS ID: GCST90199621-GCST90201020). The GWAS for these metabolites adjusted for age, sex, time since last meal or drink, genotyping batch, and the top ten genetic principal components. Linear regression was performed on the metabolites and metabolite ratios [20]. Because too few SNPs reached the genome-wide significance threshold of 5 × 10^-8, we used a more lenient threshold of 1 × 10^-5. Among the metabolite results, only those for which the effect direction was consistent across all MR methods were carried forward to the mediation analysis. We employed the traditional two-step method with the product-of-coefficients test to assess the effects among metabolites, viral infections, and outcomes.
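The product-of-coefficients step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the first-order delta-method standard error and normal-approximation P-value are assumptions, since the text does not state how the uncertainty of the product was obtained.

```python
import math


def indirect_effect(beta1, se1, beta2, se2):
    """Product-of-coefficients estimate for a two-step mediation analysis.

    beta1/se1: MR estimate and standard error for the first step.
    beta2/se2: MR estimate and standard error for the second step.
    The standard error uses a first-order delta method (an assumption).
    """
    estimate = beta1 * beta2
    se = math.sqrt(beta1 ** 2 * se2 ** 2 + beta2 ** 2 * se1 ** 2)
    # Two-sided P-value from a normal approximation.
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p_value


def proportion_mediated(indirect, total):
    """Share of the total effect that runs through the mediator."""
    if total == 0:
        raise ValueError("total effect must be non-zero")
    return indirect / total
```

The proportion mediated is only interpretable when the indirect and total effects point in the same direction, which is in line with the direction-consistency filter applied before mediation.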
Mendelian randomization analysis
In the MR analysis, too few instrumental variables (IVs) reached the genome-wide significance threshold (P < 5 × 10^-8). Therefore, we relaxed the threshold to P < 5 × 10^-6, with genetic variants clumped at r^2 < 0.001 within a 10,000 kb physical window. Additionally, IVs with an F-statistic less than 10 were filtered out to avoid weak instrument bias. The F-statistic was calculated as (beta/se)^2 [21]. In the traditional MR analysis, inverse-variance weighted (IVW) was used as the primary method. In addition, we employed multiple MR methods to evaluate risk associations, including constrained maximum likelihood-based MR (cML), contamination mixture (ConMix), robust adjusted profile score (MR-RAPS), and debiased inverse-variance weighted method (dIVW) [22,23,24,25]. When only a single IV was available, the Wald ratio method was used to estimate the causal effect. Additionally, to account for directionality in the MR analysis, we applied the Steiger method to confirm the direction of effect [26].
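As a rough illustration of the instrument filter and the primary estimators named above, the sketch below computes the per-variant F-statistic as (beta/se)^2, drops variants with F < 10, and then applies the Wald ratio (one IV) or a fixed-effect IVW estimate (several IVs). The `Snp` container and all function names are hypothetical, the fixed-effect weighting and first-order standard errors are assumptions, and clumping and harmonization are taken as already done upstream.

```python
import math
from dataclasses import dataclass


@dataclass
class Snp:
    beta_x: float  # SNP-exposure effect
    se_x: float    # standard error of beta_x
    beta_y: float  # SNP-outcome effect
    se_y: float    # standard error of beta_y


def f_statistic(snp):
    # Per-variant instrument strength, as defined in the text.
    return (snp.beta_x / snp.se_x) ** 2


def wald_ratio(snp):
    # Single-instrument estimate with a first-order standard error.
    return snp.beta_y / snp.beta_x, snp.se_y / abs(snp.beta_x)


def ivw(snps):
    # Fixed-effect IVW: weighted mean of Wald ratios,
    # with weights beta_x^2 / se_y^2.
    weights = [(s.beta_x / s.se_y) ** 2 for s in snps]
    ratios = [s.beta_y / s.beta_x for s in snps]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def estimate_effect(snps, f_min=10.0):
    # Keep only instruments that pass the weak-instrument filter.
    kept = [s for s in snps if f_statistic(s) >= f_min]
    if not kept:
        raise ValueError("no instruments pass the F-statistic filter")
    est, se = wald_ratio(kept[0]) if len(kept) == 1 else ivw(kept)
    # Two-sided P-value from a normal approximation.
    p_value = math.erfc(abs(est / se) / math.sqrt(2.0))
    return est, se, p_value, len(kept)
```

The estimate is on the log-odds scale for a binary outcome, so an odds ratio would be obtained as `math.exp(est)`.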
In the sensitivity analysis, Cochran’s Q statistic was used to assess heterogeneity, with a P-value ≤ 0.05 indicating the presence of heterogeneity [27]. We used MR-Egger and Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) to evaluate the presence of horizontal pleiotropy [28, 29]. When horizontal pleiotropy was detected, MR-PRESSO was employed to remove outlier SNPs [29].
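The heterogeneity check can be illustrated with a short sketch of Cochran's Q computed around a fixed-effect IVW estimate. The function names are hypothetical and the weighting scheme is an assumption; the chi-square tail probability is computed with a small standard-library helper that is only suitable for a moderate number of instruments (`math.gamma` overflows for very large degrees of freedom).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q across per-variant Wald ratios (needs at least two IVs)."""
    if len(beta_x) < 2:
        raise ValueError("Cochran's Q needs at least two instruments")
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return q, df, chi2_sf(q, df)
```

Under the rule stated above, a returned P-value of 0.05 or less would be read as evidence of heterogeneity among the instruments.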