Association between sulfur microbial diet and the risk of esophageal cancer: a prospective cohort study in 101,752 American adults

Study population

The study population was drawn from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, a large population-based randomized controlled trial. The trial aimed to determine whether selected screening procedures, such as colonoscopy, chest X-ray, digital rectal examination, prostate-specific antigen (PSA), and cancer antigen 125 (CA-125), could reduce the risk of death from PLCO malignancies [16]. Participants were enrolled between November 1993 and September 2001; in total, 154,887 participants aged 55 to 74 years were registered [16]. The registered participants were then randomly assigned to the control or screening group. Participants in the control group received standard care, whereas those in the screening group underwent the above-mentioned screening tests. All participants were asked to complete questionnaires on demographics, lifestyle, dietary habits, health behaviors, medication use, disease diagnoses, and other exposures of interest. The questionnaires used in this study included the Supplemental Questionnaire (SQX), the Baseline Questionnaire (BQ), and the Diet History Questionnaire (DHQ).

Based on the objectives of our study, the following participants were excluded: (1) participants who failed to return the BQ (n = 4918); (2) participants who submitted an invalid DHQ (n = 38,462), defined as a DHQ that lacked a completion date, belonged to a participant who had died before completing it, contained 8 or more missing or multiple frequency responses, or showed extreme energy consumption (within the first or last percentile by gender); (3) participants with a history of cancer (other than nonmelanoma skin cancer) before DHQ analysis entry (n = 9684); and (4) participants whose outcome events (diagnosis of EC, death, loss to follow-up, or end of follow-up) occurred before DHQ completion (n = 71). The final baseline sample comprised 101,752 individuals (Fig. 1). Written informed consent was obtained from every participant. The National Cancer Institute approved the experimental procedures and protocols used in this study (Project ID: PLCO-1134).
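The exclusion counts above can be checked with simple arithmetic. The following is a minimal sketch; the counts are those reported in the text, and the variable names and exclusion labels are illustrative.

```python
# Counts as reported in the text; labels and names are illustrative only.
registered = 154_887
exclusions = {
    "no baseline questionnaire": 4_918,
    "invalid DHQ": 38_462,
    "history of cancer before DHQ entry": 9_684,
    "outcome event before DHQ completion": 71,
}

remaining = registered
for reason, n in exclusions.items():
    remaining -= n  # apply each exclusion in the order listed in the text
    print(f"after excluding '{reason}': {remaining}")

final_sample = remaining
```

The running total after the last step should equal the baseline sample size stated in the text.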

Fig. 1

The flow chart of identifying eligible subjects. PLCO: Prostate, Lung, Colorectal, and Ovarian; BQ: Baseline Questionnaire; DHQ: Diet History Questionnaire

Assessment of sulfur microbial diet scores

The DHQ, a self-administered, 124-item food frequency questionnaire, was used to collect dietary data in this study. It was designed to assess the serving size and frequency of foods consumed during the 12 months prior to registration; its validity and dependability have been described previously [17]. Each participant’s consumption of a food was calculated by multiplying the serving size by the frequency of consumption.

SMD scores were calculated to quantify participants’ adherence to the SMD following an established method [11]. The food groups in the SMD were liquor, processed meat, low-calorie drinks, beer, fruit juice, legumes, other vegetables (including corn, eggplant, mixed vegetables, mushrooms, celery, green pepper, and summer squash), and sweets/desserts. Intake of each food group was ranked into quartiles and scored positively or negatively. For liquor, processed meat, and low-calorie drinks, participants in the lowest quartile received a score of 1, whereas those in the highest quartile received a score of 4. For beer, fruit juice, legumes, other vegetables, and sweets/desserts, the scoring pattern was reversed (Supplementary Table 1). The SMD score was computed by summing the scores of these 8 food groups, so the total score of each participant ranged from 8 to 32. Higher SMD scores reflected greater adherence to the SMD pattern.
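The scoring scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the inversely scored groups are taken to be beer, fruit juice, legumes, other vegetables, and sweets/desserts (five of the eight groups listed above), quartile cut points come from Python's `statistics.quantiles`, and the tie-handling rule and all function names are hypothetical.

```python
from statistics import quantiles

# Assumed grouping: three positively scored and five inversely scored food groups.
POSITIVE = ("liquor", "processed_meat", "low_calorie_drinks")
NEGATIVE = ("beer", "fruit_juice", "legumes", "other_vegetables", "sweets_desserts")


def quartile_rank(value, cuts):
    """Return 1..4: one plus the number of quartile cut points below `value`."""
    return 1 + sum(value > c for c in cuts)


def smd_scores(intakes):
    """Compute one SMD score per participant.

    `intakes` maps each food group to a list of intakes (one per participant).
    """
    n = len(next(iter(intakes.values())))
    totals = [0] * n
    for group in POSITIVE + NEGATIVE:
        values = intakes[group]
        cuts = quantiles(values, n=4)  # three cut points between quartiles
        for i, v in enumerate(values):
            rank = quartile_rank(v, cuts)
            # Positive groups score 1..4; inverse groups score 4..1.
            totals[i] += rank if group in POSITIVE else 5 - rank
    return totals


# Toy data: participant 0 is lowest on positive groups and highest on inverse ones.
toy = {g: list(range(1, 9)) for g in POSITIVE}
toy.update({g: list(range(8, 0, -1)) for g in NEGATIVE})
scores = smd_scores(toy)
```

In the toy data the first participant reaches the minimum possible score and the last participant the maximum, which matches the stated range of 8 to 32.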

Assessment of the covariates

In this study, we evaluated the individual and population level risks associated with 15 potentially modifiable risk factors. The self-reported BQ was used to gather information on demographic and lifestyle characteristics, including sex, race, marital status, body mass index (BMI), educational level, smoking status, regular consumption of aspirin, history of diabetes, history of hypertension, pack-years of smoking, and family history of EC. Race was categorized as “white” or “non-white”, marital status as “married or living as married” or “other”, and educational level as “college below”, “college graduate”, or “postgraduate”. BMI was computed by dividing weight (kg) by the square of height (m²). Other risk factors, assessed using the aforementioned DHQ, included age at DHQ completion, dietary energy intake, and intake of pickled vegetables and fruits. The SQX was used to collect data on physical activity levels, which were calculated as the weekly sum of self-reported minutes of moderate to vigorous activity.
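As a brief illustration of the BMI definition, weight in kilograms is divided by the square of height in meters. The function name and the example values below are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg, 1.75 m.
example_bmi = bmi(70.0, 1.75)
```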

Outcome ascertainment

To gather data on newly diagnosed EC cases, including the date of diagnosis and related details, participants were mailed a self-reported annual study update form. After obtaining participants’ consent, researchers checked their medical records to validate the diagnosis. Family reports and death certificates were used as supplemental information to identify deaths. The primary outcome of this study was the diagnosis of EC (ICD-O-2 codes: C150-C15, C153-C155, and C158-C159). The secondary endpoints were the diagnoses of esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EA) (ICD-O-2 codes: ESCC: 8070 and 8071; EA: 8140, 8480, and 8481).
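The subtype definitions above amount to a lookup on the histology code. A minimal sketch follows; the code lists are those given in the text, while the function name and the "other" label for unlisted codes are assumptions made for illustration.

```python
# Histology codes as listed in the text.
ESCC_CODES = {8070, 8071}
EA_CODES = {8140, 8480, 8481}


def classify_subtype(histology_code):
    """Map a histology code to 'ESCC', 'EA', or 'other' (assumed label)."""
    if histology_code in ESCC_CODES:
        return "ESCC"
    if histology_code in EA_CODES:
        return "EA"
    return "other"
```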

Statistical analysis

Several covariates in this study had missing data. For variables with < 5% missing values, categorical covariates (marital status, educational level, aspirin consumption, history of diabetes and hypertension, family history of EC, and smoking status) were imputed with the modal value, while continuous covariates (BMI and pack-years of smoking) were imputed with the median value. Physical activity level, which had > 25% missing values, was assumed to be missing at random and was imputed using the Multivariate Imputation by Chained Equations (MICE) approach [18], a robust method of data imputation. In this method, missing data are replaced by iteratively drawing from the fitted conditional distributions of partially observed variables, given the observed and imputed values of the remaining variables in the imputation model. In this study, 25 imputed datasets were created for physical activity. Considering the heterogeneity of the imputed results, we took the mean of the 25 imputed values for physical activity and used it as a covariate in the final analysis. Supplementary Tables 2 and 3 present additional details regarding the data imputation.
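The simple imputation rules and the averaging step can be sketched as follows. This is an illustration only: missing values are represented by `None`, the function names and toy values are hypothetical, and the MICE step itself is not reproduced here (the last function only averages imputed datasets that are assumed to exist already).

```python
from statistics import fmean, median, mode


def impute_mode(values):
    """Replace missing categorical values (None) with the most common value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_median(values):
    """Replace missing continuous values (None) with the observed median."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def average_imputations(imputed_sets):
    """Average each participant's value across several imputed datasets."""
    return [fmean(per_person) for per_person in zip(*imputed_sets)]


# Toy values for illustration only.
smoking = impute_mode(["never", "former", None, "never"])
bmi_values = impute_median([22.0, None, 30.0, 26.0])
activity = average_imputations([[10, 20], [20, 40], [30, 30]])
```

Averaging across imputed datasets yields a single covariate value per participant, as described in the text; it differs from pooling model estimates across datasets.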

Cox proportional hazards regression models were used to calculate the hazard ratios (HRs) and 95% confidence intervals (CIs) for the incidence of EC, ESCC, and EA in relation to SMD scores. The follow-up period spanned from the DHQ completion date to the date of EC diagnosis, death, loss to follow-up, or the end of follow-up (December 31, 2009), whichever occurred first (Fig. 2). Follow-up time was used as the time variable, and person-years were calculated by summing the follow-up time of all participants. For all analyses, the SMD scores were divided into quartiles, and the first quartile (Q1) was set as the reference group. To estimate the linear trend of the association, the median SMD score of each quartile was assigned to every participant in that quartile, and this value was treated as a continuous variable in the Cox regression analyses to obtain the P-value for trend. To adjust for potential confounders, two multivariate Cox proportional hazards regression models were used. Model 1 adjusted for demographic variables, including sex, age, race, educational level, and marital status. Model 2 further adjusted for potential effect-modifying covariates, namely BMI, regular consumption of aspirin, levels of physical activity, history of hypertension, history of diabetes, energy intake from diet, smoking status, pack-years of smoking, and consumption of pickled vegetables/fruit. The multivariable-adjusted HRs and 95% CIs associated with a 1-point increment in SMD score were also estimated. We employed a restricted cubic spline (RCS) model with three knots placed at the 10th, 50th, and 90th percentiles to investigate possible nonlinear associations between the SMD score and the risk of EC and its subtypes after adjusting for confounders. The RCS model is constrained to be linear below the first knot and above the final knot, while the internal knots allow nonlinearity within the intervals between them; this structure was designed to capture the complexity of the dose-response relationship effectively. The “second spline” coefficient, representing the spline function’s coefficient after the first knot, is key to identifying the nonlinearity between the variable and the response.
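To make the spline structure concrete, the sketch below builds the two basis terms of a three-knot restricted cubic spline using the common truncated-power formulation (as in Harrell's parameterization), normalized by the squared distance between the outer knots. It is an illustration rather than the authors' code; the knot values are hypothetical, since the actual percentiles of the SMD score are not reported in this section.

```python
def pos_cube(u):
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return max(u, 0.0) ** 3


def rcs_basis(x, knots):
    """Return [linear term, nonlinear term] of a 3-knot restricted cubic spline."""
    t1, t2, t3 = knots
    scale = (t3 - t1) ** 2  # normalization keeps the term on the scale of x
    nonlinear = (
        pos_cube(x - t1)
        - pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
        + pos_cube(x - t3) * (t2 - t1) / (t3 - t2)
    ) / scale
    return [x, nonlinear]


# Hypothetical knots standing in for the 10th, 50th, and 90th percentiles.
KNOTS = (14.0, 20.0, 26.0)
below_first_knot = rcs_basis(10.0, KNOTS)[1]
# Above the last knot the nonlinear term changes by a constant amount per unit.
step_a = rcs_basis(28.0, KNOTS)[1] - rcs_basis(27.0, KNOTS)[1]
step_b = rcs_basis(29.0, KNOTS)[1] - rcs_basis(28.0, KNOTS)[1]
```

In a regression model, the coefficient on the second (nonlinear) term is the one usually tested to assess departure from linearity: the term is zero below the first knot and linear above the last knot, so any curvature is confined to the region between the knots.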

Fig. 2

The timeline and follow-up scheme of our study
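The follow-up scheme can be expressed as taking the earliest of the possible exit dates. The sketch below assumes each participant has a DHQ completion date and optional dates of EC diagnosis, death, and loss to follow-up; the function and argument names are hypothetical, and dividing days by 365.25 to obtain years is an assumed convention.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2009, 12, 31)  # administrative end date given in the text


def follow_up(dhq_date, diagnosis=None, death=None, lost=None):
    """Return (years of follow-up, whether an EC diagnosis ended follow-up)."""
    observed = [d for d in (diagnosis, death, lost) if d is not None]
    exit_date = min(observed + [END_OF_FOLLOW_UP])  # whichever occurred first
    is_event = diagnosis is not None and diagnosis == exit_date
    years = (exit_date - dhq_date).days / 365.25
    return years, is_event


# Hypothetical participants.
case_years, case_event = follow_up(date(2000, 1, 1), diagnosis=date(2005, 1, 1))
cens_years, cens_event = follow_up(date(2000, 1, 1))
```

Summing the first element over all participants gives total person-years.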

Prespecified subgroup analyses were conducted to assess potential effect modification by sex (male or female), age (> 65 or ≤ 65 years), BMI (> 30 kg/m² or ≤ 30 kg/m²), regular consumption of aspirin (yes or no), smoking status (never or current/former), history of diabetes (yes or no), history of hypertension (yes or no), dietary energy intake (> median or ≤ median), and physical activity levels (> median or ≤ median). We used the likelihood ratio test to obtain the P value for the effect of adding the interaction term to a multivariate linear mixed-effects model. To test the robustness of the findings, the following sensitivity analyses were carried out: (1) participants with a family history of EC were excluded; (2) participants in the control group were excluded; (3) participants with extreme BMI (defined as the top 1% and bottom 1%) were excluded; (4) participants with implausible energy intake (> 4000 kcal/day or < 500 kcal/day) [19] were excluded; (5) physical activity was excluded from the covariates to better rule out the impact of its high missingness; and (6) cases observed within the first year of follow-up were excluded to assess potential reverse causation.
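The likelihood ratio test for an interaction term compares the log-likelihoods of two nested models, one with and one without the term. A minimal sketch follows, assuming the interaction adds a single parameter (one degree of freedom), in which case the chi-square tail probability has a closed form using the complementary error function. The function name and the log-likelihood values are made up for illustration.

```python
from math import erfc, sqrt


def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and P value for one added parameter."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Made-up log-likelihoods for models without and with the interaction term.
stat, p_interaction = lrt_one_df(-1000.0, -998.0)
```

For interaction terms that add more than one parameter, a general chi-square survival function with the matching degrees of freedom would be needed instead.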

Cox regression analyses for each of these SMD components in relation to the risk of EC were further conducted to ascertain the principal contributing components to this relationship.

All statistical analyses were performed using R software (ver. 4.2.1). A two-tailed P-value < 0.05 was considered statistically significant.
