Ultra-processed foods, adiposity and risk of head and neck cancer and oesophageal adenocarcinoma in the European Prospective Investigation into Cancer and Nutrition study: a mediation analysis

The EPIC cohort

The EPIC study has been fully described elsewhere [24,25,26]. Briefly, EPIC is one of the largest prospective cohort studies in Europe. It recruited 521,323 participants between 1992 and 2000. Participants were enrolled in 23 centres across 10 European countries, namely Denmark, France, Germany, Greece, Italy, the Netherlands, Norway, Spain, Sweden and the United Kingdom. Most were 35–69 years old at recruitment [24, 25]. They were either volunteers from the general population, blood donors, employees of local companies, teachers/school employees or individuals enrolled in local ongoing studies. All participants provided written informed consent before completing the dietary and lifestyle questionnaires. Anthropometric and blood pressure data were also obtained at baseline. EPIC was approved by the International Agency for Research on Cancer (IARC) Ethics Committee and the local ethical review boards of all EPIC centres.

Study sample

Participants who withdrew consent from the study were not included in this research. We excluded participants diagnosed with cancer before enrolment (n = 25,184) and those with a length of follow-up equal to zero (n = 4148). We also excluded participants who did not complete the dietary or lifestyle questionnaires (n = 6259). We additionally excluded participants with extreme energy intake versus energy requirement ratios (top and bottom 1%) (n = 9573) and participants recruited in Greece due to administrative issues (n = 26,048). After exclusions, 450,111 participants were included in the analyses (Supplementary Fig. 1).
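The exclusion counts above can be reconciled with the final sample size by simple subtraction. The following Python sketch assumes the listed exclusions were applied sequentially and without overlap; the dictionary labels are illustrative and are not EPIC variable names.

```python
# Cohort size and exclusion counts as reported in the text.
recruited = 521_323
exclusions = {
    "prevalent cancer at enrolment": 25_184,
    "zero follow-up": 4_148,
    "incomplete dietary/lifestyle questionnaires": 6_259,
    "extreme energy intake/requirement ratio": 9_573,
    "recruited in Greece": 26_048,
}

# Apply each exclusion in turn (assumed non-overlapping).
remaining = recruited
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason}: {remaining:,}")

analytic_sample = remaining
```

Under these assumptions the running total ends at the 450,111 participants reported for the analyses.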

Dietary data and food processing variables

Semi-quantitative food frequency questionnaires (FFQs), extensive quantitative dietary questionnaires, and combined methods (i.e. semi-quantitative FFQs combined with 7-day records in the UK, and a non-quantitative FFQ combined with a 14-day record on hot meals in Malmö, Sweden) were used to obtain dietary data at baseline [25]. These were centre specific to account for local dietary habits and were either self-administered or administered in-person by trained interviewers. Furthermore, a standardised 24-h recall was used to obtain supplementary dietary data for a subsample of EPIC participants to calibrate baseline dietary measurements across EPIC centres [25, 27,28,29,30]. The dietary questionnaires and their mode of administration were described in detail in previous publications [25, 30].

The NOVA classification was used to categorise foods into four groups according to their extent and purpose of industrial processing [31]. Unprocessed/minimally processed foods (NOVA 1) are natural foods that may have undergone minimal processing for their preservation, storage, safety, or edibility. Processed culinary ingredients (NOVA 2) correspond to substances derived from unprocessed/minimally processed foods (e.g. oil, butter) or nature (e.g. salt) that are normally consumed in combination with unprocessed/minimally processed foods. Both processed foods (NOVA 3) and UPFs (NOVA 4) are industrial products. The former typically contain two or three common ingredients (i.e. a combination of unprocessed/minimally processed foods and processed culinary ingredients), while the latter contain many ingredients (most of which are rarely used in kitchens) and additives that make the final product tastier and more attractive to consumers.

Food preparations made (at home or elsewhere) using traditional methods were decomposed using standardised recipes. Individual food items were then classified according to their degree of processing. Food items were combined into broader food categories for simplicity. Of a total of 67 food categories in the dietary questionnaires, 19 were classified as unprocessed/minimally processed foods, 5 as culinary ingredients, 13 as processed foods and 30 as UPFs (see Supplementary Table 1 for details).

Here, we used the relative intake of each NOVA group in grams per day (%g/d). We also used the absolute intake in grams per day (g/d) and the absolute and relative intake in kilocalories per day in sensitivity analyses (kcal/d and %kcal/d, respectively).
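As an illustration of the relative intake measure, the sketch below expresses each NOVA group as a percentage of total daily intake by weight. It assumes the denominator is the sum of the four NOVA groups in g/d; the function name and the example intakes are hypothetical.

```python
def nova_shares(grams_by_group):
    """Return each group's share of total intake by weight (%g/d)."""
    total = sum(grams_by_group.values())
    if total <= 0:
        raise ValueError("total intake must be positive")
    return {group: 100 * grams / total for group, grams in grams_by_group.items()}


# Hypothetical daily intake in grams for one participant.
example_intake = {"NOVA 1": 1500, "NOVA 2": 50, "NOVA 3": 450, "NOVA 4": 500}
shares = nova_shares(example_intake)
```

The same function applies to the %kcal/d measure if energy values are passed in place of weights.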

Ascertainment of cancer cases

Incident cancer cases were identified through population-based cancer registries in Denmark, Italy (except Naples), the Netherlands, Norway, Spain, Sweden and the United Kingdom. Participants in other centres (France, Germany, Greece and Naples) were actively followed up using health insurance records, pathology registries and direct contact with participants or their next of kin.

HNC and OAC were defined using the 2nd and 3rd Revision of the International Classification of Diseases for Oncology (ICDO-2 and ICDO-3). According to the INHANCE consortium [32], HNC cases include malignant neoplasms of the oral cavity (topography codes C00.3–C00.6, C00.8–C00.9, C02.0–C02.3, C03.0–C03.1, C03.9, C04.0–C04.1, C04.8–C04.9, C05.0, C06.0–C06.2, C06.8–C06.9), oropharynx (C01.9, C02.4, C05.1–C05.2, C09.0–C09.1, C09.8–C09.9, C10.0–C10.4, C10.8–C10.9), hypopharynx (C12.9–C13.2, C13.8–C13.9), larynx (C32.0–C32.3, C32.8–C32.9), and oral cavity and pharynx unspecified/overlapping regions (C02.8–C02.9, C05.8–C05.9, C14.0, C14.2, C14.8). We did not exclude any histological subtypes of HNC. Oesophageal cancer cases correspond to topography codes C15.0–C15.5 and C15.8–C15.9. Among these, OAC cases were identified with codes 8140/3, 8144/3, 8480/3, 8481/3 and 8490/3. Other oesophageal cancer subtypes (e.g. squamous cell carcinoma and small cell carcinoma) were not investigated as outcomes in this study.
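The OAC definition above combines a topography criterion with a morphology criterion. A minimal sketch of that rule follows, assuming codes are stored as strings in the forms "C15.5" and "8140/3"; the function name and storage format are assumptions, not part of the EPIC data dictionary.

```python
# Oesophageal topography codes C15.0-C15.5 and C15.8-C15.9, as listed in the text.
OESOPHAGUS_TOPOGRAPHY = {f"C15.{d}" for d in (0, 1, 2, 3, 4, 5, 8, 9)}

# Morphology codes used in the text to identify adenocarcinoma.
OAC_MORPHOLOGY = {"8140/3", "8144/3", "8480/3", "8481/3", "8490/3"}


def is_oac(topography, morphology):
    """True if the case meets both the topography and the morphology criterion."""
    return (
        topography.strip().upper() in OESOPHAGUS_TOPOGRAPHY
        and morphology.strip() in OAC_MORPHOLOGY
    )
```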

Covariates

Data on age at recruitment, sub-centre (22 centres in total, split into 27 sub-centres as follows: Northeast of France, Northwest of France, South of France, South coast of France, Florence, Varese, Ragusa, Turin, Naples, Asturias, Granada, Murcia, Navarra, San Sebastian, Cambridge, Oxford health-conscious population, Oxford general population, Bilthoven, Utrecht, Heidelberg, Potsdam, Malmö, Umeå, Aarhus, Copenhagen, Southeast of Norway, Northwest of Norway), sex (male/female), education level (none, primary, technical/professional, secondary, further education), physical activity based on the Cambridge Physical Activity Index [33] (inactive, moderately inactive, moderately active, active), measured/self-reported height (continuous in cm) and smoking status (never, former, current, unknown) were obtained at baseline through anthropometric measurements and lifestyle questionnaires. Additionally, data on alcohol intake (continuous in g/d) were acquired using dietary questionnaires.

Potential mediators

BMI and WHR were investigated as potential mediators in mediation analyses. BMI (continuous in kg/m²) was calculated from measured height and weight (measured using comparable, standardised methods) [34]. WHR (continuous) was estimated from measured waist and hip circumferences. Waist circumference was measured midway between the iliac crest and the lower ribs or at the narrowest torso circumference. Hip circumference was measured over the buttocks or at the widest point. EPIC-Oxford health-conscious population self-reported data were also used to estimate BMI and WHR, after the application of measurement error corrections [34, 35].

Statistical analysis

Descriptive characteristics

The participants’ baseline characteristics were summarised by sex-specific quartiles of relative UPF consumption (in %g/d). Means and SDs were obtained for continuous variables, while frequencies and percentages were obtained for binary/categorical variables. Furthermore, we plotted a histogram of the distribution of UPF consumption (in %g/d) in the EPIC cohort.
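Sex-specific quartiles can be sketched as follows: cut points are computed separately within each sex, and each participant is then assigned to a quartile of their own sex's distribution. The record fields ("sex", "upf_pct") are hypothetical, and the quantile method used here (the default of Python's statistics.quantiles) is an assumption; other software may place the cut points slightly differently.

```python
from bisect import bisect_left
from statistics import quantiles


def sex_specific_quartiles(records):
    """Assign each record a quartile (1-4) of UPF intake within its own sex."""
    cuts = {}
    for sex in {r["sex"] for r in records}:
        values = [r["upf_pct"] for r in records if r["sex"] == sex]
        cuts[sex] = quantiles(values, n=4)  # three cut points per sex
    # Values equal to a cut point fall into the lower quartile.
    return [bisect_left(cuts[r["sex"]], r["upf_pct"]) + 1 for r in records]


# Hypothetical data: eight men and eight women with different distributions.
records = [{"sex": "male", "upf_pct": v} for v in range(1, 9)]
records += [{"sex": "female", "upf_pct": 10 * v} for v in range(1, 9)]
assigned = sex_specific_quartiles(records)
```

Because the cut points differ by sex, a given %g/d value can fall into different quartiles for men and women.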

Data imputation

We used single-value imputation to deal with missing data in the covariates used to control for potential confounding (i.e. height, physical activity, education level and smoking status). When measured/self-reported height values were not available, missing values were imputed with mean centre-, age- and sex-specific height values [34]. Mode imputation was used for baseline binary and categorical covariates missing less than 5% of their values (i.e. education level: “primary school completed”, physical activity: “moderately inactive”, smoking status: “never”). Multiple imputation was used in sensitivity analyses (details in the “sensitivity analyses” subsection below).
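The mode imputation step can be illustrated with a short sketch. It assumes missing values are coded as None and applies the 5% missingness limit mentioned above; the function name and example data are hypothetical.

```python
from collections import Counter


def impute_mode(values, max_missing=0.05):
    """Replace None with the most frequent observed category."""
    if not values:
        raise ValueError("no values supplied")
    observed = [v for v in values if v is not None]
    missing_share = 1 - len(observed) / len(values)
    if missing_share >= max_missing:
        raise ValueError("too much missing data for single-value imputation")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical smoking status data with 2% missing values.
smoking = ["never"] * 60 + ["former"] * 30 + ["current"] * 8 + [None] * 2
smoking_imputed = impute_mode(smoking)
```

Single-value imputation of this kind does not propagate the uncertainty about the missing values, which is one reason to compare it against multiple imputation.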

Main association analysis

Cox proportional hazards models with age as the underlying timescale were used to investigate the association between the intake of UPFs and the risk of HNC and OAC. We estimated HRs and 95% CIs per 10% g/d higher consumption of UPFs. Time of entry was defined as age at recruitment, while time of exit was defined as age at first cancer diagnosis (excluding non-melanoma skin cancer) or age at last follow-up (i.e. death, emigration, loss to follow-up or end of follow-up [i.e. between June 2008 and December 2013, depending on the centre]), whichever came first. Model 1 was stratified by age at recruitment in 1-year categories, sex and sub-centre. Model 2 was additionally adjusted for education, physical activity, height and smoking status. Model 3 was additionally adjusted for alcohol intake in g/d to reflect the association between the consumption of UPFs and cancer, regardless of alcohol intake (a well-known cancer risk factor [36,37,38,39,40,41,42,43,44] that forms part of some processed foods and UPFs).
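Reporting an HR per 10% g/d increment amounts to rescaling the log-hazard coefficient. The sketch below assumes the exposure entered the Cox model as a continuous variable in percentage points, so that the coefficient refers to a 1% g/d difference; the function name and the input values are hypothetical and are not results from this study.

```python
import math


def hr_per_increment(beta, se, increment=10.0, z=1.96):
    """Hazard ratio and Wald 95% CI for an `increment`-unit higher exposure."""
    hr = math.exp(increment * beta)
    lower = math.exp(increment * (beta - z * se))
    upper = math.exp(increment * (beta + z * se))
    return hr, lower, upper


# Hypothetical coefficient and standard error per 1% g/d.
hr, lower, upper = hr_per_increment(beta=math.log(1.02), se=0.005)
```

If the exposure is instead divided by 10 before model fitting, the fitted coefficient already refers to a 10% g/d increment and no rescaling is needed.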

We graphically assessed the proportional hazards assumption using log–log survival plots. Additionally, we tested proportionality using Schoenfeld residuals. We also used correlation matrices and variance inflation factors to assess the presence of multicollinearity. Non-linearity was assessed using likelihood ratio tests comparing UPF consumption (in %g/d) modelled with and without natural cubic splines.

We undertook additional analyses to investigate the associations between the consumption of UPFs and the risk of HNC subtypes (i.e. oral cavity, oropharynx, hypopharynx, larynx, and oral cavity and pharynx unspecified/overlapping cancers). Heterogeneity tests were used to assess differences between HNC subtype estimates.

Furthermore, we stratified Model 3 (for every exposure–outcome combination) by alcohol intake (as defined by Wozniak et al. [45], i.e. no/light alcohol intake [0.1–6 g/d (men); 0.1–3 g/d (women)], moderate alcohol intake [6.1–24 g/d (men); 3.1–24 g/d (women)], heavy alcohol intake [> 24 g/d]), sex (i.e. male, female), physical activity (i.e. inactive, moderately inactive, moderately active, active), smoking status (i.e. never smoker, former smoker, current smoker) and education level (i.e. primary school or less, secondary or technical/professional school, higher education) and performed likelihood ratio tests to explore interactions. Models were not adjusted for the stratification variable.
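The sex-specific alcohol strata can be written as a small classification function. This is a sketch under two assumptions that the text does not settle: participants with zero intake are placed in the no/light stratum, and the category limits are treated as continuous thresholds (so that, for example, 6.05 g/d in men counts as no/light). The function name is hypothetical.

```python
def alcohol_category(grams_per_day, sex):
    """Classify alcohol intake using the sex-specific thresholds in the text."""
    if grams_per_day < 0:
        raise ValueError("intake cannot be negative")
    # Upper limit of the no/light stratum: 6 g/d for men, 3 g/d for women.
    light_upper = {"male": 6.0, "female": 3.0}[sex]
    if grams_per_day <= light_upper:
        return "no/light"  # assumption: includes zero intake
    if grams_per_day <= 24.0:
        return "moderate"
    return "heavy"
```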

Mediation analysis

Under the strong assumption that there is no residual confounding or measurement error in our study, we conducted a mediation analysis using the counterfactual framework [46] to further explore the mediating role of BMI and WHR in the associations between UPF consumption and the risk of HNC and OAC (Fig. 1).

Fig. 1

Mediation analysis diagram of the counterfactual two-way decomposition of the total effect of UPF consumption on the risk of head and neck cancer and oesophageal adenocarcinoma. All mediation models accounted for potential exposure–mediator interactions and were adjusted for age at recruitment in 1-year categories, sex, sub-centre, education, physical activity, height, smoking status and alcohol intake. The total effect (TE) corresponds to the sum of the pure natural direct effect (PNDE) and the total natural indirect effect (TNIE). Point estimates were obtained by direct counterfactual imputation estimation and confidence intervals were obtained using 1000 bootstrap repetitions. Abbreviations: BMI, body mass index; WHR, waist-to-hip ratio; UPF, ultra-processed food; HNC, head and neck cancer; OAC, oesophageal adenocarcinoma

In exploratory analyses, we ran linear regressions to study the associations between UPF consumption (i.e. the exposure) and both WHR and BMI (i.e. the potential mediators). We also ran exposure-adjusted Cox regressions to analyse the associations between the potential mediators and the risk of both HNC and OAC (i.e. the outcomes). Where there was evidence of an association between the potential mediator and both the exposure and the outcome, we used the “cmest” function in the “CMAverse” R package [47] to decompose the Total Effect (TE) of UPF consumption on the corresponding upper-aerodigestive tract cancer into a Pure Natural Direct Effect (PNDE) and a Total Natural Indirect Effect (TNIE) (on the ratio scale, TE = PNDE × TNIE). The proportion mediated was also calculated for each exposure–mediator–outcome combination as 100 × (PNDE × (TNIE − 1))/(TE − 1) [48]. All mediation models accounted for potential exposure–mediator interactions and were adjusted for age at recruitment in 1-year categories, sex, sub-centre, education, physical activity, height, smoking status and alcohol intake. Point estimates were obtained by direct counterfactual imputation estimation and 95% CIs were obtained using 1000 bootstrap repetitions. The results were scaled to reflect a 10% g/d higher consumption of UPFs.
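The ratio-scale decomposition and the proportion mediated can be reproduced arithmetically from the two component effects. The sketch below applies the formulas stated in the text to hypothetical PNDE and TNIE values; it does not reproduce the estimation performed by “cmest”, only the final calculation.

```python
import math


def decompose(pnde, tnie):
    """Return the total effect and the proportion mediated (%) on the ratio scale."""
    te = pnde * tnie  # TE = PNDE x TNIE
    if te == 1:
        raise ValueError("proportion mediated is undefined when TE equals 1")
    proportion_mediated = 100 * pnde * (tnie - 1) / (te - 1)
    return te, proportion_mediated


# Hypothetical hazard ratios, not results from this study.
te, pm = decompose(pnde=1.10, tnie=1.02)
```

The proportion mediated becomes unstable as TE approaches 1, which is why it is usually interpreted only where there is evidence of a total effect.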

Sensitivity analyses

As a sensitivity analysis, we explored adjusting our Cox models for total water intake (i.e. water content from foods and drinks, in addition to drinking water and water used as an ingredient in preparations). This was to see whether differences in water content across NOVA groups may influence the associations between the relative intake of UPFs and the risk of HNC and OAC. Similarly, we explored adjustments for total energy intake.

We also reran our Cox models after excluding participants who were censored during the first two years of follow-up to limit reverse causation due to undiagnosed cancer at recruitment.

Additionally, we repeated the analyses using the absolute intake of UPFs in grams per day (g/d) and the absolute and relative intake in kilocalories per day (kcal/d and %kcal/d, respectively) as the exposure.

Moreover, we conducted a complete case analysis excluding participants with missing data for at least one lifestyle covariate (i.e. smoking status, physical activity and education level). In addition, we used the ‘mice’ R package to perform multivariate imputation by chained equations (MICE) [49], whereby smoking status, physical activity and education level were imputed five times by predictive mean matching. We fit our models using the MICE imputed data sets and then pooled the results according to Rubin’s rules [50] to obtain average HR estimates and standard errors for each model. For the complete case analysis and the MICE analysis, we still used centre-, age- and sex-specific imputed height as a covariate, as this is standard practice when dealing with anthropometric variables as confounders in EPIC [34].
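Pooling by Rubin’s rules combines the within-imputation and between-imputation variance. The sketch below assumes pooling is carried out on the log-HR scale, which is the usual convention for Cox model coefficients; the function name and the five example estimates are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and standard errors using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from at least 2 imputations")
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log-HRs and standard errors from five imputed data sets.
log_hrs = [0.10, 0.12, 0.11, 0.09, 0.13]
ses = [0.05, 0.05, 0.05, 0.05, 0.05]
pooled_log_hr, pooled_se = pool_rubin(log_hrs, ses)
pooled_hr = math.exp(pooled_log_hr)
```

The pooled standard error exceeds the average per-imputation standard error whenever the estimates differ across imputed data sets, reflecting the extra uncertainty due to the missing values.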

Finally, we performed a negative control outcome analysis (i.e. where the outcome is not plausibly linked to the exposure of interest) to help identify any residual confounding that could be biasing our results [51]. We considered accidental deaths as the outcome (instead of upper-aerodigestive tract cancers) since the consumption of foods by their degree of processing is unlikely to be associated with the risk of being involved in a deadly accident (e.g. falls, transport accidents, accidental drowning). Any evidence of an association between UPF consumption and accidental deaths would suggest that our main results may be biased by the same factors that biased the negative control outcome results. Accidental deaths were defined as deaths due to events linked to codes V01–X59 in the 10th Revision of the International Classification of Diseases (ICD-10). For the negative control analysis, time of exit was defined as age at the time of death, emigration, loss to follow-up or end of follow-up, whichever came first. Participants were not censored at the time of cancer diagnosis, whereas they were in all other analyses in this study. The accidental death models accounted for the same covariates as the main analysis. BMI and type 2 diabetes mellitus would not normally be adjusted for in this analysis, as they are potential mediators and adjusting for them could induce collider bias (i.e. open backdoor paths from UPF consumption to accidental deaths through unobserved factors) [52]. Here, we additionally adjusted for them in an exploratory manner, assuming the absence of unobserved confounders of BMI, type 2 diabetes mellitus and accidental deaths.
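The accidental death definition corresponds to a contiguous range of three-character ICD-10 categories. A minimal sketch of a range check follows, assuming the underlying cause of death is stored as a string such as "W19" or "V01.1"; the function name and storage format are assumptions.

```python
def is_accidental_death(icd10_code):
    """True if the ICD-10 code falls within categories V01-X59."""
    code = icd10_code.strip().upper().replace(".", "")
    # A valid category is one letter followed by two digits.
    if len(code) < 3 or not code[0].isalpha() or not code[1:3].isdigit():
        return False
    category = code[:3]
    # Letter-plus-two-digit categories sort correctly as plain strings.
    return "V01" <= category <= "X59"
```

The check relies on lexicographic ordering, so it covers V01 to V99, all W categories and X00 to X59, and excludes X60 onwards.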

Statistical software

All statistical analyses and visualisations were performed using R version 4.2.3. We used version 3.2.10 of the “survival” R package for the Cox regressions and version 0.1.0 of the “CMAverse” R package [47] for the mediation analysis. We also used version 0.1.0 of the “ggforestplot” R package to create forest plots. To create tables, we used version 1.3.0 of the “tidyverse” R package and version 0.7.0 of the “flextable” R package. P-values for heterogeneity between HNC subtype estimates were obtained using version 4.18-0 of the “meta” R package. Non-linearity was assessed using version 4.2.3 of the “splines” R package. MICE was performed using version 3.16 of the “mice” R package. Two-sided p-values < 0.05 were considered statistically significant.
