Towards objective measurements of habitual dietary intake patterns: comparing NMR metabolomics and food frequency questionnaire data in a population-based cohort

Study design and study participants

The Northern Sweden Health and Disease Study (NSHDS) is a biobank with questionnaire data and blood samples from several population-based cohort studies in northern Sweden. The largest cohort is the Västerbotten Intervention Programme (VIP), which started in 1984. The programme invites all inhabitants of the county of Västerbotten to their regular health care center in the year they turn 40, 50 or 60 years of age. For a few years, 30-year-old subjects were also invited. The annual participation rate has varied between 50 and 80% of the eligible population. To date, about 60% of the adult population of Västerbotten have participated at least once, and an earlier evaluation found no indications of systematic bias with respect to socio-demographic characteristics between participants and non-participants [11].

During the health visit, participants complete a questionnaire on lifestyle factors and donate blood samples for research, and clinical measurements are collected. Questionnaire data and blood samples are kept by the Unit for Biobank Research, Umeå, Sweden (EBF, https://www.umu.se/en/biobank-research-unit/). VIP is described in detail in Norberg et al. [12].

For the current project, a subsample of 2,000 women and men was selected for detailed evaluation with untargeted Nuclear Magnetic Resonance (NMR) metabolomics. The time window was restricted to the years 2000–2016 because previous research [13] had indicated changes in dietary patterns over time; earlier years were therefore excluded. Among visits made by women and men aged > 30 and ≤ 65 years, only those with stored unthawed blood samples and complete questionnaire information on diet, body mass index (BMI), smoking and education were considered for sample selection. From this pool, a stratified random sample of 1,000 unique women and 1,000 unique men, balanced by 10-year age strata, was drawn. Metabolomics analyses were incomplete for five individuals, leaving 1,995 individuals available for further analyses. Outliers with respect to BMI (< 19.0 and > 35.0 kg/m²) and fasting plasma glucose (> 8.0 mmol/l) had a strong influence on the metabolomics models. These individuals were therefore removed, and the sample size for the final analyses was 1,895 individuals who participated in VIP during the years 2000–2016.
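The selection logic described above can be illustrated with a minimal sketch. It assumes each visit is a record with hypothetical `sex` and `age` fields and that the 10-year age strata are decade bins; the source does not state the stratum boundaries or the software used for the draw, so these are illustrative choices only.

```python
import random
from collections import defaultdict


def stratified_sample(pool, per_stratum, seed=0):
    """Draw the same number of people from every sex-by-age-decade stratum.

    Assumption: strata are defined as (sex, decade of age); the paper only
    says the sample was balanced by 10-year age strata within each sex.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        key = (person["sex"], person["age"] // 10 * 10)
        strata[key].append(person)
    sample = []
    for key in sorted(strata):
        # random.sample raises ValueError if a stratum is too small.
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


def is_eligible(bmi, fasting_glucose_mmol_l):
    """Apply the outlier exclusions stated in the text.

    Excluded: BMI < 19.0 or > 35.0 kg/m2, fasting plasma glucose > 8.0 mmol/l.
    """
    return 19.0 <= bmi <= 35.0 and fasting_glucose_mmol_l <= 8.0
```

In terms of the reported counts, 2,000 sampled individuals minus 5 with incomplete metabolomics gives 1,995, and the outlier exclusions remove a further 100 to give 1,895.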

Metabolomics analyses

Fasting blood samples were stored at −80 °C until analysis and prepared according to In Vitro Diagnostics Research (IVDr; Bruker BioSpin, Rheinstetten, Germany) standard operating procedures [14]. Daily quality assurance included ensuring that sample temperature (calibration on 99.8% methanol-d4), shimming quality and water suppression (2 mM sucrose sample in 10% D2O) and quantification reference (certified sample containing five metabolites of known concentration) were within specifications. Prior to 1H NMR analyses, previously unthawed plasma samples were thawed for 30 min at room temperature and thereafter centrifuged at 3,500 × g for 1 min at 4 °C. Next, 100 µL plasma was mixed with 100 µL NMR buffer (75 mM Na2HPO4, 20% v/v D2O, 0.08% TSP-d4, 0.04% NaN3, pH 7.4) in a deep well plate (Porvair, cat no 53.219030), with the aid of a SamplePro Tube L liquid handler (Bruker BioSpin). The plate was shaken at 400 rpm at 12 °C for 5 min in a Thermomixer Comfort (Eppendorf). Then, a 180 µL aliquot was transferred to 3 mm SampleJet NMR tubes using the SamplePro L; all sample tubes, the deep well plate and the SampleJet rack were kept at 2 °C until analysis.

All 1H NMR spectra were measured on a Bruker 600 MHz Avance III HD spectrometer equipped with a room temperature 5 mm BBI probe and a cooled (6 °C) SampleJet automatic sample changer for sample handling. Here, 1D NOESY ('noesygppr1d' pulse sequence) was used for peak selection and metabolite quantification, and 1D CPMG ('cpmgpr1d') and 2D J-resolved ('jresgpprqf') spectra, obtained according to the standard IVDr parameter settings at 310 K, were used for manual identification of peaks. TSP-d4 was used for referencing.

Sodium phosphate (Na2HPO4) and sodium azide (NaN3) were bought from SigmaAldrich, deuterium oxide (D2O) from CortecNet, and 3-(trimethylsilyl) propionic-2,2,3,3-d4 acid sodium salt (TSP-d4) from MerckMillipore. Data were aligned and peaks were selected in R using 'speaq 2' [15]. Poor water suppression in several samples influenced the spectra around 4.7 ppm; the region between 4.2 and 5.2 ppm was therefore excluded from the current analyses. This exclusion did not materially influence model quality. In total, 230 peaks with chemical shifts between −0.236 and 8.096 ppm were included. Discriminating metabolites selected from the multivariate models were annotated using Chenomx NMR suite 8.31 (Chenomx Inc.) with the aid of the Human Metabolome Database [16] and an in-house implementation of the STOCSY routine [17].

Dietary assessment

Participants in VIP filled in a semi-quantitative food frequency questionnaire (FFQ) that consists of 64 questions on common food items and dishes and reflects habitual intake during the last year. Portion sizes for meat/fish, staple food and vegetables were indicated using four pictures showing different portion sizes. Frequency of intake of the food items was indicated on a nine-grade scale from never to ≥ 4 times/day. Frequency of intake was converted to grams per day using the indicated portion sizes, natural sizes (e.g., fruit), or age- or gender-specific portion sizes. Daily energy and nutrient intakes were calculated by linking the food intake data to the national food composition database at the Swedish Food Agency (https://soknaringsinnehall.livsmedelsverket.se/). All dietary data in NSHDS are curated as the Northern Sweden Diet Database (NSDD).

Originally an 84-item FFQ was designed. This version was validated against ten repeated 24-hour recalls and plasma β-carotene in 246 study participants [18]. Participants also repeated the FFQ twice, one year apart. The results indicated good correlations in energy and nutrient intake between the two occasions and the FFQ was deemed to be of similar quality as that of other prospective cohort studies using FFQ as a method to measure food intake [19]. Further, reported intake of several fatty acids has been validated against 24-hour recalls and fatty acid profile of erythrocyte membranes [20], and reported intake of phytosterols [21]. Later, several similar food groups were collapsed into larger groups, resulting in a 64-item FFQ. This version has been validated against biomarkers for reported intake of B vitamins [22].

For the current analyses, only individuals with reported dietary intake of acceptable quality were included. Inclusion required < 10% missing answers on the FFQ and a food intake level (reported energy intake divided by calculated basal metabolic rate) within 1–99% of the range for each sex in the entire VIP cohort.
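As an illustration of these two quality criteria, the following sketch computes the food intake level and applies both checks. The function names and the `fil_low`/`fil_high` arguments are hypothetical; the sex-specific bounds are assumed to have been derived beforehand from the entire VIP cohort and are passed in rather than computed here.

```python
def food_intake_level(energy_intake_kcal, bmr_kcal):
    """Reported energy intake divided by estimated basal metabolic rate."""
    return energy_intake_kcal / bmr_kcal


def passes_diet_quality(n_missing, n_items, fil, fil_low, fil_high):
    """Return True if an FFQ meets both inclusion criteria.

    n_missing, n_items: missing and total FFQ answers (criterion: < 10% missing).
    fil_low, fil_high: sex-specific bounds of the accepted 1-99% range
    (assumed inputs, not values reported in the text).
    """
    few_missing = n_missing / n_items < 0.10
    plausible_intake = fil_low <= fil <= fil_high
    return few_missing and plausible_intake
```

For a 64-item FFQ this means at most 6 missing answers, since 7/64 is already above 10%.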

Construction of a priori and data-driven diet scores and indices

Diet intake patterns have been described for all participants in NSDD previously, using a priori scores and indices as well as a posteriori data-driven clustering, and these were used in the present analyses. A Healthy Diet Score (HDS) was calculated as previously described [23]. The score is based on intake of eight food and beverage groups. Favorable groups include fish, fruit (except juice), vegetables (except potatoes) and whole grain. Unfavorable groups include red and processed meat, desserts and sweets, sugar-sweetened beverages and fried potatoes. Within each sex, intakes are ranked in ascending quartile ranks for favorable groups and in descending quartile ranks for unfavorable groups. The sum of the quartile ranks yields the score, with a maximum of 24 and higher scores reflecting a healthier diet.
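The quartile-rank construction of the HDS can be sketched as follows. This is not the published implementation [23]: it assumes quartile ranks are coded 0–3 (which matches the stated maximum of 24 for eight groups), uses the default cut points of Python's `statistics.quantiles`, and uses hypothetical group names. The input is assumed to contain participants of one sex only.

```python
from statistics import quantiles

FAVORABLE = ["fish", "fruit", "vegetables", "whole_grain"]
UNFAVORABLE = ["red_processed_meat", "desserts_sweets",
               "sugar_sweetened_beverages", "fried_potatoes"]


def quartile_ranks(values):
    """Rank each value 0-3 by the number of quartile cut points it exceeds."""
    cuts = quantiles(values, n=4)
    return [sum(v > c for c in cuts) for v in values]


def healthy_diet_scores(intakes):
    """intakes: dict mapping group name -> list of intakes (one sex only).

    Favorable groups score their ascending quartile rank (0-3);
    unfavorable groups score the reversed rank (3 minus rank).
    """
    n = len(next(iter(intakes.values())))
    scores = [0] * n
    for group in FAVORABLE:
        for i, rank in enumerate(quartile_ranks(intakes[group])):
            scores[i] += rank
    for group in UNFAVORABLE:
        for i, rank in enumerate(quartile_ranks(intakes[group])):
            scores[i] += 3 - rank
    return scores
```

A participant in the top quartile of all four favorable groups and the bottom quartile of all four unfavorable groups reaches 8 × 3 = 24 under this coding.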

A relative Mediterranean Diet Score (rMDS) was calculated as described by Buckland et al. [24]. The score indicates adherence to a Mediterranean-style diet and is based on intake of nine components. Tertiles of intake, expressed as g per 1,000 kcal per day, were calculated for vegetables excluding potatoes; fruit including nuts and seeds; legumes; fresh and frozen fish excluding fish products and preserved fish; olive oil; and cereals. The tertiles were assigned values of 0–2. For total meat and dairy products, similar tertiles were constructed and the scoring was reversed to account for a putative negative effect on health. Alcohol was scored 2 for moderate consumption and 0 for consumption outside of this range. With nine components each scored 0–2, the final score had a maximum of 18, indicating high adherence to a healthy Mediterranean-style diet.
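A minimal sketch of the tertile scoring follows. It is illustrative rather than the procedure of Buckland et al. [24]: component names are hypothetical, tertile cut points come from `statistics.quantiles`, and the definition of moderate alcohol consumption is not given in the text, so it is passed in as a precomputed boolean per participant.

```python
from statistics import quantiles

POSITIVE = ["vegetables", "fruit_nuts_seeds", "legumes",
            "fish", "olive_oil", "cereals"]
REVERSED = ["total_meat", "dairy"]


def energy_density(grams_per_day, energy_kcal_per_day):
    """Express intake as grams per 1,000 kcal."""
    return grams_per_day * 1000 / energy_kcal_per_day


def tertile_scores(values):
    """Score each value 0-2 by the number of tertile cut points it exceeds."""
    cuts = quantiles(values, n=3)
    return [sum(v > c for c in cuts) for v in values]


def rmds(density, moderate_alcohol):
    """density: dict component -> list of g/1,000 kcal values.
    moderate_alcohol: list of booleans (assumed precomputed).
    """
    # Alcohol: 2 points for moderate consumption, otherwise 0.
    scores = [2 if moderate else 0 for moderate in moderate_alcohol]
    for component in POSITIVE:
        for i, t in enumerate(tertile_scores(density[component])):
            scores[i] += t
    for component in REVERSED:
        for i, t in enumerate(tertile_scores(density[component])):
            scores[i] += 2 - t  # reversed scoring for meat and dairy
    return scores
```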

A plant-based diet index (PDI) was constructed as described by Satija et al. [25]. Foods were combined into 15 homogeneous groups (healthful plant foods: whole grains, fruits, vegetables, legumes, vegetable oils, coffee/tea; unhealthful plant foods: sweetened beverages, refined grains, potato, sweets/desserts; and animal foods: animal fat, dairy, fish/seafood, poultry/red meat, and miscellaneous animal-based foods). Within each sex, quintiles of frequency of intake per day were constructed. For PDI, participants were assigned 5 points for a plant food group if their intake was in the highest quintile, 4 points if in the second-highest quintile, and so forth down to 1 point if in the lowest quintile. For animal foods the scoring was reversed, i.e., participants were assigned 1 point if in the highest quintile of intake, and so on. Points for all 15 food groups were summed to give the PDI. Further, a healthful plant diet index (hPDI) was constructed. Here, only healthful plant foods were scored positively (i.e., 5 points in the highest quintile, etc.), whereas both unhealthful plant foods and animal foods were scored in reverse (i.e., 1 point in the highest quintile, etc.). Lastly, an unhealthful plant diet index (uPDI) was constructed. Here, unhealthful plant foods were scored positively, whereas healthful plant foods and animal foods were scored in reverse. For all three indices, possible values ranged from 15 to 75.
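The three indices differ only in which food groups are scored positively, which the following sketch makes explicit. It is a simplified illustration, not the code of Satija et al. [25]: group names are hypothetical labels for the 15 groups listed above, quintile cut points come from `statistics.quantiles`, and the input is assumed to hold participants of one sex.

```python
from statistics import quantiles

HEALTHY_PLANT = ["whole_grains", "fruits", "vegetables",
                 "legumes", "vegetable_oils", "coffee_tea"]
UNHEALTHY_PLANT = ["sweetened_beverages", "refined_grains",
                   "potato", "sweets_desserts"]
ANIMAL = ["animal_fat", "dairy", "fish_seafood",
          "poultry_red_meat", "misc_animal"]


def quintile_points(values):
    """Assign 1-5 points: 1 for the lowest quintile, 5 for the highest."""
    cuts = quantiles(values, n=5)
    return [1 + sum(v > c for c in cuts) for v in values]


def plant_diet_indices(freq):
    """freq: dict group -> list of intake frequencies per day (one sex)."""
    n = len(next(iter(freq.values())))
    out = {"PDI": [0] * n, "hPDI": [0] * n, "uPDI": [0] * n}
    for group in HEALTHY_PLANT + UNHEALTHY_PLANT + ANIMAL:
        healthy = group in HEALTHY_PLANT
        unhealthy = group in UNHEALTHY_PLANT
        for i, points in enumerate(quintile_points(freq[group])):
            reverse = 6 - points
            out["PDI"][i] += points if (healthy or unhealthy) else reverse
            out["hPDI"][i] += points if healthy else reverse
            out["uPDI"][i] += points if unhealthy else reverse
    return out
```

With 15 groups scored 1–5, each index is bounded by 15 and 75, in line with the range stated in the text.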

Finally, latent class analyses (LCA) have been applied to NSDD to identify distinct, mutually exclusive latent clusters of habitual diet [13]. Female and male NSDD participants were modelled separately for two time periods, 2000–2007 and 2008–2016. The reason for the two time periods was indications that dietary intake patterns had changed in Sweden over the years; homogeneous patterns over the entire time span were therefore not expected. In LCA, individuals are assigned to mutually exclusive groups such that within-class variance is minimized and between-class variance is maximized. Reported intake per 1,000 kcal of 40 food groups was used as input data. For all four subgroups, four clusters of food consumption were identified as the optimal class solution based on the Bayesian information criterion (BIC), the LL statistics, class size and pattern interpretability. These clusters captured variations in intake of healthy foods such as fruit and vegetables, high-fiber bread and low-fat milk, and less healthy foods such as high-fat dairy, white bread, sugar, jam and cookies. Clusters from Period 1 (years 2000–2007) were used in the present analyses because too few participants in the current sample were represented in Period 2 for analyses to be meaningful. Broad descriptions of the categorizations as well as intake patterns for the indices, scores and clusters are presented in Supplementary Tables S1 and S2.
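To illustrate the BIC part of the class-selection step only, the sketch below computes the standard BIC from a model's log-likelihood and picks the candidate with the lowest value. The log-likelihoods and parameter counts in the usage example are invented for illustration, and the published selection [13] also weighed class size and interpretability, which this sketch does not attempt to capture.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2*LL + k*ln(n). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_by_bic(candidates, n_obs):
    """candidates: dict n_classes -> (log_likelihood, n_params).

    Returns the number of classes with the lowest BIC.
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_obs))


# Hypothetical fits for 2-5 classes (numbers invented for illustration).
example_fits = {2: (-1000.0, 10), 3: (-950.0, 15),
                4: (-920.0, 20), 5: (-915.0, 25)}
chosen = best_by_bic(example_fits, n_obs=500)
```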

Assessment of non-dietary variables

Anthropometric and socio-demographic data were collected at the participants’ nearest health care center [12]. Height in cm and weight in kg were measured in light clothing, without shoes. Body mass index (BMI) was calculated as weight in kg divided by the square of height in m (kg/m²). Basal metabolic rate was estimated according to the Schofield equation [26]. Physical activity was measured by combining two questions about occupational and leisure time physical activity into the validated Cambridge Index of Physical Activity [27]. Participants were categorized as inactive, moderately inactive, moderately active or active. Smoking was categorized as current smoker, former smoker or never smoker. Educational level was categorized as basic level (9 years of schooling), high school or university.
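Because height was recorded in cm while BMI is defined per square metre, the unit conversion is worth making explicit. A minimal sketch with a hypothetical function name:

```python
def body_mass_index(weight_kg, height_cm):
    """BMI in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2
```

For example, 70 kg at 175 cm gives 70 / 1.75² ≈ 22.9 kg/m².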

A 5-minute rest preceded the measurements of systolic and diastolic blood pressures. Blood glucose levels were evaluated with the use of a benchtop analyzer after at least 4 h of fasting. Serum cholesterol and triglycerides had been analyzed in a Reflotron benchtop analyzer at the health care centers (in the earlier years) or using an enzymatic routine method at the nearest hospital (from September 1st 2009). Details of the methods are found in Norberg et al. 2010 [12].

Statistical analyses

Descriptive results for the study sample are presented using mean and standard deviations or medians and quartiles as well as Spearman correlation coefficients. Continuous variables were adjusted for age. These analyses were performed in IBM SPSS Statistics version 28 (IBM Corp.).

All metabolomics multivariate analyses were performed in SIMCA software v.17.0 (Sartorius Stedim Biotech) with data unit variance-scaled and the number of cross-validation groups set to 7 (default). Principal component analysis (PCA) was used to explore clustering patterns of observations and outliers. Orthogonal projections to latent structures (OPLS) models include not only x-values (metabolite variables, i.e., peaks) but also dependent y-values, e.g., additional known factors that may influence the models. The y-values tested in an OPLS model were participant characteristics such as BMI, age, sex, education, smoking, physical activity, and year of data collection. To select y-values, a cut-off of p < 0.05 in the cross-validation analysis of variance (CV-ANOVA) was applied. OPLS models with HDS, rMDS, PDI, hPDI, uPDI and clusters included one at a time as y-value were evaluated to explore clustering patterns of observations for each of these scores/indices/clusters. If significant models were achieved, they were further explored by also including participant characteristics as y-values. Lastly, OPLS with discriminant analysis (OPLS-DA) was performed for OPLS models that remained significant both with and without the additional y-values. Here, the lowest quartile (Q1) of the score/index was compared with the highest quartile (Q4). The validity of each OPLS-DA model was assessed using permutation tests (n = 999). Predictive performance of the validated OPLS-DA models is presented using receiver operating characteristic (ROC) curves. In addition, to further test model quality, a test set (∼10% of participants) was selected by computerized randomization before any OPLS-DA analyses were performed. OPLS-DA models were fitted without the test set participants, and the test set was thereafter used to test the models’ ability to predict high or low dietary quality.
The cumulative amount of explained variation in the data summarized by the model (R2X[cum] and R2Y[cum]) and the predictive ability of the model (Q2[cum]) are presented. Class-discriminating variables (buckets) of interest from OPLS and OPLS-DA models were selected if they had loading scores w ≤ −0.1 or w ≥ 0.1 and if they were among the 30 variables with the highest variable influence on projection (VIP) values, to obtain a reasonable number of models, and these were further assessed by univariate analysis. The Mann–Whitney U-test was used to evaluate metabolites driving the separation in OPLS-DA models. To adjust for multiple testing in the univariate analysis, a false discovery rate (FDR) correction was applied; q values < 0.05 were regarded as significant.
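The text does not name the FDR procedure used. As an illustration only, the sketch below assumes the common Benjamini–Hochberg step-up method and converts a list of univariate p values (for example from Mann–Whitney U-tests, computed elsewhere) into q values that can be compared against the 0.05 threshold.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), in input order.

    Assumption: BH is used here as a stand-in; the source only states
    that an FDR correction was applied.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant(pvalues, threshold=0.05):
    """Flag tests whose q value is below the threshold."""
    return [q < threshold for q in bh_qvalues(pvalues)]
```

Applied to p values of 0.01, 0.04, 0.03 and 0.005, this yields q values of 0.02, 0.04, 0.04 and 0.02, so all four would be regarded as significant at q < 0.05.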
