Multiomic signatures of body mass index identify heterogeneous health phenotypes and responses to a lifestyle intervention

Arivale cohort

The main study cohort was derived from 6,223 individuals who participated in a wellness program offered by a now-closed commercial company (Arivale, Inc.) between 2015 and 2019. Individuals were eligible for enrollment if they were over 18 years of age, not pregnant and residents of any US state except New York; participants were primarily recruited from Washington, California and Oregon. The participants were not screened for any particular disease. During the Arivale program, each participant received personalized lifestyle coaching by telephone from registered dietitians, certified nutritionists or registered nurses. This coaching was designed to improve the participant’s health based on a combination of clinical laboratory tests, genetic predispositions and published scientific evidence; for example, reduced sodium intake might be recommended to any participant with high blood pressure, but if the participant also carried risk alleles indicating enhanced susceptibility to dietary sodium, this risk would be emphasized (see a previous report25 for more details).

In the current study, to compare the associations between BMI and host phenotypes across different omics, we limited the original cohort to participants whose datasets contained (1) all main omic measurements (metabolomics, proteomics and clinical laboratory tests) from the same first blood draw; (2) a BMI measurement within ±1.5 months of the first blood draw; and (3) genetic information (used for covariates). We also excluded (1) outlier participants whose baseline BMI was beyond ±3 s.d. from the mean of the baseline BMI distribution and (2) participants for whom any omic dataset contained more than 10% missingness in the filtered analytes (see the ‘data cleaning’ subsection). The final Arivale cohort consisted of 1,277 (821 female and 456 male) participants (Fig. 1a) whose demographics were consistent (Extended Data Fig. 1a–c and Supplementary Data 1) with the study cohorts defined in previous Arivale studies20,25,26,27,28,29. For the gut microbiome analyses, a subcohort was defined as the 702 (486 female and 216 male) participants from the Arivale cohort who collected a stool sample within ±1.5 months of the first blood draw and had not used antibiotics in the previous 3 months (Fig. 4a and Supplementary Data 1). For the longitudinal analyses, a subcohort was defined as the 608 (410 female and 198 male) participants from the Arivale cohort who had two or more time-series measurements of both BMI and omics during the 18 months after enrollment (Fig. 5a and Supplementary Data 1). For the WHtR analyses, a subcohort was defined as the 1,078 (689 female and 389 male) participants from the Arivale cohort who had a baseline WHtR measurement within ±1.5 months of the first blood draw and within ±3 s.d. of the mean of the baseline WHtR distribution (Extended Data Fig. 7a and Supplementary Data 1).

TwinsUK cohort

The external cohort was derived from 17,630 individuals who participated in the TwinsUK Registry, a British national register of adult twins31. Twins were recruited as volunteers through media campaigns without screening for any particular disease. The participants had two or more clinical visits for biological sampling between 1992 and 2022. In the current study, to validate our findings from the Arivale cohort, we limited the original cohort to participants whose datasets contained all measurements for metabolomics32, BMI and the obesity-related standard clinical measures (defined throughout the current study as triglycerides, HDL cholesterol, LDL cholesterol, glucose, insulin and HOMA-IR) from the same visit. We also excluded (1) outlier participants whose BMI was beyond ±3 s.d. from the mean of the overall BMI distribution and (2) participants whose metabolomic dataset contained more than 10% missingness in the filtered metabolites (see the ‘data cleaning’ subsection). The final TwinsUK cohort consisted of 1,834 (1,774 female and 60 male) participants (Fig. 1a, Extended Data Fig. 1d–f and Supplementary Data 1). For the gut microbiome analyses, a subcohort was defined as the 329 (307 female and 22 male) participants from the TwinsUK cohort who collected a stool sample within ±1.5 months of the clinical visit and were not using antibiotics at that time (Fig. 4a and Supplementary Data 1).

Ethics statement

The current study was conducted with de-identified data from participants who had consented to the use of their anonymized data in research. Procedures were run under the Western Institutional Review Board (study numbers 20170658 at the Institute for Systems Biology and 1178906 at Arivale). The data access application for the TwinsUK cohort was approved by the TwinsUK Resource Executive Committee (project number E1192).

Data collections and data cleaning for the Arivale cohort

Multiomics data for the Arivale participants included genomics and longitudinal measurements of metabolomics, proteomics, clinical laboratory tests, gut microbiomes, wearable devices and health/lifestyle questionnaires. Peripheral venous blood draws for all measurements were performed by trained phlebotomists at LabCorp (Laboratory Corporation of America Holdings) or Quest (Quest Diagnostics) service centers. Saliva to measure analytes such as diurnal cortisol and dehydroepiandrosterone was sampled by participants at home using a standardized kit (ZRT Laboratory). Stool samples for gut microbiome measurements were obtained by participants at home using a standardized kit (DNA Genotek).

Genomics

DNA was extracted from each whole blood sample and underwent whole-genome sequencing (1,257 participants) or single-nucleotide polymorphism (SNP) microarray genotyping (20 participants). Genetic ancestry was calculated with principal components using a set of ~100,000 ancestry-informative SNP markers, as described previously25. PRSs were constructed using publicly available summary statistics from published genome-wide association studies, as described previously27.

Blood-measured omics

Metabolomics data were generated by Metabolon using ultra-high-performance liquid chromatography–tandem mass spectrometry (UHPLC–MS/MS) on plasma derived from each whole blood sample. Proteomics data were generated using a proximity extension assay on plasma derived from each whole blood sample with several Olink target panels (Olink Proteomics); only measurements from the Cardiovascular II, Cardiovascular III and Inflammation panels were used in the current study because the other panels were not applied to every sample. All clinical laboratory tests were performed by LabCorp or Quest in Clinical Laboratory Improvement Amendments-certified laboratories, and only LabCorp measurements were used in the current study to eliminate potential differences between vendors. In the current study, datasets batch-corrected with an in-house pipeline were used, and the metabolomic dataset was loge-transformed. In addition, analytes missing in more than 10% of the baseline samples were removed from each omic dataset, and observations missing in more than 10% of the remaining analytes were then removed. The final filtered metabolomics, proteomics and clinical laboratory datasets consisted of 766 metabolites, 274 proteins and 71 clinical laboratory tests, respectively (Supplementary Data 2).
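The two-step missingness filter described above (analytes first, then observations) can be sketched with pandas. This is a minimal illustration, assuming a samples × analytes DataFrame with NaN marking missing values; the function names are hypothetical, and the in-house batch correction is not shown.

```python
import numpy as np
import pandas as pd


def filter_missingness(df: pd.DataFrame, threshold: float = 0.10) -> pd.DataFrame:
    """Drop analytes, then samples, with more than `threshold` missingness.

    df: rows are samples (baseline blood draws), columns are analytes.
    """
    # Step 1: remove analytes (columns) missing in more than 10% of samples
    col_missing = df.isna().mean(axis=0)
    kept = df.loc[:, col_missing <= threshold]
    # Step 2: remove samples (rows) missing more than 10% of the remaining analytes
    row_missing = kept.isna().mean(axis=1)
    return kept.loc[row_missing <= threshold]


def log_transform(df: pd.DataFrame) -> pd.DataFrame:
    """Natural-log transform (assumes strictly positive, batch-corrected values)."""
    return np.log(df)
```

Filtering columns before rows matters: a sparsely measured analyte is dropped first, so it does not cause otherwise complete samples to be discarded.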

Gut microbiome

Gut microbiome data were generated by 16S amplicon sequencing of the V3+V4 region using a MiSeq sequencer (Illumina) for DNA extracted from each stool sample, as previously described28. In brief, the FASTQ files were processed using the mbtools workflow (version 0.37.1; https://github.com/Gibbons-Lab/mbtools) to remove noise, infer amplicon sequence variants (ASVs) and remove chimeras. Taxonomy was assigned using the SILVA ribosomal RNA gene database (version 132)54. In the current study, the final collapsed ASV table across the samples consisted of 394, 341, 85, 45, 26 and 16 taxa at the species, genus, family, order, class and phylum levels, respectively. Gut microbiome α-diversity was calculated at the ASV level using Shannon’s index, \(H = -\sum_{i=1}^{S_{\mathrm{obs}}} p_i \ln p_i\), where p_i is the proportion of the community represented by ASV i, or using the Chao1 richness estimator, \(S_{\mathrm{Chao1}} = S_{\mathrm{obs}} + \frac{n_1^2}{2 n_2}\), where S_obs is the number of observed ASVs, n1 is the number of singletons (ASVs captured once) and n2 is the number of doubletons (ASVs captured twice).
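Both α-diversity formulas can be checked with a short NumPy sketch. It assumes a vector of per-ASV read counts for one sample. The fallback used when there are no doubletons (n2 = 0), the bias-corrected form S_obs + n1(n1 − 1)/2, is an assumption, because the text does not say how that case was handled.

```python
import numpy as np


def shannon_index(counts) -> float:
    """Shannon's index H = -sum_i p_i ln p_i over ASVs with nonzero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def chao1(counts) -> float:
    """Chao1 richness S_obs + n1^2 / (2 n2) from integer ASV counts."""
    counts = np.asarray(counts)
    s_obs = int((counts > 0).sum())  # observed ASVs
    n1 = int((counts == 1).sum())    # singletons
    n2 = int((counts == 2).sum())    # doubletons
    if n2 == 0:
        # Assumed bias-corrected fallback to avoid division by zero
        return s_obs + n1 * (n1 - 1) / 2.0
    return s_obs + n1 ** 2 / (2.0 * n2)
```

For example, `chao1([1, 1, 2, 5, 0])` has four observed ASVs, two singletons and one doubleton, giving 4 + 4/2 = 6.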

Anthropometrics, saliva-measured analytes and daily physical activity measures

Anthropometrics, including weight, height, waist circumference and blood pressure, were measured at the time of blood draw and were also self-reported by participants, so the timing and number of observations varied among participants. BMI and WHtR were calculated from the measured anthropometrics as weight divided by squared height (kg m−2) and waist circumference divided by height (unitless), respectively. Saliva samples were measured in the testing laboratory of ZRT Laboratory. Daily physical activity measures, such as heart rate, distance moved, step count, calories burned, floors climbed and sleep quality, were tracked using a Fitbit wearable device. To reduce day-to-day variation, monthly averages of these daily measures were used. In the current study, the baseline measurement of each of these longitudinal measures was defined as the observation closest to the first blood draw for each participant and data type, and a dataset was excluded from analyses when its baseline measurement was more than ±1.5 months from the first blood draw.
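The BMI and WHtR definitions and the baseline-selection rule can be sketched with pandas as follows. The long-format schema (participant_id, date, value) is hypothetical, and expressing ±1.5 months as 45 days is an assumption.

```python
import pandas as pd


def bmi(weight_kg, height_m):
    """BMI in kg m^-2: weight divided by squared height."""
    return weight_kg / height_m ** 2


def whtr(waist_cm, height_cm):
    """Waist-to-height ratio (unitless; both lengths in the same unit)."""
    return waist_cm / height_cm


def select_baseline(obs: pd.DataFrame, first_draw: pd.Series, window_days: int = 45) -> pd.DataFrame:
    """Keep, per participant, the observation closest to the first blood draw.

    obs: hypothetical columns participant_id, date, value.
    first_draw: first blood-draw date, indexed by participant_id.
    window_days: +/-1.5 months approximated as 45 days (assumption).
    """
    df = obs.reset_index(drop=True)
    gap = (df["date"] - df["participant_id"].map(first_draw)).abs()
    df = df.assign(gap=gap)
    # Closest observation per participant
    closest = df.loc[df.groupby("participant_id")["gap"].idxmin()]
    # Drop participants whose closest observation is outside the window
    return closest[closest["gap"] <= pd.Timedelta(days=window_days)].drop(columns="gap")
```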

Data collections and data cleaning for the TwinsUK cohort

Data resources for the TwinsUK participants included longitudinal measurements of metabolomics, clinical laboratory tests, DXA and health/lifestyle questionnaires31. The datasets needed for the current study were provided by the Department of Twin Research & Genetic Epidemiology (King’s College London). In the current study, after each provided dataset was cleaned as described below, the earliest visit at which metabolomics, BMI and the standard clinical measures had all been measured was defined as the baseline visit for each participant. As an exception, a later such visit was prioritized as the baseline visit if the participant had gut microbiome data within ±1.5 months of that visit. Only measurements from the baseline visit were analyzed.

Blood-measured metabolomics

Metabolomics data were originally generated by Metabolon using UHPLC–MS/MS for each serum sample32. In the current study, the provided median-normalized dataset was loge-transformed. In addition, metabolites missing in more than 10% of the overall samples were removed from the metabolomic dataset, and observations missing in more than 10% of the remaining metabolites were further removed. The final filtered metabolomics consisted of 683 metabolites.

BMI

In the current study, the precomputed BMI values included in the provided metabolomics data file were used.

Standard clinical measures and other phenotypic measures

In the current study, because the provided phenotypic datasets could contain multiple measurements of a phenotype even from a single visit of a participant (for example, owing to differences between projects or repeated measurements), these were collapsed into a single value per phenotype per participant visit by taking the mean. During this step, units were harmonized, and values below the detection limit were set to 0. HOMA-IR was calculated from the glucose, insulin and fasting-status datasets with the formula HOMA-IR = fasting glucose (mmol L−1) × fasting insulin (mIU L−1) / 22.5.
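The flattening step and the HOMA-IR formula can be sketched as below. The long-format column names (including a boolean below_lod flag) are hypothetical, and unit harmonization is omitted because the source units are not specified here.

```python
import pandas as pd


def flatten_visit_measures(long_df: pd.DataFrame) -> pd.DataFrame:
    """Average repeated measurements of a phenotype within one participant visit.

    long_df: hypothetical columns participant_id, visit, phenotype, value, below_lod.
    """
    df = long_df.copy()
    # Values flagged as below the detection limit are set to 0
    df.loc[df["below_lod"], "value"] = 0.0
    return df.groupby(["participant_id", "visit", "phenotype"], as_index=False)["value"].mean()


def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_miu_l: float) -> float:
    """HOMA-IR = fasting glucose (mmol/L) x fasting insulin (mIU/L) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_miu_l / 22.5
```

For example, a fasting glucose of 5.0 mmol L−1 and a fasting insulin of 9.0 mIU L−1 give HOMA-IR = 45 / 22.5 = 2.0.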

Gut microbiome

Gut microbiome data were originally generated by whole-metagenome shotgun sequencing (WMGS) using a HiSeq 2500 sequencer (Illumina) for DNA extracted from each stool sample43. In the current study, the raw sequencing data were obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (PRJEB32731) and processed with a pipeline built on Nextflow (version 22.04.5; https://github.com/Gibbons-Lab/pipelines). In this pipeline, the FASTQ files were filtered and trimmed using the fastp (version 0.23.2) tool55, and taxonomic abundances were estimated using the Kraken 2 (version 2.1.2) and Bracken (version 2.6.0) tools56 with the Kraken 2 default database (based on NCBI RefSeq). The final collapsed taxonomic table across the samples consisted of 4,669, 1,225, 354, 167, 76 and 35 taxa at the species, genus, family, order, class and phylum levels, respectively.

Blood omics-based BMI and WHtR models

For each Arivale baseline omic dataset, missing values were first imputed with a random forest algorithm using the Python missingpy (version 0.2.0) library (corresponding to the R MissForest package57). For sex-stratified models (Extended Data Fig. 2d), the imputed datasets were divided by sex. Subsequently, the values in each omic dataset were standardized as z-scores using the mean and s.d. of each analyte. Then, ten iterations of LASSO modeling with ten-fold cross-validation (Fig. 1a and Extended Data Fig. 7a) were performed for the (unstandardized) loge-transformed BMI or WHtR against each processed omic dataset, using the LassoCV application programming interface (API) of the Python scikit-learn (version 1.0.1) library. Training and testing (hold-out) sets were generated by splitting participants into ten sets, using one set as the testing (hold-out) set and the remaining nine sets as the training set, and iterating over all ten such combinations; that is, overfitting was controlled using ten-fold iteration with ten testing (hold-out) sets, and the hyperparameter was selected using ten-fold cross-validation with internal training and validation sets drawn from each training set. Consequently, this procedure generated ten fitted sparse models for each omics category (Supplementary Data 3 and 8) and a single testing (hold-out) set-derived prediction from each omics category for each participant. The same modeling scheme was repeated with LASSO replaced by elastic net, ridge or random forest, using the Python scikit-learn ElasticNetCV, RidgeCV or GridSearchCV (wrapping RandomForestRegressor) API, respectively. In this random forest modeling, the number of trees in the forest and the number of features were set as the hyperparameters to be selected through cross-validation.
For the standard measures-based models, the above modeling scheme was applied to OLS linear regression with sex, age, triglycerides, HDL cholesterol, LDL cholesterol, glucose, insulin and HOMA-IR as regressors, using the Python scikit-learn LinearRegression API. Of note, the ten split sets were fixed across omics categories and modeling methods, and we confirmed that BMI, WHtR, sex, age and ancestry principal components 1–5 did not differ significantly among the ten sets, using Pearson’s χ2 test for categorical variables and ANOVA for numeric variables with Benjamini–Hochberg adjustment for multiple testing across the tested variables (Supplementary Data 1).
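The nested scheme (an outer ten-fold hold-out loop with LassoCV choosing the penalty by inner ten-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration rather than the authors' code: it assumes the omic matrix X has already been imputed and z-scored and that y is loge BMI, and the shuffling and random seed are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold


def nested_lasso(X: np.ndarray, y: np.ndarray, n_outer: int = 10, n_inner: int = 10, seed: int = 0):
    """Return the ten fitted LassoCV models and one hold-out prediction per participant."""
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    y_pred = np.empty_like(y, dtype=float)
    models = []
    for train_idx, test_idx in outer.split(X):
        # Inner ten-fold CV on the training set selects the L1 penalty (alpha)
        model = LassoCV(cv=n_inner, random_state=seed, max_iter=10000)
        model.fit(X[train_idx], y[train_idx])
        # Each participant is predicted only by the model that never saw them
        y_pred[test_idx] = model.predict(X[test_idx])
        models.append(model)
    return models, y_pred
```

Fixing `seed` keeps the ten outer splits identical across omics categories and model types, mirroring the fixed split sets described above; the retained analytes per model are the nonzero entries of each `model.coef_`.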

For the TwinsUK cohort, the metabolomic dataset was imputed with the random forest algorithm, and each dataset of metabolomics and standard clinical measures was then z-score standardized in the same way as the Arivale datasets. Using the ten LASSO or OLS linear regression models fitted to the Arivale dataset, a single prediction was calculated from each processed dataset for each participant by taking the mean of the ten predicted values. For metabolomics, the ten MetBMI models were regenerated after restricting the input Arivale metabolomics to the 489 metabolites common to the Arivale and TwinsUK panels (Extended Data Fig. 3).
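External prediction can be sketched as z-scoring the external cohort within itself, aligning columns to the features the Arivale models were trained on, and averaging the ten model outputs. The function names are hypothetical, and using the population s.d. (ddof=0) for z-scoring is an assumption.

```python
import numpy as np
import pandas as pd


def zscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """z-score each analyte within the cohort (population s.d.; an assumption)."""
    return (df - df.mean()) / df.std(ddof=0)


def predict_external(models, external: pd.DataFrame, feature_names) -> np.ndarray:
    """Mean prediction of the ten training-cohort models on an external dataset.

    feature_names: the analytes (in training order) the models were fitted on,
    e.g. the metabolites shared by both panels.
    """
    X = zscore_columns(external[list(feature_names)]).to_numpy()
    return np.mean([m.predict(X) for m in models], axis=0)
```

Selecting `feature_names` explicitly guards against column-order mismatches between the two metabolomic panels.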

For the LASSO-modeling iteration analysis (Extended Data Figs. 2e–h and 7f–i), ten LASSO models were repeatedly generated with the above modeling scheme. At the end of each iteration, the variable that was retained in all ten models and had the highest absolute mean β-coefficient across the ten models was removed from the input omic dataset.
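The removal rule applied at the end of each iteration can be sketched as a helper that takes the stacked β-coefficients of the ten models, for example `np.vstack([m.coef_ for m in models])`. The helper name is hypothetical, and reading "retained across ten models" as "nonzero in all ten" is an interpretation.

```python
import numpy as np


def top_retained_feature(coefs, feature_names):
    """Feature to drop next: nonzero in every model, largest |mean beta|.

    coefs: array of shape (n_models, n_features) with fitted LASSO coefficients.
    Returns None when no feature is retained by all models.
    """
    coefs = np.asarray(coefs, dtype=float)
    retained_in_all = np.all(coefs != 0.0, axis=0)
    if not retained_in_all.any():
        return None
    score = np.abs(coefs.mean(axis=0))
    score[~retained_in_all] = -np.inf  # exclude features dropped by any model
    return feature_names[int(np.argmax(score))]
```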

For longitudinal predictions in the Arivale subcohort, a single prediction at each timepoint was calculated from each processed time-series omic dataset for each participant, using the baseline LASSO model whose testing (hold-out) set had included that participant. This was because (1) the baseline measurements were minimally affected by the personalized lifestyle coaching; (2) both the number and timing of data collections differed among participants; and (3) data leakage might otherwise arise from the relationships between the baseline and subsequent measurements of the same participant. For processing, each time-series omic dataset was imputed with a two-step random forest procedure; that is, baseline missingness was first imputed based on the baseline data structure, and the remaining missingness was then imputed based on the overall data structure. Each imputed dataset was subsequently z-score standardized using the mean and s.d. of the baseline distribution.
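The two-step imputation followed by baseline-referenced z-scoring can be sketched as below. The original work used missingpy's MissForest; this sketch swaps in scikit-learn's `IterativeImputer` with a `RandomForestRegressor` estimator as a MissForest-like stand-in, and the tree count and iteration limit are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def rf_imputer(seed: int = 0) -> IterativeImputer:
    """MissForest-like imputer (a stand-in, not the missingpy implementation)."""
    return IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
        max_iter=5,
        random_state=seed,
    )


def two_step_impute_and_scale(df: pd.DataFrame, is_baseline: np.ndarray) -> pd.DataFrame:
    """df: samples (all timepoints) x analytes; is_baseline: boolean row mask."""
    base = df[is_baseline]
    # Step 1: impute baseline rows using only the baseline data structure
    base_imp = pd.DataFrame(rf_imputer().fit_transform(base), index=base.index, columns=df.columns)
    combined = df.copy()
    combined.loc[base.index, :] = base_imp.to_numpy()
    # Step 2: impute the remaining (follow-up) missingness using all rows;
    # observed and step-1 values are left unchanged by the imputer
    full = pd.DataFrame(rf_imputer().fit_transform(combined), index=df.index, columns=df.columns)
    # z-score every timepoint with the baseline mean and s.d.
    mu, sd = base_imp.mean(), base_imp.std(ddof=0)
    return (full - mu) / sd
```

Scaling with baseline statistics keeps follow-up values on the same scale the baseline models were trained on.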

Model performance was conservatively evaluated by the out-of-sample R2 calculated from each corresponding hold-out testing set in the Arivale cohort or from the external testing set in the TwinsUK cohort. Pearson’s r between the measured and predicted values was calculated across all participants of the Arivale or TwinsUK cohort. The difference of the predicted value from the measured value (ΔMeasure; that is, ΔBMI or ΔWHtR) was calculated as ΔMeasure = (predicted value − measured value) / measured value × 100, so that ΔMeasure is expressed as a percentage of the measured value (% Measure). In the random forest model, the importance of a feature was calculated as the normalized total reduction of the mean squared error brought by that feature.
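The two evaluation quantities can be written out in NumPy. This is a minimal sketch; the function names are hypothetical, and which scale (log or original) the inputs are on is left to the caller.

```python
import numpy as np


def out_of_sample_r2(y_true, y_pred) -> float:
    """R^2 = 1 - SS_res / SS_tot computed on held-out data only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def delta_measure(predicted, measured) -> np.ndarray:
    """Percent deviation of the omics-inferred value from the measured value."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return (predicted - measured) / measured * 100.0
```

For example, an omics-inferred BMI of 27.5 kg m−2 for a measured BMI of 25.0 kg m−2 gives ΔBMI = +10%. Unlike the in-sample fit, the out-of-sample R2 can be negative when predictions are worse than the mean.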

Health classification

Each participant was classified using each of the measured and omics-inferred BMIs based on the WHO international BMI cutoffs (underweight: <18.5 kg m−2, normal: 18.5–25 kg m−2, overweight: 25–30 kg m−2, obese: ≥30 kg m−2)12. To assess misclassification of the measured BMI class relative to each omics-inferred BMI class, each participant was assigned to a matched or a mismatched group according to whether the measured BMI class agreed with that omics-inferred BMI class.
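The WHO classes and the matched/mismatched assignment can be sketched in plain Python. Treating each upper cutoff as exclusive (for example, 25.0 kg m−2 counts as overweight) follows the ≥30 rule for obesity but is otherwise an assumption.

```python
WHO_CUTOFFS = ((18.5, "underweight"), (25.0, "normal"), (30.0, "overweight"))


def who_bmi_class(bmi: float) -> str:
    """WHO BMI class; upper cutoffs are exclusive."""
    for upper, label in WHO_CUTOFFS:
        if bmi < upper:
            return label
    return "obese"


def misclassification(measured_bmi: float, inferred_bmi: float) -> str:
    """'matched' if the measured and omics-inferred BMI classes agree."""
    same = who_bmi_class(measured_bmi) == who_bmi_class(inferred_bmi)
    return "matched" if same else "mismatched"
```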

For a clinically defined metabolic health classification, participants with two or more MetS risk components according to the National Cholesterol Education Program Adult Treatment Panel III guidelines were classified as metabolically unhealthy, whereas the other participants were classified as metabolically healthy34,35. Specifically, the MetS risk components were (1) systolic blood pressure ≥130 mm Hg, diastolic blood pressure ≥85 mm Hg or use of anti-hypertensive medication; (2) fasting triglyceride level ≥150 mg dl−1; (3) fasting HDL cholesterol level <50 mg dl−1 for females and <40 mg dl−1 for males or use of lipid-lowering medication; and (4) fasting glucose level ≥100 mg dl−1 or use of anti-diabetic medication. Only participants with complete information for all of these components were assessed in the corresponding analyses (Fig. 3b and Extended Data Figs. 6a and 7m).
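The four-component rule above can be sketched as a small Python function. The `Participant` record is a hypothetical container, and the medication flags are attached to components exactly as listed in the text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    sex: str               # "female" or "male"
    sbp: float             # systolic blood pressure, mm Hg
    dbp: float             # diastolic blood pressure, mm Hg
    triglycerides: float   # fasting, mg/dl
    hdl: float             # fasting, mg/dl
    glucose: float         # fasting, mg/dl
    antihypertensive: bool = False
    lipid_lowering: bool = False
    antidiabetic: bool = False


def mets_risk_count(p: Participant) -> int:
    """Number of MetS risk components met (0-4)."""
    hdl_cutoff = 50.0 if p.sex == "female" else 40.0
    risks = [
        p.sbp >= 130 or p.dbp >= 85 or p.antihypertensive,
        p.triglycerides >= 150,
        p.hdl < hdl_cutoff or p.lipid_lowering,
        p.glucose >= 100 or p.antidiabetic,
    ]
    return sum(risks)


def metabolic_health(p: Participant) -> str:
    """Two or more risk components -> metabolically unhealthy."""
    return "unhealthy" if mets_risk_count(p) >= 2 else "healthy"
```

For example, a woman with triglycerides of 160 mg dl−1 and HDL cholesterol of 45 mg dl−1 meets two components and is classified as metabolically unhealthy.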

Gut microbiome-based models for classifying obesity

For the Arivale gut microbiome dataset, the whole ASV table (907 taxa from species to phylum) was pre-processed (that is, shifted by +1, loge-transformed and standardized as z-scores using the mean and s.d. of each taxon) and then reduced in dimensionality using the PCA API of the Python scikit-learn (version 1.0.1) library; the projections onto the first 50 principal components (each explaining 0.4–5.1% of the variance) were used as the input gut microbiome features. Two types of classifiers were trained on these features: one predicting whether an individual belongs to the obese BMI class and the other predicting whether an individual belongs to the obese MetBMI class. Both models were independently constructed through a five-fold iteration scheme of random forest with five-fold cross-validation (Fig. 4a) using the Python scikit-learn GridSearchCV API with RandomForestClassifier. In this random forest modeling, the number of trees in the forest and the number of features were set as the hyperparameters to be selected through cross-validation. Training and testing (hold-out) sets were generated by splitting the participants in the normal and obese classes into five sets, using one set as the testing (hold-out) set and the remaining four sets as the training set, and iterating over all five such combinations; that is, overfitting was controlled using five-fold iteration with five testing (hold-out) sets, and hyperparameters were selected using five-fold cross-validation with internal training and validation sets drawn from each training set. Consequently, this procedure generated five fitted classifiers for each of the BMI and MetBMI classes and a single testing (hold-out) set-derived prediction from each classifier type for each participant. Note that this prediction was of two types: either the normal or the obese class by a vote of the trees (that is, a binary prediction) and the mean probability of the obese class across the trees.
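The feature construction and the nested random forest scheme can be sketched with scikit-learn. Several details are assumptions: stratified outer folds, the illustrative hyperparameter grid, AUC as the inner selection metric, and a guard for constant taxa. Note also that scikit-learn's `RandomForestClassifier.predict` takes the class with the highest averaged tree probability (soft voting) rather than a hard majority vote.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def microbiome_pcs(counts, n_components: int = 50, seed: int = 0) -> np.ndarray:
    """Shift by 1, natural log, z-score per taxon, then project onto top PCs."""
    X = np.log(np.asarray(counts, dtype=float) + 1.0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant taxa (assumption)
    X = (X - X.mean(axis=0)) / sd
    return PCA(n_components=n_components, random_state=seed).fit_transform(X)


DEFAULT_GRID = {"n_estimators": [100, 300], "max_features": ["sqrt", 0.3]}  # hypothetical grid


def nested_rf_obesity(features: np.ndarray, is_obese, param_grid=DEFAULT_GRID, seed: int = 0):
    """Five outer hold-out folds; GridSearchCV tunes the forest on each training set.

    Returns per-participant obese-class probabilities and binary predictions.
    """
    y = np.asarray(is_obese, dtype=int)  # 1 = obese, 0 = normal
    prob = np.empty(len(y))
    label = np.empty(len(y), dtype=int)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in outer.split(features, y):
        search = GridSearchCV(RandomForestClassifier(random_state=seed), param_grid, cv=5, scoring="roc_auc")
        search.fit(features[train_idx], y[train_idx])
        # predict_proba averages the per-tree class probabilities
        prob[test_idx] = search.predict_proba(features[test_idx])[:, 1]
        label[test_idx] = search.predict(features[test_idx])
    return prob, label
```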

For the TwinsUK gut microbiome dataset, the whole taxonomic table (6,526 taxa from species to phylum) was pre-processed and reduced in dimensionality in the same way as the Arivale dataset; the projections onto the first 50 principal components (each explaining 0.2–40.1% of the variance) were used as the input gut microbiome features. Then, five obesity classifiers for each of the BMI and MetBMI classes were generated following the Arivale procedure above, and a single testing (hold-out) set-derived prediction from each classifier type was calculated for each participant (Fig. 4a).

Model performance of each classifier was conservatively evaluated using each corresponding hold-out testing set. AUC in the ROC curve and the average precision were calculated using the probability predictions, whereas sensitivity and specificity were calculated from the confusion matrix using the binary predictions. The overall ROC curve and its AUC were calculated from all the participants’ probability predictions, using the R pROC (version 1.18.0) package58.
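The per-fold metrics can be sketched with scikit-learn's metric functions as a Python stand-in. The overall ROC curve and DeLong's test in the study were computed with R pROC, which this sketch does not reproduce; the function name is hypothetical.

```python
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score


def classifier_metrics(y_true, prob, label) -> dict:
    """AUC and average precision from probabilities; sensitivity/specificity from labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, label, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, prob),
        "average_precision": average_precision_score(y_true, prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Passing `labels=[0, 1]` fixes the confusion-matrix layout even if a fold happens to contain only one predicted class.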

Longitudinal changes in the measured and omics-inferred BMIs

An LMM was fitted for each loge-transformed measured or omics-inferred BMI in the Arivale subcohort, following a previous approach25. As fixed effects for time, linear regression splines with knots at 0, 6, 12 and 18 months were applied to days in the program to model time as a continuous rather than a categorical variable, because both the number and timing of data collections differed among participants. In addition to the linear regression splines of time, the LMM included sex, baseline age, ancestry principal components 1–5 and meteorological season as fixed effects (to adjust for potential confounding) and a random intercept and a random slope of days in the program for each participant as random effects. The same LMM for each measured or omics-inferred BMI was also fitted independently within each baseline BMI class-stratified group. Of note, this stratified LMM was not fitted for the underweight group because its sample size was too small for convergence. To compare the misclassification strata defined against the baseline MetBMI class, the above LMM was extended with additional fixed effects (the categorical baseline misclassification of BMI class against MetBMI class, that is, matched versus mismatched, and its interaction terms with the linear regression splines of time) and fitted for each measured BMI or MetBMI within each baseline BMI class-stratified group. All LMMs were fitted using the MixedLM API of the Python statsmodels (version 0.13.0) library.
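One common way to build linear regression spline terms for time and fit the LMM with statsmodels is sketched below. It assumes a long-format DataFrame with hypothetical columns pid, days, log_bmi, sex and age. Several choices are assumptions: converting days to months at 30.44 days per month, the segment-slope parameterization (each s_k is the time spent in segment k, so its coefficient is that segment's slope), and using months instead of days for the random slope, which only rescales it. Ancestry principal components and meteorological season are omitted for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

KNOTS_MONTHS = (0, 6, 12, 18)


def linear_spline_basis(months, knots=KNOTS_MONTHS) -> pd.DataFrame:
    """Piecewise-linear basis: column s_k is the time spent in segment k."""
    months = np.asarray(months, dtype=float)
    cols = {}
    for k, (lo, hi) in enumerate(zip(knots[:-1], knots[1:]), start=1):
        cols[f"s{k}"] = np.clip(months - lo, 0.0, hi - lo)
    return pd.DataFrame(cols)


def fit_bmi_lmm(df: pd.DataFrame):
    """Random intercept and random time slope per participant (pid)."""
    df = df.reset_index(drop=True).copy()
    df["months"] = df["days"] / 30.44  # assumed mean month length
    df = pd.concat([df, linear_spline_basis(df["months"])], axis=1)
    model = smf.mixedlm("log_bmi ~ s1 + s2 + s3 + sex + age", df, groups="pid", re_formula="~months")
    return model.fit()
```

For example, `linear_spline_basis([0, 3, 9, 18])` gives s1 = (0, 3, 6, 6), s2 = (0, 0, 3, 6) and s3 = (0, 0, 0, 6).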

Plasma analyte correlation network analysis

Before the analysis, values beyond ±3 s.d. from the mean of the Arivale subcohort baseline distribution were removed as outliers for each analyte, and seven clinical laboratory tests that became almost invariant across participants were excluded, allowing the subsequent models to converge. For each analyte, values were transformed with whichever transformation produced the lowest skewness (for example, no transformation, the logarithm transformation for right-skewed distributions or the square root transformation with mirroring for left-skewed distributions) and standardized as z-scores using the mean and s.d.
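The outlier removal and lowest-skewness transformation can be sketched with NumPy and SciPy. The exact candidate set and its details are assumptions: a log with a shift so that the minimum maps to 1, and a mirrored square root that is negated again to preserve the original ordering.

```python
import numpy as np
from scipy.stats import skew


def remove_outliers(x) -> np.ndarray:
    """Set values beyond +/-3 s.d. from the mean to NaN."""
    x = np.asarray(x, dtype=float)
    mu, sd = np.nanmean(x), np.nanstd(x)
    return np.where(np.abs(x - mu) > 3 * sd, np.nan, x)


def _abs_skew(v: np.ndarray) -> float:
    return abs(skew(v[~np.isnan(v)]))


def least_skewed(x):
    """Pick the candidate transform with the smallest |skewness|, then z-score it."""
    x = np.asarray(x, dtype=float)
    candidates = {
        "none": x,
        "log": np.log(x - np.nanmin(x) + 1.0),                  # for right skew (shift assumed)
        "mirrored_sqrt": -np.sqrt(np.nanmax(x) + 1.0 - x),      # for left skew (mirroring assumed)
    }
    name = min(candidates, key=lambda k: _abs_skew(candidates[k]))
    y = candidates[name]
    return name, (y - np.nanmean(y)) / np.nanstd(y)
```

Because the untransformed data are always a candidate, the chosen transformation can never increase the absolute skewness.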

For all 608,856 pairwise combinations of the analytes (766 metabolites, 274 proteins and 64 clinical laboratory tests), GLMs for the baseline measurements of the Arivale subcohort (Fig. 5a; 608 participants) were fitted independently with a Gaussian distribution and identity link function using the glm API of the Python statsmodels (version 0.13.0) library. Each GLM included one analyte as the dependent variable, another analyte and the baseline MetBMI as independent variables (with their interaction term) and sex, baseline age and ancestry principal components 1–5 as covariates. Analyte–analyte correlation pairs significantly modified by the baseline MetBMI were identified from the β-coefficient (two-sided t-test) of the interaction term between the independent variables, with Benjamini–Hochberg adjustment for multiple testing (FDR < 0.05).

For the 100 significant pairs from the GLM analysis (82 metabolites, 33 proteins and 16 clinical laboratory tests; Supplementary Data 7), GEEs for the longitudinal measurements of the metabolically obese group (that is, the baseline obese MetBMI class; 182 participants) were fitted independently with an exchangeable covariance structure using the Python statsmodels GEE API. Each GEE included one analyte as the dependent variable, another analyte and days in the program as independent variables (with their interaction term) and sex, baseline age, ancestry principal components 1–5 and meteorological season as covariates. Analyte–analyte correlation pairs significantly modified by days in the program were identified from the β-coefficient (two-sided t-test) of the interaction term between the independent variables, with Benjamini–Hochberg adjustment for multiple testing (FDR < 0.05).

Statistical analysis

All data pre-processing and statistical analyses were performed using the Python NumPy (version 1.18.1 or 1.21.3), pandas (version 1.0.3 or 1.3.4), SciPy (version 1.4.1 or 1.7.1) and statsmodels (version 0.11.1 or 0.13.0) libraries, except that the R pROC (version 1.18.0) package58 was used for DeLong’s test59. All statistical tests were two-sided. In all cases of multiple testing, P values were adjusted with the Benjamini–Hochberg method. Of note, because some hypotheses were not completely independent (for example, hypotheses between combined omics and each individual omics and hypotheses among glucose, insulin and HOMA-IR), this simple P value adjustment was regarded as a conservative approach. Significance was defined as P < 0.05 for single tests and FDR < 0.05 for multiple testing. Test summaries (for example, sample size, degrees of freedom, test statistic and exact P value) are provided in Supplementary Data 4–6, 9 and 10.
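The Benjamini–Hochberg step-up adjustment used throughout can be written out in a few lines of NumPy. statsmodels provides the same adjustment via `multipletests(..., method="fdr_bh")`; this hypothetical helper only makes the rule explicit.

```python
import numpy as np


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted P values (FDR), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P value by m / k
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

For example, the P values (0.01, 0.04, 0.03, 0.5) become (0.04, 0.0533, 0.0533, 0.5), so only the first remains significant at FDR < 0.05.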

Correlations (Figs. 1b and 3a and Extended Data Figs. 3b–d, 4b,f, 7c,d,l and 8d,e) were independently assessed using Pearson’s correlation test (Python SciPy pearsonr API) (with the P value adjustment if multiple testing). Comparisons of model performance (Figs. 1c,d and 4d,f and Extended Data Figs. 2d, 4a and 7e) were independently assessed using Welch’s t-test (Python statsmodels ttest_ind API) (with the P value adjustment if multiple testing). Comparison of overall ROC curves (Fig. 4c,e) was assessed using unpaired DeLong’s test59.
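The two workhorse tests named above can be sketched with the libraries cited. The wrapper names are hypothetical; `usevar="unequal"` is what makes the statsmodels `ttest_ind` a Welch test, and comparing per-fold R2 values is one illustrative use.

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.weightstats import ttest_ind


def measured_vs_predicted_r(measured, predicted):
    """Pearson's r and two-sided P value between measured and predicted values."""
    r, p = pearsonr(np.asarray(measured, dtype=float), np.asarray(predicted, dtype=float))
    return r, p


def welch_compare(scores_a, scores_b):
    """Welch's t-test (unequal variances); returns (t statistic, P value, d.f.)."""
    tstat, pvalue, dof = ttest_ind(scores_a, scores_b, usevar="unequal")
    return tstat, pvalue, dof
```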

In all regression analyses, only the baseline datasets were used, and, unless otherwise specified, all numeric variables were centered and scaled in advance. For the Arivale datasets of anthropometrics, saliva-measured analytes, daily physical activity measures and PRSs, (1) outlier values that were beyond ±3 s.d. from the mean in the cohort distribution were eliminated from the dataset per variable; (2) variables that became almost invariant across the participants were eliminated from the datasets; (3) values were converted with a transformation pipeline producing the lowest skewness (for example, no transformation, the logarithm transformation for right-skewed distribution or the square root transformation with mirroring for left-skewed distribution); and (4) the transformed values were standardized with z-score using the mean and s.d.; these pre-processed 51 variables were used as the numeric physiological features (Supplementary Data 4). Likewise, the Arivale datasets of the obesity-related clinical blood markers (that is, selected clinical labs; Supplementary Data 6) and the TwinsUK datasets of the obesity-related phenotypic measures (Supplementary Data 6) were pre-processed. For gut microbiome α-diversity metrics, the number of observed ASVs and Chao1 index were converted with square root transformation, and Shannon’s index was converted with square transformation, and then these transformed values were standardized with z-score using the mean and s.d. Relationships of the numeric physiological features with the measured or omics-inferred BMI (Fig. 1e) were independently assessed using each OLS linear regression model with the (unstandardized) loge-transformed measured or omics-inferred BMI as a dependent variable, a feature as an independent variable and sex, age and ancestry principal components 1–5 as covariates while adjusting multiple testing across the 255 (51 features × 5 BMI types) regressions. 
Relationships between Measure (that is, BMI or WHtR) and the analytes that were retained in at least one of the ten LASSO models (Fig. 2b–d and Extended Data Fig. 7k) were independently assessed using OLS linear regression models with the (unstandardized) loge-transformed Measure as the dependent variable, an analyte as the independent variable and sex, age and ancestry principal components 1–5 as covariates, while adjusting for multiple testing across the 210 (Fig. 2b), 75 (Fig. 2c), 42 (Fig. 2d) or 289 (Extended Data Fig. 7k) regressions. In this regression analysis, a model including the omics-inferred Measure as the independent variable was also assessed as a reference. Differences in ΔMeasure (that is, ΔBMI or ΔWHtR) between clinically defined metabolic health conditions (Fig. 3b and Extended Data Figs. 6a and 7m) were independently assessed using OLS linear regression models with ΔMeasure as the dependent variable, metabolic condition (that is, healthy versus unhealthy) as a categorical independent variable and Measure, sex, age and ancestry principal components 1–5 as covariates, while adjusting for multiple testing across the eight (2 BMI classes × 4 omics categories; Fig. 3b and Extended Data Fig. 7m) or four (2 BMI classes × 2 cohorts; Extended Data Fig. 6a) regressions. Differences in the obesity-related clinical blood markers, the BMI-associated numeric physiological features or the gut microbiome α-diversity metrics between the misclassification strata defined against the omics-inferred BMI class (Figs. 3d,e and 4b and Extended Data Fig. 6c) were independently assessed using OLS linear regression models with a marker, feature or metric as the dependent variable, misclassification (that is, matched versus mismatched) as a categorical independent variable and BMI, sex, age and ancestry principal components 1–5 as covariates, while adjusting for multiple testing across the 40 (2 BMI classes × 2 omics categories × 10 markers; Fig. 3d), 216 (2 BMI classes × 4 omics categories × 27 features; Fig. 3e), 24 (2 BMI classes × 4 omics categories × 3 metrics; Fig. 4b) or 24 (2 BMI classes × 12 measures; Extended Data Fig. 6c) regressions. In the above regression analyses for the TwinsUK cohort, ancestry principal components were omitted from the covariates because these data were not available.
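The covariate-adjusted OLS models above share one template, sketched here with the statsmodels formula API. Column names (pc1 to pc5, bmi_class and so on) are hypothetical, and coding the misclassification stratum as a 0/1 indicator is one simple way to enter that categorical variable. For TwinsUK, the principal-component covariates would simply be dropped.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

COVARIATES = ("sex", "age", "pc1", "pc2", "pc3", "pc4", "pc5")


def add_mismatch_indicator(df: pd.DataFrame, measured_col: str, inferred_col: str) -> pd.DataFrame:
    """1 if the measured and omics-inferred BMI classes disagree, else 0."""
    return df.assign(mismatched=np.where(df[measured_col] != df[inferred_col], 1, 0))


def ols_effect(df: pd.DataFrame, outcome: str, term: str, covariates=COVARIATES):
    """OLS: outcome ~ term + covariates; returns the fitted results object."""
    formula = f"{outcome} ~ {term} + " + " + ".join(covariates)
    return smf.ols(formula, data=df).fit()
```

For the misclassification comparisons, BMI is added to the covariates, for example `ols_effect(df, "marker", "mismatched", covariates=("bmi",) + COVARIATES)`, so that the stratum effect is not simply a BMI difference.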

Data visualization

Results were visualized using the Python matplotlib (version 3.4.3) and seaborn (version 0.11.2) libraries, except for the plasma analyte correlation network. Data were summarized as the mean with 95% confidence interval or as a standard box plot (median: center line; 95% confidence interval around the median: notch; [Q1, Q3]: box limits; [xmin, xmax]: whiskers, where Q1 and Q3 are the first and third quartiles, and xmin and xmax are the minimum and maximum values within [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] (IQR, interquartile range, Q3 − Q1), respectively), as indicated in each figure legend. For presentation purposes, confidence intervals were calculated during visualization using the Python seaborn barplot or boxplot API with default settings (1,000 bootstrap iterations or a Gaussian-based asymptotic approximation, respectively). The OLS linear regression line with 95% confidence interval was likewise generated during visualization using the Python seaborn regplot API with default settings (1,000 bootstrap iterations). The plasma analyte correlation network was visualized as a circos plot using the R circlize (version 0.4.15) package60.
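The box and whisker limits defined above can be computed directly. This sketch uses NumPy's default linear-interpolation percentiles; plotting libraries may compute quartiles slightly differently, so treat it as an illustration of the definition rather than a reproduction of the plotted values.

```python
import numpy as np


def box_whiskers(values):
    """Return (Q1, Q3, whisker_low, whisker_high) per the 1.5 x IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return q1, q3, inside.min(), inside.max()
```

For the values (1, 2, 3, 4, 100), Q1 = 2 and Q3 = 4, so the fences are −1 and 7, the whiskers span 1 to 4, and 100 is drawn as an outlier.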

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
