Principled distillation of multidimensional UK Biobank data reveals insights into the correlated human phenome

Abstract

Broad yet detailed data collected in biobanks captures variation reflective of human health and behavior, but insights are hard to extract given their complexity and scale. In the largest factor analysis to date, we distill hundreds of medical record codes, physical assays, and survey items from UK Biobank into 35 understandable latent constructs. The identified factors recapitulate known disease classifications, highlight the relevance of psychiatric constructs, improve measurement of health-related behavior, and disentangle elements of socioeconomic status. We demonstrate the power of this principled data reduction approach to clarify genetic signal, enhance discovery, and identify associations between underlying phenotypic structure and health outcomes such as mortality. We emphasize the importance of considering the interwoven nature of the human phenome when evaluating large-scale patterns relevant to public health.

Competing Interest Statement

BMN is a member of the scientific advisory board (SAB) at Deep Genomics and Neumora, consultant of the SAB for Camp4 Therapeutics, and consultant for Merck.

Funding Statement

CEC and EBR are supported by National Institutes of Health (NIH) grant R01MH124851 and the Stanley Center for Psychiatric Research. RW received funding and support as an AnalytiXIN scholar from AnalytiXI Indiana. GDS works for the MRC Integrative Epidemiology Unit at the University of Bristol, which is supported by the Medical Research Council (MC_UU_00011/1). BMN is supported by NIH grant 5R37MH107649.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This research was conducted by using the UK Biobank Resource under application 31063. The IRB of Partners HealthCare (Partners Human Research) determined in expedited review that the project met the US federal criteria definition of "not human subjects research."

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

All data, code, and summary statistics produced in the present study are available upon reasonable request to the authors. Access to individual level data from the UK Biobank can be obtained by bona fide scientists through application to UK Biobank (https://www.ukbiobank.ac.uk/enable-your-research). Summary statistics for item-level GWAS are available as part of the Neale Lab UKB Round 2 Mega-GWAS (http://www.nealelab.is/uk-biobank/ukbround2announcement).