The study population was drawn from the UK Biobank (UKB), a prospective cohort that recruited more than 500,000 participants between 2006 and 2010 and has followed them since [18]. At the baseline visit to an assessment center, participants completed cognitive tests and provided a wide range of phenotypic, health-related, and other data. Disease outcomes were ascertained from hospital inpatient admissions, primary care records, and electronic health records. Blood samples were also collected for genetic analysis. All participants provided written informed consent. The National Research Ethics Service Committee North West Multi-Centre Haydock granted ethical approval (MREC, https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics). The current analyses were conducted under UKB application number 19542.
Cognitive function assessments
During the baseline assessment, participants completed four computer-administered cognitive tests. These bespoke tests were designed to assess different cognitive domains and to provide insight into aging and pathology in a large population. The four brief tests, namely the fluid intelligence (FI), pairs matching (PAM), reaction time (RT), and prospective memory (PM) tests, correlated moderately to strongly with well-established standard cognitive tests, supporting their validity [19]. Of the 164,621 individuals who completed all four cognitive tests, 158 with a dementia diagnosis were excluded, leaving 164,463 individuals for analysis.
Fluid intelligence test (FI): a test of reasoning and problem-solving ability that draws on both fluid and crystallized intelligence. It serves as this battery's representative measure of general intelligence. Scores range from 0 to 13, with higher scores indicating better performance. The outcome was the log-transformed total number of correct answers.
Pairs matching test (PAM): a test of episodic memory. Six pairs of matching symbol cards are displayed in a random arrangement for 5 s; the cards are then turned face down, and participants must identify as many pairs as possible. The outcome was the log-transformed total number of errors among participants who completed the test, with higher values indicating poorer performance.
Reaction time test (RT): a test of processing speed. In each of 12 rounds, participants press a button as soon as they see two identical cards. The outcome was the log-transformed mean reaction time for correct responses, with higher values indicating slower performance.
Prospective memory test (PM): a test of event-based prospective memory. Before the test battery, participants were told that at the end of the cognitive tests they would be shown four coloured shapes and asked to touch the Blue Square, but that they should instead touch the Orange Circle. The outcome was an incorrect response on the first attempt.
Cognitive decline was assessed in the subset of participants (16,547 to 50,287, depending on the test) who attended a repeat assessment, beginning around 2014, at which the same cognitive tests were administered. The mean follow-up period was 9.36 years (SD = 2.11; range, 3.17 to 16.01 years). Relative to baseline, cognitive decline was operationally defined as a worse score on the FI, PAM, or PM test, or a reaction time at least 100 milliseconds slower [20].
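The decline rule above can be expressed as a per-test comparison of follow-up against baseline. The sketch below assumes a hypothetical record type (`CognitiveScores`) holding raw, untransformed results at each visit, and it interprets PM decline as a correct first attempt at baseline followed by an incorrect one at follow-up. Both are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Minimum slowing (in milliseconds) counted as decline on the reaction time test.
RT_DECLINE_MS = 100.0


@dataclass
class CognitiveScores:
    """Hypothetical per-visit record of raw (untransformed) test results."""
    fi_correct: Optional[int]         # fluid intelligence: number correct (higher = better)
    pam_errors: Optional[int]         # pairs matching: number of errors (higher = worse)
    pm_correct_first: Optional[bool]  # prospective memory: correct on first attempt
    rt_mean_ms: Optional[float]       # mean reaction time in ms (higher = slower)


def decline_flags(base: CognitiveScores, follow: CognitiveScores) -> Dict[str, Optional[bool]]:
    """Per-test decline flags; None when a test is missing at either visit."""

    def compare(b, f, worse):
        return None if b is None or f is None else worse(b, f)

    return {
        "FI": compare(base.fi_correct, follow.fi_correct, lambda b, f: f < b),
        "PAM": compare(base.pam_errors, follow.pam_errors, lambda b, f: f > b),
        # Assumption: PM decline = correct at baseline but incorrect at follow-up.
        "PM": compare(base.pm_correct_first, follow.pm_correct_first, lambda b, f: b and not f),
        "RT": compare(base.rt_mean_ms, follow.rt_mean_ms, lambda b, f: f - b >= RT_DECLINE_MS),
    }
```

Because each test is flagged separately, the analysed subset naturally differs by test, consistent with the range of sample sizes reported.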
Risk factors
Variables with more than 20% missing values at baseline were excluded, as were variables from the methodology section, non-environmental variables, and variables with only one level. For pairs of highly collinear variables (|r| > 0.9), we retained the one that was more relevant to cognitive function, easier to interpret, or measured with greater accuracy. This yielded 364 variables: 258 were dichotomized, and 106 were treated as continuous, with extreme values removed and values transformed into z-scores. Medical diseases defined by International Classification of Diseases (ICD) codes were combined where appropriate. All variables were assigned to the following categories: (1) Education, (2) Socioeconomic status (SES), (3) Leisure activity, (4) Body measurement index, (5) Mental health, (6) Diet, (7) Sleep, (8) Physical activities, (9) Smoke, (10) Alcohol, (11) Sexual factors, (12) Early life factors, (13) Household, (14) Sun exposure, (15) Medical conditions, (16) Medical disease, (17) Medical examination, (18) Environments. The UKB field IDs and detailed information on the variables are provided in Supplement 1 eFig. 1 and Supplement 2 eTable 1.
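A minimal sketch of the screening steps, using only the Python standard library. Names such as `screen_variables` are hypothetical. The authors chose between collinear variables by judgment (relevance, interpretability, accuracy); here that choice is replaced by a mechanical stand-in that keeps the more complete variable, and extreme-value removal is not shown because its rule is not specified.

```python
import math
from typing import Dict, List, Optional

Column = List[Optional[float]]  # one variable; None marks a missing value


def missing_fraction(col: Column) -> float:
    return sum(v is None for v in col) / len(col)


def pearson_r(x: Column, y: Column) -> float:
    """Pearson correlation on pairwise-complete observations (0.0 if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_variables(data: Dict[str, Column], max_missing: float = 0.20,
                     max_abs_r: float = 0.9) -> List[str]:
    # 1. Drop variables with more than 20% missing values.
    kept = [k for k, col in data.items() if missing_fraction(col) <= max_missing]
    # 2. Drop variables with only one observed level.
    kept = [k for k in kept if len({v for v in data[k] if v is not None}) > 1]
    # 3. Collinearity pruning. Stand-in rule (assumption): among |r| > 0.9 pairs,
    #    keep the more complete variable; ties keep the earlier column.
    kept.sort(key=lambda k: missing_fraction(data[k]))
    selected: List[str] = []
    for k in kept:
        if all(abs(pearson_r(data[k], data[s])) <= max_abs_r for s in selected):
            selected.append(k)
    return selected


def zscore(col: Column) -> Column:
    """Standardize to mean 0 and SD 1, leaving missing values as None."""
    obs = [v for v in col if v is not None]
    mean = sum(obs) / len(obs)
    sd = math.sqrt(sum((v - mean) ** 2 for v in obs) / (len(obs) - 1))
    return [None if v is None else (v - mean) / sd for v in col]
```

The greedy pass keeps a variable only if it is not highly correlated with any variable already kept, so each collinear cluster contributes one representative.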
Dementia and AD incidence
Dementia diagnoses were identified by the corresponding three-character ICD-10 codes (F00-F03, G30) in the UKB health outcome datasets, namely the first occurrences of health outcomes (Category 1712, which combines hospital records, death registrations, and primary care data) and the algorithmically defined outcomes (Category 42). AD diagnoses were based on ICD-10 codes F00 and G30. To minimize reverse causality, we included only incident cases diagnosed more than three years after the baseline assessment, with follow-up through September 2023.
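The case definition can be sketched as a three-character ICD-10 match plus an exclusion window. This assumes diagnoses arrive as (code, date) pairs, treats "until September 2023" as 30 September 2023, and assumes that cases diagnosed before baseline or within three years of it are excluded rather than counted. The function and constant names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Set, Tuple

DEMENTIA_CODES: Set[str] = {"F00", "F01", "F02", "F03", "G30"}
AD_CODES: Set[str] = {"F00", "G30"}
# Assumption: "until September 2023" is taken as the end of that month.
CENSOR_DATE = date(2023, 9, 30)


def three_char(icd10: str) -> str:
    """Truncate an ICD-10 code such as 'G30.1' or 'G301' to its 3-character category."""
    return icd10.replace(".", "").strip().upper()[:3]


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def ascertain(baseline: date, diagnoses: Iterable[Tuple[str, date]], codes: Set[str],
              lag_years: int = 3, censor: date = CENSOR_DATE) -> Tuple[str, Optional[date]]:
    """Return ('incident' | 'excluded' | 'non-case', first qualifying diagnosis date)."""
    dates = sorted(d for code, d in diagnoses if three_char(code) in codes)
    if not dates or dates[0] > censor:
        return ("non-case", None)
    first = dates[0]
    # Assumption: diagnoses before baseline or within the lag window are excluded.
    if first < add_years(baseline, lag_years):
        return ("excluded", first)
    return ("incident", first)
```

Passing `AD_CODES` instead of `DEMENTIA_CODES` applies the same window to the AD outcome.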
Statistical analyses
Statistical analyses were conducted in R version 4.0.4 and involved three main steps. First, we performed an exposome-wide analysis. Linear regression was used to test associations of the variables with three cognitive tests (fluid intelligence, pairs matching, and reaction time), and logistic regression was used for prospective memory. The data were first randomly split into a discovery dataset and a validation dataset [12]. In univariable analyses, we identified variables significantly associated with cognitive function in both datasets, using a Bonferroni-corrected threshold (P < 1.37 × 10⁻⁴, i.e., 0.05/364) to rigorously control false positives at this initial screening stage [12]. The identified variables were then entered into multivariable analyses to further explore their associations with cognitive function; false discovery rate (FDR)-corrected P values below 0.05 were considered statistically significant, which limits false positives while balancing the risk of false negatives. All of these analyses were performed for each of the four cognitive tests, adjusting for age, gender, and APOE ε4 status. We also repeated the analyses in subgroups stratified by age (≥ 60 or < 60 years), gender (female or male), APOE ε4 carrier status (carriers or non-carriers), SES (average total annual household income < £18,000 or ≥ £18,000), and education (college degree or not). In addition, to eliminate potential confounding arising from collinearity between certain factors and comorbidities, we repeated the analyses in a subgroup of healthy individuals free of diabetes, cardiovascular conditions (coronary heart disease, hypertension, disorders of lipoprotein metabolism, heart failure), cerebrovascular conditions (stroke), and chronic obstructive pulmonary disease.
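The two-stage significance filtering can be sketched as follows: a Bonferroni threshold for the univariable screen (0.05/364 reproduces the stated 1.37 × 10⁻⁴), a requirement that a variable pass in both the discovery and validation datasets, and Benjamini-Hochberg FDR adjustment of the multivariable P values. The text does not name the FDR procedure, so Benjamini-Hochberg is an assumption, and the function names are hypothetical; in R, `p.adjust(p, method = "BH")` performs the same adjustment.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    return alpha / n_tests


def replicated(p_discovery: Dict[str, float], p_validation: Dict[str, float],
               threshold: float) -> List[str]:
    """Variables significant in both the discovery and the validation dataset."""
    return [v for v, p in p_discovery.items()
            if p < threshold and p_validation.get(v, 1.0) < threshold]


def benjamini_hochberg(pvals: Dict[str, float]) -> Dict[str, float]:
    """BH-adjusted P values (step-up; monotone from the largest P downward)."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):
        name, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted
```

Variables returned by `replicated` would then be fitted jointly, and those with an adjusted P value below 0.05 retained.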
Moreover, to reduce the impact of multicollinearity, we performed LASSO [21], ridge regression [22], and principal component analysis (PCA) [21], adjusting for age, gender, and APOE ε4; these approaches mitigate overfitting arising from collinearity and model complexity. PCA used varimax orthogonal rotation, and the number of principal components (PCs) retained was determined from the scree plot as the number needed to reach a cumulative variance contribution of more than 85%. Sensitivity analyses were also performed by (1) additionally adjusting for race and assessment center, (2) additionally adjusting for the chronic diseases listed above to control for collinearity of certain factors and avoid over-selection, and (3) imputing missing data with a random forest approach using the “missRanger” package [23]; these analyses further tested the robustness of the findings.
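The component-retention rule can be made concrete with the eigenvalues of the variable correlation matrix, whose proportions give each PC's variance contribution. This sketch (hypothetical function name) returns the smallest number of components whose cumulative share strictly exceeds 85%. It does not perform the varimax rotation, which is applied once the number of components is fixed.

```python
from typing import Sequence


def n_components_for_variance(eigenvalues: Sequence[float], target: float = 0.85) -> int:
    """Smallest k whose k largest eigenvalues explain more than `target` of the variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total > target:
            return k
    return len(vals)
```

For eigenvalues 4.0, 2.5, 1.5, 1.0, 0.6, and 0.4, the cumulative shares are 40%, 65%, 80%, and 90%, so four components would be kept.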
Furthermore, for variables significantly associated with cognitive function in the multivariable model, we explored non-linear relationships between continuous variables and cognition using restricted cubic splines with four knots [24], and we investigated the longitudinal associations of these variables with cognitive decline using logistic regression models. All of these analyses were adjusted for age, gender, and APOE ε4. We additionally performed a sensitivity analysis adjusting for follow-up duration, which varied between participants, to further validate the associations with cognitive decline.
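For reference, a four-knot restricted cubic spline contributes the linear term plus two non-linear terms and is constrained to be linear beyond the outer knots. The sketch below builds Harrell's unnormalized basis. Placing knots at the 5th, 35th, 65th, and 95th percentiles is Harrell's common default and an assumption here, since the text gives only the number of knots; the function names are hypothetical.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile (the same rule as R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def default_knots(values: Sequence[float],
                  probs: Sequence[float] = (0.05, 0.35, 0.65, 0.95)) -> List[float]:
    return [quantile(values, p) for p in probs]


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Unnormalized restricted cubic spline basis: [x, nonlinear_1, ..., nonlinear_{k-2}]."""
    k = len(knots)
    t_km1, t_k = knots[-2], knots[-1]

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3

    terms = [x]
    for j in range(k - 2):
        t_j = knots[j]
        terms.append(
            pos3(x - t_j)
            - pos3(x - t_km1) * (t_k - t_j) / (t_k - t_km1)
            + pos3(x - t_k) * (t_km1 - t_j) / (t_k - t_km1)
        )
    return terms
```

The three columns would enter the regression alongside age, gender, and APOE ε4. Harrell's software also divides the non-linear terms by (t_k − t_1)², which rescales their coefficients but not the fitted curve.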
Second, we conducted Mendelian randomization (MR) analyses to examine genetic associations. One-sample MR analyses were used to investigate potential links between the significant variables identified in the exposome-wide association study (EWAS) and cognitive function. To address potential bias from sample overlap, we used MRlap, a recently developed method that has been shown to be robust and that produces overlap-corrected effect estimates [25, 26]. Summary statistics for both the exposures and the cognitive function tests were obtained from a genome-wide association study (GWAS) of UKB participants (http://www.nealelab.is/uk-biobank), whose detailed protocols are available (https://github.com/Nealelab/UK_Biobank_GWAS). SNPs classified as low-confidence variants were excluded. For the one-sample MR analysis, genetic instruments were selected using a stringent P-value threshold of 5 × 10⁻⁸ and a linkage disequilibrium (LD) clumping cut-off of 0.001 to minimize false positives. Variables identified in the one-sample analyses were then further examined in two-sample MR analyses, which used external dementia GWAS data from the FinnGen study as the outcome (https://r8.finngen.fi/pheno/F5_DEMENTIA). For these, we used a more relaxed P-value threshold of 5 × 10⁻⁶ and an LD clumping cut-off of 0.01 to increase statistical power and capture a broader range of genetic effects [27]. The inverse-variance weighted (IVW) method, together with the weighted median and MR-Egger methods, was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs). Heterogeneity and horizontal pleiotropy were assessed with Cochran's Q test under the IVW model and the MR-Egger intercept; the random-effects IVW model, MR-PRESSO, and leave-one-out (LOO) analysis were used to address between-variant heterogeneity and pleiotropic effects [17, 28].
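The IVW estimate and the heterogeneity and pleiotropy checks can be sketched from per-SNP summary statistics. This assumes first-order Wald-ratio weights and a multiplicative random-effects IVW model. The MR-Egger function returns point estimates only (no intercept SE or P value), and the weighted median, MR-PRESSO, and MRlap are not reproduced. Function names are hypothetical; established implementations such as the R packages TwoSampleMR and MendelianRandomization would normally be used.

```python
import math
from typing import Dict, Sequence, Tuple


def ivw(beta_x: Sequence[float], beta_y: Sequence[float],
        se_y: Sequence[float]) -> Dict[str, float]:
    """IVW estimate from per-SNP exposure and outcome effects (first-order weights)."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]       # Wald ratios
    ses = [sy / abs(bx) for bx, sy in zip(beta_x, se_y)]       # first-order SEs
    w = [1.0 / s ** 2 for s in ses]
    est = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    se_fixed = math.sqrt(1.0 / sum(w))
    q = sum(wi * (r - est) ** 2 for wi, r in zip(w, ratios))   # Cochran's Q
    df = len(ratios) - 1
    # Multiplicative random effects: inflate the SE when Q exceeds its df.
    se = se_fixed * math.sqrt(max(1.0, q / df)) if df > 0 else se_fixed
    return {
        "beta": est, "se": se, "Q": q,
        "OR": math.exp(est),
        "OR_lo": math.exp(est - 1.96 * se),
        "OR_hi": math.exp(est + 1.96 * se),
    }


def egger(beta_x: Sequence[float], beta_y: Sequence[float],
          se_y: Sequence[float]) -> Tuple[float, float]:
    """MR-Egger point estimates (intercept, slope) via weighted least squares."""
    # Orient each SNP so that its effect on the exposure is positive.
    sign = [1.0 if bx >= 0 else -1.0 for bx in beta_x]
    x = [s * bx for s, bx in zip(sign, beta_x)]
    y = [s * by for s, by in zip(sign, beta_y)]
    w = [1.0 / sy ** 2 for sy in se_y]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope
```

With a binary outcome such as dementia, `beta_y` is on the log-odds scale, so exponentiating the IVW estimate gives the OR. An Egger intercept far from zero suggests directional pleiotropy.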
Third, we assessed the quality of the pooled evidence for associations between the identified variables and cognitive function. One point was assigned for each of the following criteria: (1) a significant association with cognitive function in the multivariable analysis, with the same direction of effect as in the univariable analysis; (2) significant associations with more than one cognitive test in the multivariable analysis; (3) a significant association with longitudinal cognitive decline; (4) a genetic association with cognitive function in the one-sample MR analysis; and (5) a genetic association with dementia in the two-sample MR analysis. Total scores ranged from 1 to 5, with scores of 4–5, 2–3, and 1 indicating high-, medium-, and low-quality evidence, respectively. To investigate the joint effect of the high-quality variables on dementia and AD, we computed a combined score. Each variable was dichotomized using its original binary coding or, for continuous variables, the median. For each variable, an individual received 1 point if they fell in the category considered beneficial to cognitive function and 0 otherwise, and the combined score was the sum of these points across all high-quality variables. The longitudinal associations of the combined score with incident dementia and AD in the full population were investigated using Cox proportional hazards models. The proportional hazards assumption was assessed, and all analyses were adjusted for age, gender, and APOE ε4. P values below 0.05 were considered statistically significant.
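The evidence grading and the combined score can be sketched as below. The function and variable names are hypothetical, and two details are assumptions not stated in the text: individuals with a value exactly at the median are placed in the lower group, and no values are missing. The Cox models themselves are not shown; in R, `survival::coxph` fits them and `survival::cox.zph` is one common check of the proportional hazards assumption.

```python
from statistics import median
from typing import Dict, List, Sequence


def evidence_grade(criteria: Sequence[bool]) -> str:
    """Grade one variable from its five yes/no criteria, in the order listed."""
    if len(criteria) != 5:
        raise ValueError("expected five criteria")
    score = sum(bool(c) for c in criteria)
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    if score == 1:
        return "low"
    raise ValueError("scores range from 1 to 5; a variable meeting no criterion is not graded")


def combined_scores(data: Dict[str, List[float]],
                    higher_is_better: Dict[str, bool]) -> List[int]:
    """Per individual, one point for each variable on which they fall in the beneficial category."""
    n = len(next(iter(data.values())))
    totals = [0] * n
    for var, values in data.items():
        # Binary variables keep their 0/1 coding; continuous ones are split at the median.
        cut = 0.5 if set(values) <= {0, 1} else median(values)
        for i, v in enumerate(values):
            high = v > cut  # assumption: values equal to the median fall in the low group
            totals[i] += int(high if higher_is_better[var] else not high)
    return totals
```

The resulting integer score would then serve as the exposure in the Cox models for incident dementia and AD.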