Feasibility of multiorgan risk prediction with routinely collected diagnostics: a prospective cohort study in the UK Biobank

WHAT IS ALREADY KNOWN ON THIS TOPIC

Primary care health checks (like the National Health Service Health Check) present a crucial opportunity to assess underlying cardiovascular risk and to intervene to prevent or delay longer-term cardiovascular disease. Widely validated risk tools such as QRISK3 enable cardiovascular risk to be calculated easily at that appointment and inform targeted decision-making. There are validated risk scores to profile risk for other diseases, but there is not enough time during the health check to gather the various risk score inputs and handle separate calculator tools.

WHAT THIS STUDY ADDS

In this study, we show that information already being collected as part of the primary care health check could feasibly be combined into a single calculator providing 10-year risk estimates for multiple diseases across related organ systems of heart, brain, liver and kidney. Moreover, much of the essential information can be acquired remotely.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

When patients attend their health check, they could potentially receive risk scores for multiple disease outcomes in addition to cardiovascular risk. Having earlier access to multiorgan information has the potential to enable earlier intervention for risk factors, more targeted use of resources and more effective multimorbidity prevention.

Introduction

Multimorbidity presents an urgent and increasing health challenge for ageing populations,1 with implications for health equity, disability and healthcare costs.2 3 Experts warn that effective handling of multimorbidity will require a multisystem approach4 prioritising proactive, rather than reactive, care,5 6 with primary care taking a leading role in chronic disease prevention.7–9

Assessment of atherosclerotic cardiovascular disease (CVD) risk is central to primary care and is now quick and easy due to widely available risk calculator tools such as QRISK310 and the Framingham Risk Score.11 In the UK, primary care risk assessment has been codified in the form of the National Health Service (NHS) Health Check.12 13 At the health check, a number of clinical parameters are collected, CVD risk is assessed and the general practitioner directs personalised interventions that influence the long-term health trajectory of the patient. Despite urgent calls for more preventative attention to other diseases,14 15 there are currently no methods for multidisease risk prediction in primary care.

The primary targets of the NHS Health Check are heart disease, diabetes, stroke, dementia, kidney and liver disease, as laid out in the official guidance,12 website16 and patient information.17 These conditions are known to share underlying mechanisms18–20 and to co-occur in multimorbidity clusters.21–23

The objective of this study is to examine the feasibility of expanding the primary care health check to include risk assessment across multiple diseases. We focus on the 10 most commonly occurring serious conditions across the heart, brain, kidney and liver, namely, myocardial infarction, atrial fibrillation, heart failure, stroke, all-cause dementia, chronic kidney disease, fatty liver disease, alcoholic liver disease, liver cirrhosis and liver failure. Having access to a wider panel of risk information could lead to earlier disease detection, more targeted interventions and more effective prevention of longer-term multimorbidity.

However, there are several important challenges to consider. First, although risk scores have previously been developed for each additional condition (eg, dementia), there is simply not enough time within a 10–15 min consultation to gather all the required inputs and to calculate each risk score separately.24 A preferred solution would involve a single pool of inputs, and a single data entry page, from which multiple risk estimates could be calculated simultaneously.

Second, individual risk scores differ by the people they exclude, depending on the cohort in which they were developed.25 This leads to shifting sets of calculators (and required inputs) in the hands of the physician depending on the existing comorbidities of the patient. Instead, future solutions would include a person’s medical history and existing diagnoses, and adjust risk estimates accordingly.26 27

Third, not all health measures are equally accessible. NHS England is actively exploring ways that remote healthcare solutions can be used effectively to ease health service usage and make primary care services more accessible to all.28–30 These objectives call us to reflect on the information that is easily obtained and to consider whether simple metrics could be combined to powerful effect.

In this study, we use the UK Biobank data resource to emulate the information available within the primary care setting. Our objective is to explore the feasibility of multidisease risk estimation with easily collected diagnostics (figure 1), setting a minimum acceptable performance of 0.70 area under the receiver operating characteristic curve (AUROC) across all outcomes (as per Fagerland and Hosmer, p177 31). We begin by evaluating a range of published risk scores and assessing their performance in the UK Biobank cohort. We review the component inputs for each of those risk indices and identify scores that can be applied fully remotely (ie, without direct in-person contact) and those with a standard set of in-person inputs. Finally, we reuse the standard set of inputs to develop new risk scores and compare their performance with existing risk scores.

Figure 1

What can existing information tell us about multiorgan disease risk? In the context of increasing multimorbidity, we examine the feasibility of extending the existing primary care health check framework to include risk assessment for multiple diseases within the heart-brain-liver-kidney cluster. We evaluate a range of existing risk scores and consider whether and how they could be blended, and whether information already being collected could be effectively reused. If successful, this expansion could lead to earlier disease detection, more effective prevention and better resource allocation for multimorbidity prevention. *The diabetes screening component of the NHS Health Check protocol is not part of this analysis and would exist unchanged in both versions. CVD, cardiovascular disease; NICE, National Institute for Health and Care Excellence; NHS, National Health Service.

Methods

Setting and study population

UK Biobank is a large prospective cohort study with participants drawn from the general population.32 UK residents aged 40–69 years who were registered with a general practitioner, as identified from NHS registers, were invited to participate. Baseline data collection took place between 2006 and 2010, and the registration date was used as the index date for the study. Follow-up events were ascertained via linked health records, with a latest censor date of 31 October 2022. Comprehensive details regarding linked primary care and hospital records are provided in official UK Biobank resources.33 34 To focus on 10-year risk estimation for all outcomes, follow-up was truncated at 10 years following baseline, giving a median follow-up time of 10 years (IQR=10–10). The clinical and demographic parameters collected in UK Biobank mimic those available in primary care and permit population-based modelling of multiorgan risk from baseline features.

From the overall UK Biobank cohort (n=502 386), 1298 participants were removed due to self-withdrawal or loss to follow-up, and 271 386 participants did not have primary care data available. There were 229 702 remaining participants who were confirmed to be present in both the main dataset and the primary care dataset. From these, a further 1462 participants were excluded due to missing values for height, weight, waist or hip circumference, leaving a final sample of 228 240 participants (see online supplemental figure 1A and online supplemental methods SM2).
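
As a quick consistency check, the exclusion flow described above can be reproduced with simple arithmetic. The sketch below uses only the counts reported in this paragraph; the variable names are illustrative and do not come from the study code.

```python
# Participant counts as reported in the paragraph above.
overall = 502_386                # full UK Biobank cohort
withdrawn_or_lost = 1_298        # self-withdrawal or loss to follow-up
no_primary_care = 271_386        # no primary care data available
missing_anthropometry = 1_462    # missing height, weight, waist or hip

# Participants present in both the main and the primary care datasets.
linked = overall - withdrawn_or_lost - no_primary_care

# Final analysis sample after removing records with missing body measures.
final = linked - missing_anthropometry

print(linked, final)
```

Both intermediate totals agree with the figures stated in the text (229 702 and 228 240).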

NHS Health Check and easily collected diagnostics

The NHS Health Check is a preventative primary care initiative (https://www.healthcheck.nhs.uk/)12 that forms the situational anchor for our study. Briefly, healthy people aged between 40 and 74 years are invited to visit their primary care team, where an inexpensive set of diagnostics are collected and 10-year risk of CVD is calculated using the widely validated QRISK3 calculator10 (https://qrisk.org/three/) or similar tool (figure 1).

Simple self-reported measures such as age, sex, family history, lifestyle factors, current medications and medical history are features that can be reported verbally and can be collected fully remotely (ie, without in-person contact). Physical measures such as height, weight, waist and hip circumference are easily measured without specialised technology. These are categorised as ‘remote features’ and are shown in the first two columns of online supplemental table 1. The term ‘remote’ is used to convey that remote collection of these parameters is possible, whether by phone or via an online form. Remote features can also be collected in person as part of the primary care visit.

Best practice guidelines12 specify a minimum set of parameters to be collected as part of the standard NHS Health Check protocol, consistent with the use of QRISK3. These include a number of remote features, with the addition of blood pressure measurement and blood tests for total and high-density lipoprotein (HDL) cholesterol. In this study, we include all remote features plus the required NHS Health Check parameters (blood pressure and cholesterol) under the category of ‘standard features’.

Finally, there are various blood/biochemistry tests that are widely available but are not part of the existing first-line health check protocol. These measures were identified based on their inclusion in existing research or risk scores (see ‘Existing risk scores’ below) and have been included as an additional analysis to evaluate their potential incremental utility. These are shown in the fourth column of online supplemental table 1, and the full set of features including additional blood tests are referred to as ‘extended features’.

Ascertainment of outcomes

Diagnoses and dates for the 10 disease outcomes (itemised above) were collated across multiple UK Biobank sources including self-report, linked hospital and primary care records and deaths, using published code lists where available.35–37 Incident outcomes were defined by first occurrence of disease after baseline recruitment. Participants with a record of the same disease at baseline were excluded from modelling for that disease, and follow-up was censored at either death or the study end date. In addition to the defined outcomes, a wide selection of other potentially relevant diagnoses was collected using the same multisource approach (see online supplemental table 1). A full listing of UK Biobank codes for outcomes ascertainment is provided in online supplemental table 2.
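
The outcome rules above (exclude prevalent cases, censor at death or study end, truncate at 10 years) can be illustrated with a minimal Python sketch. This is not the study code: the function name, the 365.25-day year and the choice to treat a diagnosis dated on the baseline day as prevalent are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2022, 10, 31)  # latest censor date stated in the text
HORIZON_YEARS = 10.0            # follow-up truncated at 10 years
DAYS_PER_YEAR = 365.25          # assumption: average year length


def follow_up(baseline: date,
              first_dx: Optional[date],
              death: Optional[date]) -> Optional[Tuple[int, float]]:
    """Return (event, years), or None when the disease predates baseline."""
    # Assumption: a diagnosis on or before the baseline date is prevalent,
    # so the participant is excluded from modelling for this disease.
    if first_dx is not None and first_dx <= baseline:
        return None
    # Censor at death or study end, whichever comes first.
    censor = min(d for d in (death, STUDY_END) if d is not None)
    if first_dx is not None and first_dx <= censor:
        event, end = 1, first_dx
    else:
        event, end = 0, censor
    years = (end - baseline).days / DAYS_PER_YEAR
    # Administrative truncation: events after 10 years are not counted.
    if years > HORIZON_YEARS:
        event, years = 0, HORIZON_YEARS
    return event, years
```

For example, a participant recruited on 1 January 2008 and first diagnosed on 1 January 2020 would contribute 10 event-free years under these rules.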

Existing risk scores

We calculated QRISK3 for all study participants, along with 21 other published risk scores targeting disease risk across heart, brain, liver and kidney (see figure 2). These risk scores were selected based on a literature search for the most frequently used metrics for each outcome, the availability of published equations and the availability of online calculators for quality checking. Detailed information for all published indices is provided in online supplemental table 3. For incident stroke risk, we considered QStroke38 and CHA2DS2-VASc,39 a score comprising congestive heart failure, hypertension, age, diabetes, prior stroke or transient ischaemic attack, vascular disease and sex. For all-cause dementia, we included three dementia risk scores: one developed using the CAIDE study (Cardiovascular Risk Factors, Aging and Dementia),40 the Lifestyle for Brain Health score (LIBRA)41 and the recently developed UK Biobank Dementia Risk Score (UKB-DRS).37 For myocardial infarction and heart failure, we considered Framingham Risk Score (with and without blood lipids),11 the Pooled Cohort Equations to Prevent Heart Failure (PCP-HF risk score42) and QRISK3.10 For atrial fibrillation, we applied the Cohorts for Heart and Aging Research in Genomic Epidemiology model for atrial fibrillation (CHARGE-AF43). Chronic kidney disease risk (stage 3+) was predicted by two versions of QKidney44 and a Kidney Risk Score developed by Nelson and colleagues.45 For fatty liver disease, we considered the Fatty Liver Index46 and the Dallas Steatosis Index.47 Three diabetes risk scores (two versions of QDiabetes48 and Cambridge Diabetes Score49) were included as possibly useful predictors. AUDIT-C50 (the Alcohol Use Disorders Identification Test) is a questionnaire designed to assess risk for alcoholic liver disease; however, only the first AUDIT question was available in the UK Biobank with a high degree of data completeness. Lastly, liver fibrosis was represented by three candidate scores: the Fibrosis-4 Index (FIB-451), the nonalcoholic fatty liver disease (NAFLD) fibrosis score52 and APRI (the aspartate aminotransferase/platelet ratio53). All published risk scores have been separately validated in their own studies. To assess their general utility, all risk scores were applied to the whole sample and to 10-year follow-up regardless of restrictions present in each respective derivation cohort.

Figure 2

Risk scores across heart, brain, liver and kidney disease and their overlapping constituents. Coloured dots indicate measures that are included in the published risk scores shown on the right-hand side. Outcomes that are targeted by each risk score are shown on the left-hand side. Risk scores that can be implemented fully remotely are shown with an asterisk (*), in other words, risk scores that can be calculated without in-person contact. Physical measures include height, weight, waist circumference, hip circumference and resting heart rate. See online supplemental table 3 for a detailed listing of risk scores and their inputs. ALT, alanine aminotransferase; AST, aspartate aminotransferase; CAIDE, dementia risk score from the Cardiovascular Risk Factors, Aging and Dementia study; CHA2DS2-VASc, a score comprising congestive heart failure, hypertension, age, diabetes, prior stroke or transient ischaemic attack, vascular disease and sex; CHARGE-AF, Cohorts for Heart and Aging Research in Genomic Epidemiology model for atrial fibrillation; LIBRA, Lifestyle for Brain Health score; UKB-DRS, UK Biobank Dementia Risk Score.

Existing risk scores inherit their type from their inputs

At a high level, examination of score inputs (figure 2) shows significant overlap in feature topics; however, the finer-level detail (online supplemental table 3) reveals significant variability in input requirements. Moreover, the inputs to each risk score differ by accessibility; in other words, risk score inputs are often a mixture of remote and standard features, with occasional additional blood tests. Extending the health information framework described above, we categorise any risk score that is composed only of inputs that can be collected remotely as a remote risk score. UKB-DRS37 (dementia) and Framingham risk score (using body mass index, BMI)11 are examples of this. With the most restricted set of inputs, we would expect these models to be the least powerful and have the lowest predictive performance. Risk scores that include remote and standard features are considered standard risk scores (eg, QRISK3,10 QStroke38), while risk scores that require additional blood tests fall into the category of extended risk scores (eg, Nelson Kidney Risk Score,45 Fatty Liver Index46). We would expect risk models with access to the extended set of inputs (standard inputs plus additional biochemistry) to have the highest predictive performance. These categories are relevant to the process of building and comparing risk scores, such that performance comparisons are between models of the same type.
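
The categorisation rule can be expressed as a simple set comparison. In the sketch below, the feature names are hypothetical labels chosen for illustration (they are not taken from online supplemental table 1), and the example inputs do not correspond to any specific published score.

```python
# Hypothetical feature labels; the study's actual feature lists are in
# online supplemental table 1.
REMOTE = {
    "age", "sex", "family_history", "smoking", "alcohol",
    "medications", "medical_history",
    "height", "weight", "waist", "hip",
}
# Extra measurements required by the standard NHS Health Check protocol.
STANDARD_ONLY = {"systolic_bp", "total_cholesterol", "hdl_cholesterol"}


def score_type(inputs):
    """Classify a risk score by the least accessible input it needs."""
    needed = set(inputs)
    if needed <= REMOTE:
        return "remote"
    if needed <= REMOTE | STANDARD_ONLY:
        return "standard"
    # Anything else needs blood tests beyond the standard protocol.
    return "extended"


print(score_type(["age", "sex", "weight", "height"]))       # remote
print(score_type(["age", "sex", "systolic_bp"]))            # standard
print(score_type(["age", "alanine_aminotransferase"]))      # extended
```

A score therefore inherits its type from its most demanding input: a single additional blood test is enough to move it into the extended category.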

Ascertainment of other features

Age at baseline, self-reported sex, systolic blood pressure, pulse rate and anthropometric measurements were taken at baseline, along with a touchscreen questionnaire collecting information about ethnicity, education, family history, smoking, alcohol use and physical activity. Townsend Deprivation Index at baseline was assigned based on participant postcode. Ethnic groups and smoking categories were recoded to match QRISK3 specifications. Education was coded as binary (‘Do you have any postsecondary/college/university qualifications?’). Physical activity was dichotomised to greater than or equal to 600 summed metabolic equivalent task minutes per week,54 approximately equivalent to 20 min exercise per day.55 Family history was drawn from self-reported illnesses of mother, father and siblings, where age of illness was not specified. Descriptive statistics and source information for study features are provided in online supplemental table 4. Blood sampling was carried out as part of the baseline assessment, providing measures of total cholesterol, HDL cholesterol and additional biochemistry (online supplemental table 5). Missingness among remote features was very small (<1%) while missingness among blood test variables ranged between 3% and 14%. Covariate missingness was handled with multiple imputation, with details provided in online supplemental methods SM3 and online supplemental table 6.

Statistical analysis

Statistical analysis was performed with R V.4.1.2 and RStudio V.2022.02.0. We randomly stratified the data to create a training set (70%, 159 768 participants) and an internal validation set (30%, 68 472 participants), following the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) checklist (online supplemental table 7). Standard checks confirmed that the characteristics of the training and validation cohorts were not significantly different.
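
To illustrate what a stratified 70/30 split involves, the following Python sketch shuffles indices within each stratum and cuts each stratum at 70%. The study itself used R, and the text does not state which variable defined the strata, so the binary label used here is an assumption for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split row indices so each stratum keeps roughly the same ratio."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_stratum[lab].append(idx)
    train, valid = [], []
    for lab in sorted(by_stratum):
        idxs = by_stratum[lab]
        rng.shuffle(idxs)                      # random order within stratum
        cut = round(train_frac * len(idxs))    # 70% of this stratum
        train.extend(idxs[:cut])
        valid.extend(idxs[cut:])
    return sorted(train), sorted(valid)


# Toy data: 100 events among 1000 participants (hypothetical).
labels = [1] * 100 + [0] * 900
train, valid = stratified_split(labels)
print(len(train), len(valid))
```

Applied to the full sample, a 70% cut of 228 240 participants gives the 159 768 training records reported above, leaving 68 472 for validation.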

Pairwise modelling of cross-disease associations

Prior to full-scale modelling, we sought to describe the associations between diseases. We examined the association of prevalent conditions with the risk of incident conditions using Cox proportional hazards regression, adjusting by age, sex, postsecondary education, ethnicity, smoking, physical activity, alcohol intake frequency, BMI, Townsend Deprivation Index, family history (of heart disease, stroke and dementia), any cancer diagnosis, hypertension, high cholesterol and diabetes. Proportionality was checked with visualisation of residuals across all models. This analysis was performed using the whole cohort (n=228 240). Due to the large number of tests in this section, all coefficient tests were adjusted for multiple testing via the Benjamini-Hochberg method56 with a false discovery rate of 5%.
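
For readers unfamiliar with the Benjamini-Hochberg step-up procedure, a minimal Python sketch is given below. It is a generic illustration of the method with made-up p values, not the code used in the study.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(pvalues)
    # Rank the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (step-up rule).
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Made-up p values for illustration.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))
```

With these eight example values, only the two smallest survive at a 5% false discovery rate, although five of them fall below an unadjusted 0.05 threshold.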

Evaluation of existing risk scores

From the initial set of 22 published risk indices described above, we identified the best-performing score for each outcome across the 10 years of follow-up in the whole sample. Importantly, we evaluated all risk-score-outcome combinations, checking for potential predictive utility of each index beyond its original derivation cohort and intended outcome. We identified the risk score with highest discriminative performance, as measured by AUROC. AUROC was selected as the primary criterion for risk score performance because it does not depend on a specific prediction threshold and is more effective than simple accuracy in situations where rare events are being predicted.
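
The point about rare events can be made concrete with a short Python sketch. It computes AUROC as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case (ties count as half). This simple binary version ignores censoring and is an illustration only, not the estimator used in the study.

```python
def auroc(scores, labels):
    """Pairwise AUROC for binary labels (1 = event, 0 = no event)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of correct classifications at a fixed threshold."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


# A useless score that predicts 'no event' for everyone (5% event rate).
labels = [1] * 5 + [0] * 95
constant = [0.0] * 100
print(accuracy(constant, labels), auroc(constant, labels))
```

The constant predictor is 95% accurate yet has an AUROC of 0.5, which is why accuracy alone is a poor criterion when events are rare.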

New models for heart, brain, liver and kidney disease

Then, new models were developed for each of the 10 outcomes in the training set, and their performance for 10-year prediction was assessed in the internal validation set (online supplemental figure 1B). Model fitting was carried out (1) using remote features, (2) using standard features and (3) using the extended set of features. Crucially, at each level, we restricted all models (across the 10 related outcomes) to draw from the same pool of predictors. Feature selection was conducted using a stability selection approach,57 combining lasso Cox regression with bootstrapping to systematically identify the predictors that show consistent importance, thereby simplifying the final model and mitigating the risk of overfitting (see online supplemental methods SM4).
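
The general idea of selecting predictors by how consistently they are chosen across bootstrap resamples can be sketched as follows. This is a simplified illustration only: the study used lasso Cox regression as the base selector (online supplemental methods SM4), whereas the `toy_selector` below is a hypothetical stand-in based on a mean difference, and the 0.8 frequency threshold is an assumed value.

```python
import random


def selection_frequencies(rows, select, n_boot=100, seed=0):
    """Share of bootstrap resamples in which each feature is selected."""
    rng = random.Random(seed)
    n = len(rows)
    counts = {}
    for _ in range(n_boot):
        # Resample rows with replacement (one bootstrap sample).
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for name in select(sample):
            counts[name] = counts.get(name, 0) + 1
    return {name: c / n_boot for name, c in counts.items()}


def stable_features(freqs, threshold=0.8):
    """Keep features selected in at least `threshold` of resamples."""
    return sorted(name for name, f in freqs.items() if f >= threshold)


def toy_selector(sample, features=("x1", "noise"), min_gap=0.5):
    """Hypothetical stand-in for lasso Cox: pick features whose mean
    differs between cases and non-cases by at least `min_gap`."""
    cases = [r for r in sample if r["y"] == 1]
    controls = [r for r in sample if r["y"] == 0]
    if not cases or not controls:
        return []
    chosen = []
    for f in features:
        gap = abs(sum(r[f] for r in cases) / len(cases)
                  - sum(r[f] for r in controls) / len(controls))
        if gap >= min_gap:
            chosen.append(f)
    return chosen


# Toy data: x1 tracks the outcome exactly, noise is unrelated to it.
rows = [{"x1": int(i < 20), "noise": i % 2, "y": int(i < 20)}
        for i in range(40)]
freqs = selection_frequencies(rows, toy_selector)
print(stable_features(freqs))
```

A predictor that is picked up only occasionally (here, `noise`) falls below the frequency threshold and is dropped, which is how this style of selection guards against overfitting.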

For each outcome, the final set of predictors was placed into a single survival model, with coefficients and prediction thresholds calculated using the training set, and predictions scored for discriminative accuracy in the validation set. Across all models, prediction performance was assessed with multiple metrics including AUROC, sensitivity, specificity, Somers’ Dxy and Brier score. Differences in performance metrics were bootstrapped with 1000 samples to derive uncertainty estimates. To calculate sensitivity and specificity, the prediction thresholds were set to maximise balanced accuracy, given by (sensitivity+specificity)/2.58 Comparative performance was further evaluated with calibration plots to visually assess the alignment of predicted probabilities with actual outcomes, and reclassification statistics (integrated discrimination improvement (IDI) and continuous net reclassification improvement (cNRI)) to quantify any incremental improvements between existing and new models (online supplemental methods SM5 and SM6).
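
Three of the metrics named above are simple enough to sketch directly. The Python example below shows a threshold search that maximises balanced accuracy, the Brier score as a mean squared error, and the standard identity Dxy = 2×AUROC−1 for a binary outcome. Function names and toy data are illustrative; in the study, thresholds were derived in the training set and applied to the validation set.

```python
def balanced_accuracy(preds, labels, threshold):
    """(sensitivity + specificity) / 2 at a given threshold."""
    tp = sum(1 for p, y in zip(preds, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(preds, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(preds, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(preds, labels) if y == 0 and p >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_threshold(preds, labels):
    """Observed prediction value that maximises balanced accuracy."""
    candidates = sorted(set(preds))
    return max(candidates, key=lambda t: balanced_accuracy(preds, labels, t))


def brier_score(preds, labels):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


def somers_dxy(auroc_value):
    """Somers' Dxy for a binary outcome, from the AUROC."""
    return 2 * auroc_value - 1


# Toy predicted risks and outcomes (hypothetical).
preds = [0.1, 0.2, 0.6, 0.9]
labels = [0, 0, 1, 1]
t = best_threshold(preds, labels)
print(t, balanced_accuracy(preds, labels, t), brier_score(preds, labels))
```

On this scale, the minimum acceptable AUROC of 0.70 used in the study corresponds to a Somers’ Dxy of about 0.40.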

Patient and public involvement

Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Results

Participant characteristics

Overall, the study sample (n=228 240) was 45.3% male and 54.7% female, with an average age of 56.5 years at baseline (SD 8.1 years, table 1). Prevalence of hypertension, high cholesterol and diabetes at baseline was 32.6%, 20.8% and 5.5%, respectively. The most common incident event was atrial fibrillation (n=9997; 4.4%), and liver failure was the least common (n=340 events; 0.1%). Training and internal validation sets were similar across baseline variables and outcomes.

Table 1

Sample characteristics

Associations between existing disease and future disease risk

Pairwise Cox analysis between heart-brain-liver-kidney outcomes and existing disease diagnoses revealed multiple cross-organ associations (figure 3, online supplemental table 8). All major heart diseases were significantly associated with increased risk of stroke, liver failure and the development of chronic kidney disease. All non-infective liver diseases at baseline were associated with increased risk of heart disease within 10 years, with an 88% increased risk of heart failure in participants with cirrhosis (HR 1.88, 95% CI 1.31 to 2.71, p=7.05×10−4), and a 39% increased risk of myocardial infarction in participants with fatty liver disease at baseline (1.39, 95% CI 1.08 to 1.80, p=0.012). Participants with alcoholic liver disease at baseline had a 4.5-fold risk for all-cause dementia (4.49, 95% CI 3.10 to 6.49, p=1.46×10−15) while rheumatoid arthritis at baseline conferred a 61% increased risk (1.61, 95% CI 1.30 to 1.99, p=1.50×10−5). Diagnosis of kidney or systemic inflammatory disease at baseline was associated with increased 10-year risk for disease across heart, brain and liver. Depression diagnosis at baseline had significant associations with disease across all four organs while serious mental illness (bipolar/schizophrenia/other psychosis) was associated with increased risk of alcoholic liver disease (HR 1.73, 95% CI 1.04 to 2.85, p=0.033), chronic kidney disease (HR 1.74, 95% CI 1.49 to 2.02, p=6.78×10−13) and all-cause dementia (HR 3.17, 95% CI 2.58 to 3.90, p=9.66×10−28).

Figure 3

HRs for incident heart-brain-liver-kidney outcomes by existing disease diagnoses at baseline. Each entry shows the HR for incident outcomes (shown along the top) associated with the presence of existing risk factors, diagnoses and medication at baseline (shown down the right-hand side) in the whole cohort (n=228 240), using Cox proportional hazards regression. For example, pre-existing hypertension increases the 10-year risk of stroke by 46%. Models are adjusted by age, sex, postsecondary education, ethnicity, smoking, physical activity, alcohol intake frequency, body mass index, Townsend Deprivation Index, family history (of heart disease, stroke and dementia), any cancer diagnosis, hypertension, high cholesterol and diabetes. HR significance was adjusted for multiple testing with a false discovery rate of 5%, where non-significant results are shown as empty white cells. Each result is from a different model. See online supplemental table 8 for detailed results. AF, atrial fibrillation; ALD, alcoholic liver disease; CKD, chronic kidney disease (stages 3, 4 or 5); CIRR, cirrhosis; DEM, all-cause dementia; DVT, deep vein thrombosis; FLD, fatty liver disease; HF, heart failure; LF, liver failure; MI, myocardial infarction; PE, pulmonary embolism.

Best existing risk scores

Details of the three best-performing existing risk scores (by highest sample AUROC) for each level of accessibility are shown in online supplemental table 9. From here, the one risk score with the highest AUROC was applied as the comparator in the validation sample, shown by dark blue bars in figure 4 with additional details in online supplemental table 10. At least one existing risk score surpassed the minimum adequate AUROC (0.70) for all outcomes except liver failure, with CHARGE-AF providing the best performance for atrial fibrillation across all levels (AUROC 0.759, 95% CI 0.751 to 0.767), and UKB-DRS performing best for all-cause dementia (0.807, 95% CI 0.793 to 0.820). Note that where a lower-level (remote or standard) model performs better than all more complex (extended) models for the same outcome, it is retained as the best model at that level. Within the remote models, QKidney 5 had the highest AUROC for stroke (0.701 (95% CI 0.687 to 0.715)) and a surprisingly high remote-model AUROC for heart failure (0.798 (95% CI 0.787 to 0.809)). Within standard models, we found several expected risk-score-outcome pairings, namely QStroke for stroke (0.727 (95% CI 0.714 to 0.741)), QRISK3 for myocardial infarction (0.757 (95% CI 0.747 to 0.767)) and QKidney 3 for chronic kidney disease (0.760 (95% CI 0.751 to 0.769)). Unexpectedly, QStroke also had the highest AUROC for 10-year heart failure (0.806 (95% CI 0.795 to 0.817)).

Figure 4

Performance of multiorgan risk scores in the validation set. Comparison of area under the receiver operating characteristic curve (AUROC, also known as concordance or C-statistic) for 10-year risk prediction across 10 outcomes. Horizontal bars show the 95% CI for validation set AUROC with uncertainty estimated from 1000 bootstrapped samples. Predictions from existing risk scores are shown in dark blue, while newly developed risk scores are shown in green. Remote models contain health metrics that can be answered verbally or self-measured easily by the patient. Standard models contain all remote metrics plus blood pressure and serum cholesterol tests. Extended models contain further blood tests. See online supplemental table 10 for detailed results. BMI, body mass index; UKB-DRS, UK Biobank Dementia Risk Score; APRI, the ratio of aspartate aminotransferase (AST) to platelet count.
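
One common way to obtain a bootstrap interval of the kind described in this caption is the percentile method, sketched below in Python. The caption does not say which bootstrap variant was used, so the percentile rule, the function names and the toy data are assumptions for illustration.

```python
import random


def auroc(scores, labels):
    """Pairwise AUROC for binary labels; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        # Resample participants with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that lack either cases or non-cases.
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[round(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi


# Toy data: 20 participants with imperfectly separated scores.
scores = [i / 20 for i in range(20)]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
point = auroc(scores, labels)
lo, hi = bootstrap_ci(scores, labels)
print(round(point, 3), lo, hi)
```

With only 20 toy observations the interval is wide; the narrow intervals in the figure reflect the much larger validation set.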

Multiorgan risk prediction

The prediction performance of the newly developed risk scores is presented as green bars in figure 4 with additional details in online supplemental table 10. In general, the new risk score models performed as well as or better than existing risk scores for all outcomes, with all new models achieving AUROC above 0.70. Using standard health check predictors, newly developed models performed significantly better than existing risk scores for some outcomes, with higher AUROC for myocardial infarction (0.785 (95% CI 0.775 to 0.795)), atrial fibrillation (0.777 (95% CI 0.768 to 0.785)), heart failure (0.828 (95% CI 0.818 to 0.838)), fatty liver disease (0.766 (95% CI 0.753 to 0.779)), alcoholic liver disease (0.864 (95% CI 0.835 to 0.894)) and liver cirrhosis (0.763 (95% CI 0.734 to 0.793)). Newly developed risk models had similar performance to existing scores for stroke (0.727 (95% CI 0.713 to 0.740)), dementia (0.823 (95% CI 0.810 to 0.836)) and chronic kidney disease (0.774 (95% CI 0.765 to 0.783)).

Importantly, for all outcomes studied, newly developed models using only remote features were able to achieve similar discriminative accuracy (AUROC) to their respective standard models.

When the set of additional biochemistry was added to the pool of potential predictors (in the extended models), there was very little incremental increase in predictive performance for heart and brain outcomes. In contrast, extended model features produced significantly better predictions for chronic kidney disease (AUROC 0.875, 95% CI 0.868 to 0.881), fatty liver (0.809, 95% CI 0.797 to 0.822), alcoholic liver (0.922, 95% CI 0.899 to 0.944) and cirrhosis (0.862, 95% CI 0.837 to 0.888). This improvement is substantial compared with existing risk scores and standard model estimates, suggesting that an approach with more blood biomarkers might be better at picking up these abnormalities. As expected, calibration statistics (online supplemental figure 3) and reclassification indices (online supplemental table 11) showed better calibration and significantly improved reclassification in the newly developed risk scores compared with existing risk scores, for example, standard myocardial infarction IDI=0.014, 95% CI (0.011 to 0.017) and cNRI=0.633, 95% CI (0.588 to 0.677). In nearly all comparisons, newly developed models had lower Brier scores (less average squared error) and higher Somers’ Dxy (better rank correlation) than existing models. While there were some instances of improved sensitivity in the newer models (stroke, myocardial infarction, chronic kidney disease, alcoholic liver and cirrhosis), overall, the improvements in discrimination were mainly driven by better specificity (fewer false positives, online supplemental table 10).

Discussion

In this proof-of-concept analysis with 228 240 UK Biobank participants, we demonstrated that easily collected diagnostics can be used to assess risk across multiple disease outcomes. We have shown how this can be done without specialist computing or invasive biomarkers.

Pairwise modelling showed a complex pattern of cross-system associations, building on prior efforts to understand multimorbidity in the heart-brain-liver-kidney cluster.23 We confirmed that disease risk across all four organs was significantly associated with well-known risk factors such as hypertension, diabetes and high cholesterol, as well as other factors that have not yet been incorporated into standard risk paradigms beyond QRISK3, such as mental illness, systemic inflammation, sleep quality, arterial health and medication use.59 60

Recent studies have shown significant improvements in cardiovascular risk prediction using large data sets and machine learning methods.60–63 However, these studies still only target one organ (the heart), and when compared with conventional statistical models, deep learning or other ‘black box’ methods are not as readily explainable or easily translatable to clinical use.64 Several studies have tackled multidisease prediction. Bayati et al 65 use multitask learning and group dimensionality reduction to identify a reduced pool of health check features to predict heart-brain-liver-kidney outcomes across 2 years follow-up. Most similar to the current work, Mahajan et al 66 used electronic health records to derive multiple organ-specific risk scores, with a high degree of discrimination (AUROC>0.80 across heart, brain, lung, kidney and digestive disease). However, this study predicted hospital readmission using previous admissions for the same disease, whereas our models predict risk of new-onset disease.

The current study has several important limitations. We recognise the importance of thoroughly evaluating existing risk instruments before moving forward with new risk score development. We have begun this process, but there is more work to be done. On the other hand, because we restrict the pool of input variables, new risk score development may well be required to meet this constraint, particularly where remote risk scores of adequate quality do not yet exist.

We acknowledge that the internal validation performance of our scores (developed within UK Biobank) is not directly comparable with external validation performance of published risk scores developed outside UK Biobank. Furthermore, each published risk score has a range of validation values across published work. For example, the validation AUROC provided by the original QRISK3 paper10 was 0.88 for women and 0.86 for men. Since then other external validation performance has varied in a numerical range consistent with the current work, with values of 0.707 for women and 0.681 for men reported for the 45–64 age group in the Clinical Practice Research Datalink,67 and values of 0.722 for women and 0.697 for men in a recent validation in UK Biobank.68

It may seem unconventional to apply existing risk score instruments outside their intended cohort (eg, including people with comorbidities) and outside their intended outcome (eg, using QRISK3 to predict myocardial infarction rather than combined CVD). Our study is not the first to explore this approach25 69 or to provide essential evidence on whether existing scores hold unrealised potential in additional contexts.

Although this study is large compared with some, we caution that it is small compared with larger risk score development projects, and we have not provided an external validation cohort. Furthermore, UK Biobank participants are subject to self-selection bias, and as such, they are known to be healthier and less ethnically diverse than the UK population.70 Therefore, any final models with this approach will require further recalibration and validation in large nationally representative cohorts. We acknowledge that there are some variables that are not well measured in the UK Biobank. Where such variables appear in multidisease risk equations, they are likely to be more accurately captured in a primary care-specific database.

In conclusion, this analysis demonstrates the feasibility of using standard health check predictors to produce multidisease risk estimates of reasonable quality. Such an approach has the potential to ease pressure on primary care, allowing physicians more time to focus on interpretation and follow-up,71 thus providing new opportunities for multimorbidity prevention.

Data availability statement

Data may be obtained from a third party and are not publicly available. This analysis was produced under UK Biobank Access Application 59867. The data in this study are owned by the UK Biobank (www.ukbiobank.ac.uk) and legal constraints do not permit public sharing of the data. The UK Biobank, however, is open to all bona fide researchers anywhere in the world. Thus, the data used in this communication can be easily and directly accessed by applying through the UK Biobank Access Management System (www.ukbiobank.ac.uk/register-apply). Results from this study will be returned to UK Biobank according to their published returns policy.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and complies with the Declaration of Helsinki. The work was covered by the ethical approval for UK Biobank studies from the National Health Service (NHS) National Research Ethics Service on 17 June 2011 (Ref 11/NW/0382), extended on 18 June 2021 (Ref 21/NW/0157), with written informed consent obtained from all participants. Participants gave informed consent to participate in the study before taking part.

BMJ EVIDENCE-BASED MEDICINE
