The association strengths of current PRSs with risks of stroke and its subtypes were moderate in the Chinese population.
PRS for ischaemic stroke was positively associated with the risk of intracerebral haemorrhage.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
In the Chinese population, current PRSs might have limited value for improving stroke risk prediction over traditional risk factors.
Further studies are warranted to assess whether new PRSs based on larger genome-wide association study or other developing methods have considerable potential to translate into population health benefits.
Introduction
Stroke is one of the leading causes of death and disease burden globally.1 Stroke has two main subtypes: ischaemic stroke (IS) and haemorrhagic stroke (HS). The latter can be further divided into intracerebral haemorrhage (ICH) and subarachnoid haemorrhage (SAH). With the accumulation of genomic data worldwide, the genetic background of stroke and its subtypes is gradually being revealed. Polygenic risk score (PRS), a method used to combine minor genetic effects across the whole genome, has been increasingly used in stroke research. Several studies based on European populations have developed PRSs for any stroke (AS) or IS and suggested their potential to improve risk prediction and risk stratification.2–9 The incidence of stroke in China, especially ICH, is higher than in Western countries.1 Recently, a PRS for AS was developed based on the Chinese population and showed similar association strength in predicting the risk of IS and HS.10 However, IS and HS might have different aetiological mechanisms.11–13 Different stroke subtypes also have their specific genetic loci.14 No study has specifically developed PRSs for subtypes of stroke in the Chinese population.
The present study was based on a subcohort with genomic data from the China Kadoorie Biobank (CKB). We aimed to examine the association strengths of PRSs with risks of stroke and its subtypes in the Chinese population.
Methods
Participants
CKB is an ongoing prospective study with 512 724 participants aged 30–79 enrolled from five urban and five rural regions in China between 2004 and 2008. Details of the study have been described elsewhere.15
Among all CKB participants, there are 100 639 participants with genome-wide genotypic data. Of them, 24 657 participants were selected based on a case–control design nested within the cohort with the primary aim of studying CVD (‘case–control samples’), which formed four matched-case-control training sets (figure 1A, online supplemental methods, tables 1 and 2). The other 75 982 participants were randomly selected from the entire CKB cohort (‘population-based samples’); after excluding participants with self-reported coronary artery disease or stroke or transient ischaemic attack at baseline (n=3832), the remaining participants were used as a ‘testing set’ (n=72 150) (figure 1A, online supplemental methods).
Figure 1. Overview of the present study. (A) Flow chart for the study population; (B) Study design. The current study can be divided into four parts: (1) validation of previous PRSs, (2) development of new PRSs, (3) identification of the optimal PRS for each outcome and (4) validation and evaluation of the optimal PRS for each outcome. ᵃParticipants who had a first or second-degree relative in the sample (kinship coefficient φ>0.125) were removed by using PLINK 1.9. ᵇPlease refer to online supplemental methods for detailed procedures of case-control matching. ᶜSee online supplemental methods and table 3 for details. ᵈSee online supplemental methods and table 4 for details. AS, any stroke; C+T, clumping and thresholding; CAD, coronary artery disease; CKB, China Kadoorie Biobank; GWAS, genome-wide association study; ICH, intracerebral haemorrhage; IS, ischaemic stroke; PRS, polygenic risk score; SAH, subarachnoid haemorrhage; SSF, summary statistics file; TIA, transient ischaemic attack.
Study design
The current study can be divided into four parts (figure 1B). (1) Validation of previous PRSs. Four previously reported stroke-related PRSs were selected for validation.2 4 5 10 (2) Development of new PRSs. Clumping and thresholding (‘C+T’) and LDpred16 were used to develop new PRSs for stroke and its subtypes based on two genome-wide association studies with large sample sizes.14 17 (3) Identification of the optimal PRS for each outcome. The performances of different PRSs in predicting each outcome were compared in the corresponding training sets. (4) Validation and evaluation of the optimal PRS for each outcome. We prospectively examined the associations between optimal PRSs and risks of stroke and its subtypes. We evaluated the impact of PRSs on the risk prediction improvement by adding the optimal PRS to traditional risk prediction models in the testing set.
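As an illustration of the C+T idea named in part (2), the following minimal Python sketch keeps variants that pass a p value threshold, greedily drops variants in linkage disequilibrium with an already retained variant, and then scores a participant as a weighted sum of effect-allele dosages. The names `Variant`, `clump_and_threshold` and `polygenic_score`, the thresholds, and the toy `r2` lookup are all assumptions made for illustration; this is not the pipeline used in the study, and real clumping also applies a physical distance window that is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    snp_id: str
    beta: float      # per-allele log odds ratio from GWAS summary statistics
    p_value: float   # GWAS association p value


def clump_and_threshold(variants, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy C+T sketch (hypothetical thresholds).

    Variants are visited in ascending p value order. A variant is kept if its
    p value is below p_threshold and its pairwise r2 with every variant kept
    so far is below r2_threshold. `r2` maps frozenset({id_a, id_b}) to r2;
    missing pairs are treated as uncorrelated.
    """
    kept = []
    for v in sorted(variants, key=lambda x: x.p_value):
        if v.p_value >= p_threshold:
            break  # all remaining variants have larger p values
        in_ld = any(
            r2.get(frozenset((v.snp_id, k.snp_id)), 0.0) >= r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(v)
    return kept


def polygenic_score(dosages, kept):
    """Weighted sum of effect-allele dosages (0 to 2) over the kept variants."""
    return sum(v.beta * dosages.get(v.snp_id, 0.0) for v in kept)


# Toy usage with made-up numbers
variants = [
    Variant("rs1", 0.10, 1e-9),
    Variant("rs2", 0.08, 1e-8),   # in LD with rs1, so it is clumped away
    Variant("rs3", 0.05, 1e-3),   # fails the p value threshold
    Variant("rs4", -0.20, 2e-8),
]
r2 = {frozenset(("rs1", "rs2")): 0.8}
kept = clump_and_threshold(variants, r2)
score = polygenic_score({"rs1": 1, "rs2": 2, "rs4": 2}, kept)
```

In practice several combinations of p value and r2 thresholds would be tried, and the resulting candidate scores compared in a training set, which is the role of part (3) above.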
Assessment of traditional stroke risk factors
The baseline questionnaire collected information on sociodemographic characteristics, lifestyle behaviours, dietary habits, and personal and family medical history.15 Traditional stroke risk factors considered in the present study included sex, age, systolic and diastolic blood pressure (SBP and DBP), smoking, body mass index (BMI), waist circumference, hypertension, diabetes and family history of stroke. Details on the collection and definition of these variables have been described in our previous work.18 19
Genetic data
At baseline, a 10 mL random blood sample was collected from each participant. Genotyping and imputation in this study were centrally conducted, with details provided in our previous study.19 20 Briefly, two custom-designed single nucleotide polymorphism (SNP) arrays (Affymetrix Axiom CKB array) were used for genotyping. Imputation was performed based on haplotypes derived from the 1000 Genomes Project Phase 3. There were 9.54 million genetic variants with high reliability (online supplemental figure 1).
Ascertainment of stroke outcomes
All participants were followed up for morbidity and mortality from baseline enrolment. Incident events were identified by linking with local disease and death registries and the national health insurance database and supplemented by active follow-up.15 In the testing set, only 653 participants (0.91%) were lost to follow-up before censoring on 31 December 2018. Trained staff blinded to baseline information coded all events using the International Classification of Diseases, 10th Revision (ICD-10). Incident stroke events during the follow-up were defined as I60–I64, including SAH (I60), ICH (I61), other nontraumatic intracranial haemorrhage (I62), IS (I63) and unspecified stroke (I64). In the testing set, the events coded as I62 and I64 accounted for only 0.9% (n=76) and 3.5% (n=302) of all incident stroke events, respectively.
Since 2014, medical records of incident stroke cases have been retrieved and reviewed by qualified cardiovascular specialists blinded to baseline information. According to a previous study, by October 2018, the reporting accuracy was 91.7%, 90.4% and 82.7% for IS, ICH and SAH, respectively; the corresponding diagnostic accuracy was 93.1% (including silent lacunar infarction), 98.2% and 98.1%.24
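The ICD-10 outcome definitions given above can be written down as a small lookup. The Python sketch below is only an illustration of that mapping; the function names `classify_stroke` and `is_haemorrhagic` are hypothetical, and the haemorrhagic grouping (I60–I62) follows the definition given in the figure 3 legend.

```python
# ICD-10 three-character categories and the outcome labels used in the text
STROKE_CATEGORIES = {
    "I60": "SAH",
    "I61": "ICH",
    "I62": "other nontraumatic intracranial haemorrhage",
    "I63": "IS",
    "I64": "unspecified stroke",
}

# Haemorrhagic stroke as defined for the traditional models (I60-I62)
HAEMORRHAGIC_CATEGORIES = {"I60", "I61", "I62"}


def classify_stroke(icd10_code):
    """Return the stroke outcome label for an ICD-10 code, or None if the
    code falls outside I60-I64. Subcategories such as 'I63.9' are reduced
    to their three-character category first."""
    category = icd10_code.strip().upper()[:3]
    return STROKE_CATEGORIES.get(category)


def is_haemorrhagic(icd10_code):
    """True if the code belongs to I60-I62."""
    return icd10_code.strip().upper()[:3] in HAEMORRHAGIC_CATEGORIES
```

For example, `classify_stroke("I63.9")` returns `"IS"`, while a non-stroke code such as `"I10"` returns `None`.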
Identification of the optimal PRS in the training set
In each training set, we used the conditional logistic regression model to measure the association of each PRS with the risk of the corresponding stroke outcome, stratified by the case–control pair, with the top 10 principal components of ancestry (PCA) and array versions as the covariates. We defined the optimal PRS as the PRS with the highest OR per SD, as our previous study did.19
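To make the 'OR per SD' quantity concrete, the sketch below fits a conditional logistic model for 1:1 matched pairs with the standardised PRS as the only covariate. For 1:1 matching the conditional likelihood reduces to a logistic model on the within-pair difference without an intercept, which is solved here by Newton-Raphson. This is a simplified illustration under stated assumptions: it omits the principal components and array version adjusted for in the study, the function names are hypothetical, and in practice a dedicated routine in Stata or R would be used.

```python
import math
import statistics


def standardise(values):
    """Rescale values to mean 0 and unit (sample) SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def _score_and_information(beta, diffs):
    """First derivative and negative second derivative of the conditional
    log-likelihood sum(-log(1 + exp(-beta * d))) over pair differences d."""
    score = 0.0
    information = 0.0
    for d in diffs:
        p = 1.0 / (1.0 + math.exp(-beta * d))
        score += d * (1.0 - p)
        information += d * d * p * (1.0 - p)
    return score, information


def or_per_sd_matched_pairs(case_prs, control_prs, max_iter=50, tol=1e-10):
    """OR per SD of the PRS with a Wald 95% CI for 1:1 matched pairs.

    case_prs[i] and control_prs[i] must belong to the same matched pair.
    The PRS is standardised over all participants before fitting.
    """
    n = len(case_prs)
    z = standardise(list(case_prs) + list(control_prs))
    diffs = [z[i] - z[n + i] for i in range(n)]
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_and_information(beta, diffs)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = _score_and_information(beta, diffs)
    se = 1.0 / math.sqrt(information)
    return (
        math.exp(beta),
        math.exp(beta - 1.96 * se),
        math.exp(beta + 1.96 * se),
    )


# Toy usage with made-up scores for eight matched pairs
cases = [1.2, 0.4, 0.9, -0.3, 1.5, 0.1, 0.8, -0.6]
controls = [0.3, 0.6, -0.2, 0.1, 0.7, -0.4, 1.0, -0.1]
odds_ratio, ci_low, ci_high = or_per_sd_matched_pairs(cases, controls)
```

Under this scheme, the candidate PRS with the highest fitted OR per SD in a training set would be taken forward as the optimal PRS for that outcome.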
Validation and evaluation of the optimal PRS in the testing set
In the testing set, we used the Cox regression model to measure the association of optimal PRSs with risks of stroke and stroke subtypes. The model was stratified by sex and 10 study regions, with age as the time scale, and adjusted for the top 10 PCA and array versions. We further adjusted for SBP, BMI and family history of stroke in sensitivity analyses. We evaluated the proportional hazards assumption by examining Schoenfeld residuals; deviations were either absent or minimal. In subgroup analyses, the tests for multiplicative interaction were performed using likelihood ratio tests by comparing models with and without cross-product terms between the stratifying variable and PRS.
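The likelihood ratio test for interaction described above compares the maximised log-likelihoods of two nested models. The following Python sketch covers the simplest case, assuming a single cross-product term (for example PRS × sex), so that the statistic is referred to a chi-square distribution with 1 degree of freedom. The function name and the log-likelihood values are hypothetical; a stratifying variable with more than two levels would need more degrees of freedom than this sketch handles.

```python
import math


def likelihood_ratio_test_1df(loglik_without, loglik_with):
    """Likelihood ratio test for one added parameter.

    loglik_without: maximised (partial) log-likelihood of the model
                    without the cross-product term.
    loglik_with:    maximised (partial) log-likelihood of the model
                    with the cross-product term.
    Returns the test statistic and its p value from a chi-square
    distribution with 1 degree of freedom, using the identity
    P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    """
    statistic = max(2.0 * (loglik_with - loglik_without), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy usage with made-up log-likelihoods
statistic, p_value = likelihood_ratio_test_1df(-1000.0, -998.0)
```

With several subgroup variables tested, the resulting p values would then be compared against a threshold corrected for multiple testing.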
To evaluate the impact of PRS on risk prediction improvement, we defined the ‘CKB-CVD models’ as the traditional risk prediction models, as our previous study did.19 The ‘CKB-CVD models’ distinguish risks of IS and haemorrhagic stroke and have good discrimination without relying on blood lipids.18 We added the PRS to traditional models to get a ‘PRS-enhanced model’. We assessed the discrimination performance by using Harrell’s C.25 We used the net reclassification improvement (NRI) and integrated discrimination improvement to evaluate model reclassification before and after the addition of PRS.26
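The two kinds of metric named above can be sketched in a few lines of Python. The first function computes Harrell's C by direct pairwise comparison, and the second computes a two-category NRI at a single risk threshold. Both are simplified illustrations under stated assumptions: pairs with tied follow-up times are skipped, the stratified and sex-specific structure of the study's models is ignored, the NRI treats outcome status as fully observed (no censoring), and the function names and the 10% default threshold are chosen here for illustration only.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C by pairwise comparison (O(n^2)).

    A pair is comparable when the participant with the shorter follow-up
    time had an event. It is concordant when that participant also has the
    higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def categorical_nri(old_risk, new_risk, events, threshold=0.10):
    """Two-category net reclassification improvement.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where 'up' means moving from below to at or above the threshold.
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(old_risk, new_risk, events):
        moved_up = old < threshold <= new
        moved_down = new < threshold <= old
        if event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Toy usage with made-up data
c_stat = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
nri = categorical_nri(
    old_risk=[0.08, 0.12, 0.09, 0.11],
    new_risk=[0.12, 0.13, 0.07, 0.08],
    events=[1, 1, 0, 0],
)
```

The change in Harrell's C attributable to the PRS would be the difference between the value for the PRS-enhanced model and the value for the traditional model, computed on the same participants.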
The study adhered to both the PRS Reporting Standards and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement for cohort studies (online supplemental file 2).27 28 Analyses were done with Stata (V.17.0, StataCorp) and R (V.4.0.3). All statistical tests were two sided with α=0.05.
Results
Selection of the optimal PRSs in the training sets
In this study, four 1:1 matched training sets were defined to identify the optimal PRS for AS (7412 pairs), IS (3844 pairs), ICH (4296 pairs) and SAH (359 pairs) (figure 1, online supplemental methods). Among the training sets, 72.7%, 61.6%, 77.9% and 63.8% of the participants were from rural areas in China; 51.9%, 50.5%, 53.4% and 38.4% of the participants were men, respectively. Among the cases, the median age of disease onset (25th–75th percentile) was 65.3 (57.0–72.0), 64.1 (56.1–70.6), 65.9 (57.7–73.0) and 61.0 (53.8–69.2) years, respectively. Among all training sets, the proportion of the control group using the first version of the SNP array was lower than that of the case group (p<0.001) (online supplemental table 2). The performance of PRS for AS and IS developed in previous studies was not better than that of the newly developed PRS in the present study (table 1, online supplemental table 5). The optimal PRS for AS came from the LDpred method, and the optimal PRS for IS, ICH and SAH came from the C+T method. The ORSD (95% CI) of the optimal PRSs was 1.14 (1.10 to 1.18) for AS, 1.18 (1.13 to 1.24) for IS, 1.10 (1.05 to 1.15) for ICH and 1.25 (1.06 to 1.47) for SAH (table 1, online supplemental table 5).
Table 1. The optimal PRSs associated with risks of stroke and its subtypes in the training sets
Associations of PRSs with stroke and its subtypes in the testing set
The testing set included 72 150 Chinese participants, of whom 59.8% were women. The median age was 50.6 years in women and 51.9 years in men. During 872 919 person-years of follow-up (over 12 years on average), 8514 incident stroke events were documented, including 7507 IS, 1193 ICH and 132 SAH (table 2). The correlations among the optimal PRSs were weak (all correlation coefficients<0.2) (online supplemental figure 2).
Table 2. Characteristics of the testing set
The PRSAS and PRSIS were both positively associated with risks of AS, IS and ICH (p<0.05). The HRSD (95% CIs) of PRSAS was 1.10 (1.07 to 1.12), 1.10 (1.07 to 1.12) and 1.13 (1.07 to 1.20) for AS, IS and ICH, respectively. The corresponding HRSD (95% CIs) of PRSIS was 1.08 (1.06 to 1.11), 1.08 (1.06 to 1.11) and 1.09 (1.03 to 1.15) (figure 2, online supplemental table 6). PRSICH was positively associated with the risk of ICH in the whole testing set (HRSD=1.07), though it was not statistically significant in women (p for sex interaction=0.056) (figure 2C). PRSSAH was not associated with risks of any outcomes (figure 2). A strong association of PRSAS with the risk of SAH (HRSD=1.38, 95% CI 1.03 to 1.87) was observed in men but not in women (p for sex interaction=0.055) (figure 2D).
Figure 2. Associations of PRSs with risks of stroke and its subtypes. (A) AS, (B) IS, (C) ICH, (D) SAH. The PRSs reported here are the optimal PRSs for stroke and its subtypes in the training sets (see table 1), which were standardised (0 mean, unit SD) in the testing set. Cox models were stratified by sex and 10 study regions and adjusted for the top 10 principal components of ancestry and array versions, with age as the time scale. The number above the closed square represents the HR. The numbers of stroke events in women and men are reported in table 2. The vertical lines indicate 95% CIs. AS, any stroke; ICH, intracerebral haemorrhage; IS, ischaemic stroke; PRS, polygenic risk score; SAH, subarachnoid haemorrhage.
In sensitivity analyses, the associations of PRSs with risks of stroke and its subtypes did not change significantly after additional adjustment for SBP, BMI and family history of stroke (online supplemental table 6). In subgroup analyses, there was no strong evidence supporting a different association strength across subgroups for IS and ICH after considering multiple testing (p for interaction>0.05/8) (online supplemental figures 3 and 4).
Addition of the optimal PRS to traditional risk prediction models
Based on the traditional models defined in this study, the addition of the PRS did not improve, or only slightly improved, the discrimination performance of the models. For IS, the addition of PRSAS increased Harrell’s C by 0.0010 in men (p=0.002). For haemorrhagic stroke, the addition of PRSs did not influence Harrell’s C significantly (p>0.05) (figure 3). The addition of the PRS offered little to no improvement in stroke risk stratification. For example, the categorical NRIs at the 10% high-risk threshold for ischaemic and haemorrhagic stroke were not significant in either sex (p>0.05) (online supplemental table 7).
Figure 3. C statistics evaluating the performance of PRS. The traditional risk prediction models (traditional models) were defined as sex-specific Cox models stratified by 10 study regions, with time on study as the time scale, including models for ischaemic stroke (ICD-10: I63) and models for haemorrhagic stroke (ICD-10: I60–I62).18 Predictors included in traditional models were the same as the ‘CKB-CVD models’, including age, systolic and diastolic blood pressure, use of antihypertensives, current daily smoking, self-reported diabetes and waist circumference. Interactions between age and the other six predictors were also included. The 95% CIs of Harrell’s C and Harrell’s C changes were calculated by 100 bootstrap replications using the BCa method in Stata. CKB, China Kadoorie Biobank; CVD, cardiovascular disease; ICD, International Classification of Diseases; PRS, polygenic risk score.
Discussion
In this study, based on the largest biobank in the Chinese population, we observed only moderate associations between PRSs and risks of stroke and its subtypes, with an HRSD of about 1.10. The addition of current PRSs offered little to no improvement in stroke risk prediction and risk stratification. We also found that the PRSs developed from GWAS summary statistics of IS were positively associated with the risk of ICH.
In the present study, the associations of PRSs with risks of stroke and its subtypes were moderate, suggesting a limited value for improving risk prediction over traditional risk factors. The HRSD for PRS was usually greater than 1.20 in previous studies of the general population. A PRS for IS (PGS000039) that was developed with the metaGRS method and combined PRSs of 5 stroke subtypes and 14 stroke-related traits had an HRSD of 1.26 (95% CI 1.22 to 1.31) in the European population.5 Another PRS for stroke (PGS002259) was also developed using the metaGRS method in a Chinese population, with the HRSD for stroke being 1.28 (95% CI 1.21 to 1.36).10 However, these two PRSs showed much weaker associations with the risk of stroke or IS in the present study than in previous studies. Since both PRSs were developed using the elastic-net logistic regression, a machine learning approach, the potential overfitting may undermine their generalisation performance.
The incidence rate of ICH is much higher in Chinese than in European populations. However, non-European populations are under-represented in GWAS, which serves as the basis for PRS development. The largest GWAS for ICH included only 3400 ICH cases, with most of them from European populations.17 The present study attempted to develop PRS for ICH based on summary statistics from this GWAS. The weak associations observed in the present study are either explained by the difference in genetic background between ethnic groups or suggest that this GWAS may be underpowered. The stronger association estimate between PRS and HS risk reported in the previous study was likely due to the inclusion of PRSs for risk factors of HS (such as blood pressure) in the metaGRS method.10 It is worth mentioning that, in the present study, the PRSs directly developed from GWAS summary statistics of IS were also positively associated with the risk of ICH. Although there are differences in aetiology and risk factor profile between IS and ICH,11–13 they might also have some partially shared aetiological mechanisms like the cerebral small-vessel disease.29
This study has the following strengths. The large sample size and the large number of stroke events (including IS and ICH) enabled us to separate well-powered training sets from the testing set and to conduct subgroup analyses. The loss to follow-up rate was less than 1% over an average follow-up period of more than 12 years in CKB. The main subtypes of stroke (ie, IS, ICH and SAH) were well classified, and the reporting and diagnostic accuracy of stroke events were high.24 The genotyping and imputation of genetic data in this study were centrally conducted through a standard quality control process. Genetic variants with high reliability covered the whole genome well.
However, several limitations merit consideration. First, we did not further consider the subtypes of IS (eg, large-artery atherosclerotic stroke, cardioembolic stroke and small vessel stroke) as over 75% of the incident IS events were coded as unspecified IS (ICD-10: I63.9), which precluded us from conducting more detailed analyses. Previous studies have suggested that there are differences in genetic loci of different IS subtypes.14 30 Subsequent studies can explore whether distinguishing IS subtypes can further improve the predictive ability of PRS for IS. Second, compared with IS and ICH, the number of SAH events was relatively small. Therefore, chance cannot be excluded as an explanation for the positive results observed for SAH in the present study. Further studies with more SAH events are warranted to examine our findings. Third, ambiguous SNPs (ie, A/T, C/G) and genetic variants that were not found in CKB or had low imputation quality scores were removed during the standard quality control process of PRSs. This might weaken the associations of previous PRSs with stroke and its subtypes. Fourth, because information on blood lipids was not available for the current study population, we were unable to compare the impacts of blood lipids and PRS on traditional stroke risk prediction model improvement. However, the addition of blood lipids may enhance the traditional non-laboratory-based models, as previous studies have shown.31 32 Therefore, adding PRS to a ‘lipid-enhanced model’ might lead to a smaller improvement than what we have observed in the present study.
Conclusions
In this Chinese population, the associations of optimal PRSs with risks of stroke and its subtypes were moderate, suggesting a limited value for improving risk prediction over traditional risk factors in the context of current GWAS under-representing the East Asian population. As GWAS of stroke and its subtypes progress among East Asians, further studies are warranted to assess whether new PRSs have considerable potential to translate into precision public health and population health benefits and, if so, to determine the appropriate context for their use.
Data availability statement
Data are available on reasonable request. Details of how to access China Kadoorie Biobank data and details of the data release schedule are available from www.ckbiobank.org/site/Data+Access.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
This study involves human participants and CKB had ethical approvals from the Ethical Review Committee of the Chinese Center for Disease Control and Prevention (Beijing, China) (approval notice: 005/2004) and the Oxford Tropical Research Ethics Committee, University of Oxford (UK) (reference: 025-04). Participants gave informed consent to participate in the study before taking part.
Acknowledgments
The most important acknowledgment is to the participants in the study and the members of the survey teams in each of the 10 regional centres, as well as to the project development and management teams based in Beijing, Oxford and the 10 regional centres.