Biomarker and genomic analyses reveal molecular signatures of non-cardioembolic ischemic stroke

We analyzed data from the Third China National Stroke Registry (CNSR-III), a nationwide, multicenter, prospective observational registry study of patients with AIS or transient ischemic attack (TIA) enrolled at 201 hospitals in China between August 01, 2015 and March 31, 2018. The mean age of the patients was 62.2 (SD = 11.3) years and 31.7% of the patients were female. To explore the biology of ischemic stroke and identify new therapeutic opportunities, we performed a comprehensive molecular and genomic characterization of these patients. Patients with TIA, cardioembolism (CE), or stroke caused by a specific etiology such as Moyamoya disease, Fabry disease, and other uncommon diseases that had specific biomarkers and genetic variants were excluded from the analysis. Patients who presented with cancer or infection before stroke, those without multiple circulating biomarkers, and those who did not undergo whole-genome sequencing (WGS) were also excluded. After data quality control of WGS (30× coverage), 7,695 individuals with NCIS were included in this study (Fig. 1a, b, Supplementary Table 1).

Fig. 1figure 1

Study design and analysis of 63 biomarkers for non-cardioembolic ischemic stroke (NCIS). a Research framework. b Study flow. c Risk of stroke recurrence for 63 circulating biomarkers. Hazard ratio (HR), 95% confidence interval (CI) were calculated by adjusted Cox proportional hazards regression. d Risk of mortality at 1 year, assessed with 63 circulating biomarkers. HR (95% CI) were calculated by adjusted Cox proportional hazards regression. e Risk of poor functional outcome (defined as a modified Rankin Scale (mRS) score >2) at 3 months using 63 circulating biomarkers. Adjusted odds ratios were calculated using logistic regression. All models were adjusted for sex, age, alcohol consumption, smoking, history of stroke, dyslipidemia, hypertension, diabetes mellitus, and coronary heart disease. (Red line: P < 0.05)

Prognostic biomarkers for NCIS

To profile the molecular characteristics and prognostic markers in NCIS, we excluded seven biomarkers that were missing in 25% or more of the samples and 11 biomarkers with correlation coefficients >0.7. A total of 63 circulating biomarkers were included in this study, including electrolytes (n = 3), blood constituents (n = 8), coagulation function (n = 6), liver function (n = 8), renal function (n = 4), inflammation (n = 10), lipid metabolism (n = 12), homocysteine metabolism (n = 4), gut microbial metabolites (n = 7), and glucose metabolism (n = 1) (Fig. 1a, Supplementary Tables 2-3).

We performed cumulative risk assessments for each biomarker and identified 12 high-risk biomarkers for stroke recurrence in the population, including the inflammatory factors interleukin-6 (IL-6), Lp-PLA2-activity, high-sensitivity C-reactive protein (hs-CRP), neutrophils, chitinase-3-like protein 1 (YKL-40), and basophils; D-dimer; fasting plasma glucose (FPG); the gut microbial metabolites choline, butyrobetaine, and trimethyllysine (TML); renal function index (UMA); and the four low-risk biomarkers albumin, apolipoprotein AII, folic acid, and N,N,N-trimethyl-5-aminovaleric acid (TMAVA). In addition, we identified 25 biomarkers significantly (P < 0.05) associated with a high risk of poor functional outcome (modified Rankin Scale (mRS) 3–6) and 19 biomarkers significantly (P < 0.05) associated with a high risk of mortality (Fig. 1c–e).

Hierarchical clustering identifies molecular clusters with prognostic relevance

To determine whether the stroke-affected population can be risk-stratified based on specific circulating biomarkers, we used hierarchical clustering by Euclidian distance to reveal molecular clusters and found that these individuals could be grouped into 30 clusters based on 63 biomarkers (Fig. 2a). The biomarkers that were enriched in each cluster and differed significantly from those in other clusters are shown in Table 1.

Fig. 2figure 2

Clustering analysis reveals subpopulations associated with clinical phenotypes and outcomes of non-cardioembolic ischemic stroke (NCIS). a A heatmap illustrating 30 clusters based on 63 circulating biomarkers. b Characterization of clinical phenotypes and medication use across clusters. c Cumulative risk of stroke recurrence at 1 year for the 30 clusters. d Cumulative risk of mortality at 1 year for the 30 clusters. e Cumulative risk of poor functional outcome at 3 months for the 30 clusters. f Kaplan–Meier curves of time to stroke recurrence within 1-year post-stroke for the 30 clusters. g Uniform manifold approximation and projection (UMAP) of 30 molecular clusters. h t-distributed stochastic neighbor embedding (t-SNE) of 30 molecular clusters. i Incidence of poor functional outcome in patients with or without aspirin therapy (*P < 0.05). j Incidence of stroke recurrence in patients with or without aspirin therapy (*P < 0.05)

Table 1 Clinical characteristics of patients in 30 clusters

To further understand the classification, we compared the differences in clinical phenotypes in these clusters. We found that the risk factors and comorbidity were diverse across 30 clusters (Fig. 2b, Table 1, Supplementary Table 4). Cluster 5 (marked by UMA) and cluster 29 (marked by FPG) had a large proportion of patients with type 2 diabetes (81.6% and 86.9%, respectively) and those receiving hypoglycemic drugs (79.6% and 80.4%, respectively). Cluster 18 (marked by GGT) showed a high prevalence of drinking (45.6%). Cluster 17 (marked by AST and ALT) and cluster 18 had a large proportion of patients with liver disease. Cluster 1 (marked by hs-CRP) showed the highest risk of pulmonary infection (37.7%). Cluster 5 and cluster 7 (marked by cystatin C and creatinine) showed a high risk of urinary tract infection (20.4% and 15%, respectively). Cluster 1 and cluster 4 (marked by IL-6) showed a relatively high risk of hemorrhagic transformation in the hospital.

The identified subpopulations had different risks of clinical outcomes, indicating that distinct molecular profiles played a significant role in the pathophysiology and prognosis of ischemic stroke. There were several important biological subpopulations within the dataset that showed a high risk of unfavorable outcomes. These subpopulations were defined as clusters 1, 4, 5, 7, 17, and 26. Two of these subpopulations were characterized by inflammatory factors such as hs-CRP of cluster 1 and IL-6 of cluster 4. Renal function characterized the subpopulations of cluster 5, identified by UMA, and cluster 7, identified by cystatin C and creatinine. Cluster 17, identified by aspartate aminotransferase (AST) and alanine aminotransferase (ALT), was characterized by liver function. Cluster 26, identified by low-density lipoprotein (LDL), was characterized by lipid metabolism. Conversely, the subpopulations characterized by folic acid (cluster 10), apolipoprotein-E (Apo-E) (cluster 19), and prothrombin time (cluster 11) had a relatively low risk of unfavorable outcomes (Fig. 2c–f, Table 1, and Supplementary Tables 511).

Dimensionality reduction of biomarkers reveals fine-scale population structures in NCIS

Our initial hierarchical clustering approach revealed important clusters defined on a molecular basis; however, hierarchical clustering did not reveal relationships between individuals across clusters. To further investigate the heterogeneity of subpopulations at greater resolution, identified by circulating biomarkers, we visualized the biomarkers using uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) (Fig. 2g, h). We observed an overlap of the biomarker components and clusters derived from hierarchical clustering, which was consistent with the molecular profiles that classified the subpopulations. In the low-dimensional space, the similar data points were positioned closer together. We found that biomarkers of the same functional modules were adjacent in space. For example, the subpopulations associated with inflammation, such as cluster 1 (identified by hs-CRP) and cluster 4 (identified by IL-6), were adjacent to each other. The subpopulations characterized by renal function biomarkers, UMA (cluster 5), cystatin C, and creatinine (cluster 7), were also located adjacent to each other. Similarly, the subpopulations characterized by lipid metabolism, cluster 16 (Apo-AI) and cluster 19 (Apo-E), were located adjacent to each other (Supplementary Figs. 13). The low-dimensional embedding of biomarkers indicated that the subtle structural differences of AIS population can be determined based on the specific biomarker in the reduced dimension.

Subpopulations capture specific genetic variations and pathways associated with NCIS

Next, we sought to understand the common genetic basis for variation in molecular clusters. Biomarkers and genome-wide single nucleotide polymorphism (SNP) data were used to estimate the proportion of variation attributable to each cluster. Common genetic variants accounted for 0–13.58% of the interindividual variation within clusters (Fig. 3a, Supplementary Table 12). To identify variants specific for each molecular cluster, we conducted genome-wide association studies (GWAS) across 30 clusters and performed pathway enrichment analysis of biological terms and gene ontology (GO) biological processes for each cluster. We identified 36 loci for 30 molecular clusters that satisfied a genome-wide significance threshold of P = 5.0 × 10−8 (Supplementary Table 13). After applying multiple testing correction to the number of clusters, 20 loci showed significant associations (P < 5.0 × 10−8/30 = 1.67 × 10−9). Seven SNPs were predicted as “deleterious” or “damaging” in missense variants with P-values < 1 × 10−5 (Supplementary Table 14). For example, CLCN6 (rs1023252, SIFT score = 0, deleterious_low_confidence) and C1orf167 (rs61773952, SIFT score = 0.01, deleterious; PolyPhen-2 score = 0.974, probably damaging) could affect protein function in cluster 9, which has been implicated in cardiovascular disease.13 The genes in cluster 9 were enriched for positive regulation of the apoptotic process [GO: 0043065]. The rs4148323 locus (UGT1A1, SIFT score = 0.05, PolyPhen-2 score = 0.625, possibly damaging) could affect protein function in cluster 27, which encodes UDP-glucuronosyltransferase involved in the glucuronidation pathway that transforms small lipophilic molecules, such as bilirubin and drugs. Cluster 27 enriched with genes for flavonoid glucuronidation [GO: 0052696] and aspirin ADME [R-HSA-9749641].

Fig. 3figure 3

Genetic characteristics of the 30 molecular clusters identified in the cohort. a Proportion of variation attributed to the genotyped single-nucleotide polymorphisms (SNPs) and biomarkers for each cluster. b Genetic correlation between biomarkers (yellow box: P < 0.01, purple: FDR < 0.05). c Genetic correlations of the molecular clusters with the biomarkers. d Heritability and functional enrichment of the molecular clusters. e Colocalization of the molecular clusters and biomarkers

Heritability and genetic correlations among biomarkers and subpopulations

The genetic effects on circulating biomarkers could offer novel insights into the mechanisms underlying the genetics of molecular clusters and relevant phenotypes. Therefore, we estimated the SNP-based heritability to further understand the molecular cluster mechanisms, computed pairwise genetic correlations (rg) across 30 molecular clusters and 63 biomarkers using bivariate linkage disequilibrium score regression (LDSC) (Fig. 3b, c), and identified shared genetic backgrounds for correlated biomarkers and molecular clusters. For example, cluster 7 was genetically correlated with cystatin C (rg = 1; P = 0.005), cluster 9 was genetically correlated with homocysteinemia (HCY) (rg = 1; P = 0.006), and cluster 15 had genetic correlations with mean corpuscular volume (MCV) (rg = -0.923; P = 6.81 × 10−11) and hemoglobin (rg = -0.469; P = 0.030). Heritability analysis suggests that clusters share, in part, a common genetic basis with biomarkers.

To identify the heritability enrichment in functional elements across cluster, we applied a stratified LDSC to test for annotation-specific heritability enrichment. Partitioning of functional genomic elements showed enrichment of heritability in regulatory elements, including CpG for cluster 1, histone H3 lysine 27 acetylation (H3K27ac) for cluster 8, 5' UTR for cluster 5, 3' UTR for cluster 24, transcription start site (TSS) for cluster 16, and super-enhancers for clusters 10 and 27 (Fig. 3d). Functional enrichment analysis revealed a significant contribution of conserved and regulatory regions to AIS.

Colocalization of biomarker quantitative trait locus with GWAS risk loci identified

Genetic effects on circulating biomarkers may offer novel insights into the mechanisms underlying the genetics of molecular clusters. Through colocalization with clusters, biomarker quantitative trait locus (QTLs) may help identify causal genes and disease pathways. We obtained GWAS summary statistics for 63 biomarkers and 30 clusters and scanned all genome-wide significant GWAS loci overlapping in our results (Fig. 4). At a more relaxed genome-wide significant threshold (P < 1 × 10−5), nine GWAS loci with high support (PP4 > 0.8) and five with medium support (0.5 < PP4 ≤ 0.8) were identified in the colocalization between 63 circulating biomarkers and 30 molecular clusters.

Fig. 4figure 4

Genetics of 30 molecular clusters and 63 biomarkers. Circos plot showing the genome-wide significant variants for 30 molecular clusters (P < 1 × 10−5) and 63 biomarkers (P < 5 × 10−8). Each dot corresponds to a trait-associated locus, while each radial line connects dots for colocalization of biomarker-QTLs with cluster-GWAS risk loci

Among the high- and medium-confidence (PP4 > 0.8 and PP4 > 0.5, respectively) colocalization results, we identified the genetic effects of circulating biomarkers and relevant molecular clusters. For example, rs1801133 (MTHFR) was identified as a candidate causal variant that explained the shared association signal between cluster 9 and HCY. In addition, we observed more than one colocalized QTL converging on multiple biomarkers for a molecular cluster. The pleiotropic loci included, for example, rs111784051(TOMM40) and rs651821 (APOA5) loci, both of which were associated with the serum levels of TG and Apo-E, and colocalized with GWAS signals for cluster 19 (marked by LDL-R, Apo-E, and triglyceride). Similarly, rs10929285 (AC114812) and rs28946889 (UGT1A1) both regulated the serum levels of indirect bilirubin (IBIL) and direct bilirubin (DBIL) and colocalized with GWAS signals for cluster 27 (marked by IBIL and DBIL) (Fig. 3e and Supplementary Table 15).

TWAS-significant genes associated with NCIS subpopulations

To gain additional insights into the genetic basis underlying molecular clusters and search for driver-mediating genes that could act as candidate therapeutic targets, we used transcriptome-wide association studies (TWAS) to predict tissue-specific gene expression levels for molecular clusters and biomarkers. For 30 molecular clusters, we identified 39 TWAS signals for gene expression levels that were significantly associated with one or more of the tissues (P < 1.0 × 10−7), particularly in the central nervous and cardiovascular systems (Supplementary Table 16, Supplementary Fig. 4a). We also found that transcriptomic profiles of whole blood provided supporting evidence for the role of specific genes that modulated multiple clusters. The strongest signal was observed in TSG101, which encodes a component of the ESCRT-I complex required for the sorting of endocytic ubiquitinated cargos into multivesicular bodies (cluster 26: PTWAS = 3.52 × 10−16; cluster 24: PTWAS = 1.22 × 10−9; cluster 30: PTWAS = 9.45 × 10−9).

To explore the genetic overlaps between the TWAS results for clusters and biomarkers, we investigated 183 TWAS signals for 63 biomarkers (P < 5.0 × 10−8). RC3H2 expression was associated with cluster 21 and IL-1Ra (cluster 21: PTWAS = 1.97 × 10−15; IL-1Ra: PTWAS = 2.38 × 10−16); RC3H2 may play an important role in inflammation.14 KCNJ13 expression was associated with both cluster 27 and biomarkers for the liver function index (cluster 27: PTWAS = 2.5 × 10−10; TBIL: PTAS = 2.87 × 10−65; DBIL: PTWAS = 2.25 × 10−25; IBIL: PTWAS = 1.23 × 10−56). KCNJ13, which encodes an inwardly rectifying potassium channel protein, is associated with the risk of coronary artery disease.15 RELT expression was associated with cluster 19 and LDL-R, triglyceride (cluster 19: PTWAS = 3.28 × 10−8; LDL-R: PTWAS = 1.23 × 10−19; Triglyceride: PTWAS = 4.78 × 10−14). The results revealed the GWAS loci that may directly affect gene expression and contribute to both biomarkers and subpopulations.

Target gene identification through eQTL colocalization of TWAS signals provides evidence of causality

We performed TWAS fine-mapping using FOCUS to prioritize putative causal genes that drive the clusters, which were used to compute posterior inclusion probability (PIP) and estimate credible sets for genes at each TWAS region and relevant tissue types. We identified colocalized expression quantitative trait loci (eQTLs) in 34 regions, providing suggestive evidence of causal genes in molecular clusters. These eQTLs may provide insights into the biological pathways in and drug targets for NCIS populations. We identified several genes encoding potential drug targets from human protein atlas database, with variants that influenced the identified subpopulations. For example, the likely causal variant CDK10 (PIP = 0.597) was associated with cluster 8, while the likely causal variant ERCC3 (PIP = 0.669) was associated with cluster 14, and CHEK2 (PIP = 0.774) was associated with cluster 21 (Table 2).

Table 2 Causal posterior probabilities for genes in 90% credible sets for clustersLandscape of NCIS subpopulations

Nine clusters with higher risk of stroke recurrence, mortality, or unfavorable functional outcomes when compared with other clusters were defined as high-risk subpopulations, while eight clusters with lower risk of stroke recurrence, mortality, or unfavorable functional outcomes when compared with other clusters were defined as low-risk subpopulations (Table 1). We mapped the phenotypic and genetic characteristics of these subpopulations to reveal their molecular landscapes and identify potential therapeutic targets for NCIS. The characteristics of several typical subpopulations are as follows:

High-risk subpopulationsCluster 1—inflammation

Cluster 1 was characterized by inflammation (hs-CRP). Patients in cluster 1 had a high prevalence of a history of stroke (39.6%), symptomatic intracranial atherosclerotic stenosis (sICAS) (39.6%), and pulmonary infection (37.7%) and a significantly higher risk of stroke recurrence (adjusted hazard ratio (HR) 2.601, 95% CI 1.467-4.611, P = 0.001), mortality (adjusted HR 5.495, 95% CI 2.695–11.204, P < 0.001) within 1 year as well as poor functional outcome within 3 months (adjusted HR 4.479, 95% CI 2.237–7.214, P < 0.001). ADAMTS9-AS2 (rs4688534, P = 2.52 × 10−58) was associated with cluster 1, which has been reported to be associated with infarct size, immune infiltration, and poor survival.16,17 In addition, PDPN explained most of the signal in its region (lead SNP PGWAS = 3.93 × 10−6; conditioned on PDPN lead SNP PGWAS = 0.118) (Supplementary Fig. 4b). For the genomic locus 1:12779560-1:14890557, PDPN was included in the 90% credible gene set, with a posterior probability of 0.756 (Table 2).

Cluster 2—D-dimer

Cluster 2 was characterized by a high D-dimer level, which suggested hypercoagulable states and increased thrombosis risk. The patients in cluster 2 had a high prevalence of deep venous thrombosis (5.7%), sICAS (41.4%) and symptomatic extracranial atherosclerotic stenosis (sECAS) (7.1%), a higher risk of mortality within 1 year (adjusted HR 4.867, 95% CI 2.394–9.895, P < 0.001), and poor functional outcome within 3 months (adjusted HR 3.743, 95% CI 2.300–6.347, P < 0.001). We identified 23 independent loci associated with cluster 2 at a genome-wide significance threshold of P < 1 × 10−5. GWAS loci (ICA1L-WDR12-CARF-NBEAL1) on chromosome 2 (rs10190966, lead SNPGWASP = 1.35 × 10−7, lead SNPGWAS = 0.13) explained 0.918 of the variance in atrial appendages (Supplementary Fig. 4c). The ICA1L-WDR12-CARF-NBEAL1 locus was also found to be associated with lacunar stroke.18

Cluster 7—renal function

Cluster 7 was characterized by renal function (cystatin C and creatinine) levels and had a high prevalence of a history of stroke (32.7%), sICAS (33.34%), hypertension (86.9%), CHD (24.8%), urinary tract infection (15%), and renal insufficiency (0.7%). The patients in cluster 7 had a significantly higher risk of stroke recurrence (adjusted HR 1.6491, 95% CI 1.111–2.448, P = 0.013) and mortality (adjusted HR 2.433, 95% CI 1.400–4.227, P = 0.002) at 1 year and a high risk of poor functional outcome within 3 months (adjusted HR 2.014, 95% CI 1.052–2.312, P = 0.027) when compared with other clusters. We found that a non-coding transcript variant of the protein-coding gene TMEM43 (rs6798807, p = 4.9 × 10−8) was significant in cluster 7. For the GWAS loci on chromosome 3, conditioning of TMEM43 explained 0.924 of the variance (rs3796308 lead SNPGWASP = 4.9 × 10−8, lead SNPGWAS = 0.133) (Supplementary Fig. 4d). For the genomic locus 3:13070799-3:14816745, FOCUS showed that TMEM43 was included in the 90% credible gene set with a posterior probability of 0.704 in whole blood (Table 2).

Cluster 26—lipid metabolism

Cluster 26 was characterized by lipid metabolism (LDL, triglyceride). Patients in cluster 26 had a significantly higher risk of disability (adjusted HR 1.836, 95% CI 1.117–2.522, P = 0.013). C8orf74 was associated with neurodevelopmental disorders (rs77073793, SIFT score = 0.04, PolyPhen-2 score = 0.833, possibly damaging) and could affect protein function in cluster 26. The genes in cluster 26 were enriched for cell-cell adhesion (GO: 0098609) and G-protein coupled receptor (GPCR) downstream signaling (R-HSA-388396). GPCRs expressed on the surface of platelets play key roles in the regulation of platelet activity and function. Pharmacological blockade of these receptors, including P2Y1 and P2Y12, can help to prevent arterial thrombosis.19 Aspirin was effective in reducing the risk of unfavorable functional outcomes in patients in cluster 26 (Fig. 2i, j).

For the genomic locus 8:10463197-8:11278541, NEIL2 was included in the 90% credible gene set, with a posterior probability of 0.594 in the brain_amygdala. NEIL2 is associated with DNA repair; the capacity for DNA repair is likely to be one of the factors that determines the neuronal vulnerability to ischemic stress and may influence the pathological outcome of stroke.20 The upregulated genes in cluster 26 showed a hallmark of Notch signaling, an important mediator of hepatic lipid metabolism and the remodeling of blood vessels.

Low-risk subpopulationsCluster 10—folic acid

Patients in cluster 10 (identified by folic acid) had a significantly lower risk of disability (HR 0.541, 95% CI 0.311–0.879, P = 0.014). RC3H2 expression was associated with cluster 10 based on its expression in the colon_sigmoid (PTWAS = 8.15 × 10−23), while TREH expression was associated with cluster 10 based on its expression in the colon_sigmoid (PTWAS = 2.26 × 10−12) and adipose_subcutaneous (PTWAS = 7.18 × 10−10). For the genomic locus 1:11778084-1:12778482, C1orf127 was included in the 90% credible gene set, with a posterior probability of 0.754 in the nerve_tibial.

Cluster 16—apolipoprotein A

Patients in cluster 16 (identified by Apo-AI and Apo-AII) had a relatively low risk of stroke recurrence at 3 months (HR, 0.119; 95% CI 0.017–0.850, P = 0.034). The expression of MTMR14 (myotubularin-related protein 14), which is associated with lipid phosphatase, was associated with cluster 16 based on its expression in the heart_left_ventricle (PTWAS = 7.83 × 10−8) and whole blood (PTWAS = 7.93 × 10−8). For GWAS loci on chromosome 3, conditioning in MTMR14 explained 0.349 of the variance (rs7618350 lead SNPGWASP = 1.86 × 10−7, lead SNPGWAS = 2.60 × 10−5). For the genomic locus 12:73818454-12:76511314, NAP1L1 was included in the 90% credible gene set, with a posterior probability of 0.535 in the brain_hippocampus.

Cluster 24—gut microbiota metabolism

Cluster 24 was characterized by gut microbial metabolites (TMAVA and TML) and had a relatively low risk of poor functional outcome (HR 0.675, 95% CI 0.493–0.808, P < 0.001). We found that CRELD2, which plays a role in calcium ion binding activity and protein disulfide isomerase activity (rs12170409, P = 2.25 × 10−8), was significantly associated with cluster 24. For the genomic locus 22:49825112-22:51240820, CRELD2 was included in the 90% credible gene set with a posterior probability of 0.503 in adipose tissue. The genes in cluster 24 were enriched for the regulation of the Wnt signaling pathway (GO: 0016055).

Biomarkers and SNP profiling for predicting clinical outcomes across the NCIS subpopulations

To provide potential aid for clinicians, we examined the applicability of biomarkers and SNP panels in predicting stroke recurrence and poor functional outcomes, respectively. The top 500 SNPs associated with clinical outcomes and 63 biomarkers were selected to develop the prediction models. Using these 63 circulating biomarkers, the estimated receiver operating characteristic area under the curve (ROC AUC) of stroke recurrence at 1 year was 0.59, while that of poor functional outcome at 3 months was 0.72. When using SNP profiling, the model performance increased to 0.79 ± 0.02 in predicting the risk of stroke recurrence, with an AUC of 0.78 ± 0.02 in predicting the risk of poor functional outcome. The SNP-based prediction models showed a large improvement in the predictive ability compared with the biomarker-based prediction models. As depicted in density plots and UMAPs, the predicted probability of stroke recurrence and poor functional outcome were diverse across the 30 clusters. There was a high proportion of patients with high predicted probability of stroke recurrence and poor functional outcome in cluster 1 (identified by hs-CRP), cluster 2 (identified by D-dimer), and cluster 5 (identified by UMA) (Fig. 5).

Fig. 5figure 5

Prediction models for clinical outcomes across the subpopulations. a Area under the ROC curve (AUC-ROC) analysis was used to predict stroke recurrence at 1 year using top SNPs and biomarkers. UMAP labeled by (b) predicted probability and (c) incidence of stroke recurrence at 1 year. d Density plots depicting the predicted probability of stroke recurrence at 1 year across the subpopulations. e Area under the ROC curve (AUC-ROC) estimates for the prediction of poor functional outcome (mRS > 2) at 3 months using top SNPs and biomarkers. UMAP labeled by the (f) predicted probability of poor functional outcome and (g) incidence of poor functional outcome (mRS > 2) at 3 months. h Density plots depicting the predicted probability of poor functional outcome (mRS > 2) at 3 months across the subpopulations

留言 (0)

沒有登入
gif