Objective: To address the challenges of modeling time-to-event outcomes in small-sample settings by leveraging transfer learning techniques while accounting for potential covariate and concept shifts between source and target datasets.
Methods: We propose a novel transfer learning approach, termed CoxTL, for modeling time-to-event data based on the widely used Cox proportional hazards model. CoxTL combines density ratio weighting and importance weighting to address multi-level data heterogeneity, including covariate and coefficient shifts between source and target datasets. It also accounts for potential model misspecification, ensuring robustness across a wide range of settings. We assess the performance of CoxTL through extensive simulation studies that consider data under various types of distributional shift. We further apply CoxTL to predict End-Stage Renal Disease (ESRD) in the Hispanic population using electronic health record-derived features from the All of Us Research Program, leveraging data from non-Hispanic White and non-Hispanic Black populations as source cohorts. Model performance is evaluated using the C-index and the Integrated Brier Score (IBS).
Results: In simulation studies, CoxTL achieves higher predictive accuracy than alternative methods, particularly in scenarios involving multi-level heterogeneity between target and source datasets. In other scenarios, CoxTL performs comparably to alternative methods designed to address only a single type of distributional shift. For predicting the 2-year risk of ESRD in the Hispanic population, CoxTL increases the C-index by up to 6.76% compared with the model trained exclusively on target data, and by up to 17.94% compared with the state-of-the-art transfer learning method based on the Cox model.
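This abstract does not specify how CoxTL constructs or combines its weights. The Python sketch below therefore illustrates only a generic covariate-shift version of two ingredients named in the Methods: density ratio weighting inside a Cox partial likelihood, and C-index evaluation. It is not the CoxTL estimator. It makes the following assumptions:

- The density ratio is estimated with a probabilistic domain classifier that outputs P(target | x).
- Ties are handled with the Breslow approach.
- The C-index is Harrell's version, which ignores pairs with tied times.

The function names (`density_ratio_from_probs`, `weighted_cox_loglik`, `harrell_c_index`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def density_ratio_from_probs(
    p_target: Sequence[float], n_source: int, n_target: int
) -> List[float]:
    """Turn domain-classifier outputs into density ratio weights.

    Assumes a classifier trained on pooled data (label 1 = target, 0 = source)
    returns p = P(target | x). By Bayes' rule,
    p_T(x) / p_S(x) = [p / (1 - p)] * (n_source / n_target).
    """
    return [(p / (1.0 - p)) * (n_source / n_target) for p in p_target]


def weighted_cox_loglik(
    beta: Sequence[float],
    X: Sequence[Sequence[float]],
    time: Sequence[float],
    event: Sequence[int],
    weight: Sequence[float],
) -> float:
    """Weighted Cox partial log-likelihood with Breslow handling of ties:
    sum_i w_i * d_i * [eta_i - log(sum_{j: t_j >= t_i} w_j * exp(eta_j))].
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    n = len(time)
    ll = 0.0
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk_sum = sum(
            weight[j] * math.exp(eta[j]) for j in range(n) if time[j] >= time[i]
        )
        ll += weight[i] * (eta[i] - math.log(risk_sum))
    return ll


def harrell_c_index(
    time: Sequence[float], event: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C-index: among comparable pairs (subject i has an observed
    event strictly before subject j's time), the fraction in which i has the
    higher predicted risk; ties in predicted risk count as 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Hypothetical toy data: two target subjects followed by two source subjects.
    X = [[0.2], [1.0], [0.5], [1.5]]
    time = [5.0, 2.0, 4.0, 1.0]
    event = [1, 1, 0, 1]
    # Target rows get weight 1; source rows get an estimated density ratio.
    source_w = density_ratio_from_probs([0.3, 0.6], n_source=2, n_target=2)
    weights = [1.0, 1.0] + source_w
    beta = [0.8]
    print("weighted partial log-lik:", weighted_cox_loglik(beta, X, time, event, weights))
    risk = [beta[0] * row[0] for row in X]  # linear predictor as risk score
    print("C-index:", harrell_c_index(time, event, risk))
```

Maximizing `weighted_cox_loglik` over `beta` with any generic optimizer would give a target-oriented estimate under pure covariate shift. Handling coefficient (concept) shift and model misspecification, as the abstract states CoxTL does, requires additional machinery not shown in this sketch.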
Conclusion: The proposed method effectively utilizes source data to enhance time-to-event predictions in target populations with limited samples. Its ability to handle various sources and levels of data heterogeneity ensures robustness, making it particularly well-suited for real-world applications involving target populations with small sample sizes, where traditional Cox models often struggle.
Keywords: Cox model, distributional shift, data integration, transfer learning.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by NIH 1R01GM148494 and 1R01CA296289.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of the All of Us Research Program waived ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes