Background: To improve confounding control in healthcare database studies, data-driven algorithms may empirically identify and adjust for large numbers of pre-exposure variables that indirectly capture information on unmeasured confounding factors (proxy confounders). Current approaches to high-dimensional proxy adjustment do not leverage free-text notes from electronic health records (EHRs). Unsupervised natural language processing (NLP) can scale to generate large numbers of structured features from unstructured notes. Objective: To assess the impact of supplementing claims data analyses with large numbers of NLP-generated features for high-dimensional proxy adjustment. Methods: We linked Medicare claims with EHR data to generate three cohorts comparing different classes of medications on the 6-month risk of cardiovascular outcomes. We used various NLP methods to generate structured features from free-text EHR notes and used LASSO regression to fit several propensity score (PS) models that included different covariate sets as candidate predictors. Covariate sets included features generated from claims data only and from claims data plus NLP-generated EHR features. Results: Including both claims codes and NLP-generated EHR features as candidate predictors improved overall covariate balance, with standardized differences <0.1 for all variables. The impact on estimated treatment effects was more nuanced: adjustment for NLP-generated features moved effect estimates further in the expected direction in two of the empirical studies but had no impact in the third. Conclusion: Supplementing administrative claims with large numbers of NLP-generated features for ultra-high-dimensional proxy confounder adjustment improved overall covariate balance and may provide a modest benefit in terms of capturing confounder information.
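The balance criterion reported in the Results (standardized differences <0.1) can be illustrated with a minimal sketch. It assumes the common unweighted definition, the difference in group means divided by the pooled standard deviation. The abstract does not state the exact formula or any weighting or matching scheme the authors used, and the function names `standardized_difference` and `is_balanced` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated, comparator):
    """Absolute standardized mean difference for a single covariate.

    Assumed definition: |mean_t - mean_c| / sqrt((var_t + var_c) / 2),
    using sample variances. Each group needs at least two values.
    """
    diff = abs(mean(treated) - mean(comparator))
    pooled_sd = sqrt((variance(treated) + variance(comparator)) / 2)
    if pooled_sd == 0:
        # No variation in either group: balanced only if the means agree.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def is_balanced(treated, comparator, threshold=0.1):
    """Apply the <0.1 rule of thumb mentioned in the abstract."""
    return standardized_difference(treated, comparator) < threshold


if __name__ == "__main__":
    # Toy values for one covariate in each exposure group (illustrative only).
    treated_values = [2, 4]
    comparator_values = [1, 3]
    print(standardized_difference(treated_values, comparator_values))
    print(is_balanced(treated_values, comparator_values))
```

In a high-dimensional setting this check would be repeated for every candidate covariate, before and after PS adjustment, to see whether all values fall below the threshold.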
Competing Interest Statement
Dr. Schneeweiss is participating in investigator-initiated grants to the Brigham and Women's Hospital from Boehringer Ingelheim and UCB unrelated to the topic of this study. He is a consultant to Aetion Inc., a software manufacturer of which he owns equity. His interests were declared, reviewed, and approved by the Brigham and Women's Hospital in accordance with their institutional compliance policies. All other authors declare no competing interests for this work.
Funding Statement
This project was funded by NIH RO1LM013204; additional funding was provided by PCORI ME-2022C1-25646.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Mass General Brigham (MGB) Institutional Review Board gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data used in the present study are not publicly available due to data use agreements.