Purpose
Pre-trained encoder transformer models have extracted information from unstructured clinic note text but require manual annotation for supervised fine-tuning. Large generative pre-trained transformers (GPTs) may streamline this process. In this study, we explore GPTs in zero- and few-shot learning scenarios to analyze clinical health records.

Materials and Methods
We prompt-engineered LLAMA2 13B to optimize its performance in extracting seizure freedom from epilepsy clinic notes and compared it against zero-shot and fine-tuned Bio+ClinicalBERT models. Our evaluation encompasses different prompting paradigms, including one-word answers, elaboration-based responses, prompts with date formatting instructions, and prompts with dates in context.

Results
We found promising median accuracy in seizure freedom classification for zero-shot GPT models: 62% for one-word answers, 50% for elaboration-based responses, 62% for prompts with formatted dates, and 74% for prompts with dates in context. These outperform the zero-shot Bio+ClinicalBERT model (25%) but fall short of the fully fine-tuned BERT model (84%). Furthermore, in sparse contexts, such as notes from general neurologists, the best-performing GPT model (76%) surpasses the fine-tuned BERT model (67%) in extracting seizure freedom.

Conclusion
This study demonstrates the potential of GPTs to extract clinically relevant information from unstructured electronic medical record (EMR) text, offering insights into population trends in seizure management, drug effects, risk factors, and healthcare disparities. Moreover, GPTs outperform task-specific models in contexts that may include less precise descriptions of epilepsy and seizures, highlighting their versatility. Additionally, simple prompt engineering techniques enhance model accuracy, presenting a framework for leveraging EMR data with zero clinical annotation.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded by the National Institute of Neurological Disorders and Stroke DP1NS122038; by the National Institutes of Health R01NS125137; by the Mirowski Family Foundation; by contributions from Neil and Barbara Smit; and by contributions from Jonathan and Bonnie Rothberg. W.K.S.O. was supported by the National Science Foundation Research Grant Fellowship DGE-1845298. C.A.E. was supported by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health Award Number K23NS121520. D.R. was partially funded by the Office of Naval Research Contract N00014-19-1-2620.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of the University of Pennsylvania waived ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are unavailable to protect patient privacy.