Validation of a Zero-shot Learning Natural Language Processing Tool to Facilitate Data Abstraction for Urologic Research

The advent of electronic health records (EHRs) has transformed urologic research by offering access to a vast amount of patient data. Free-text fields are an essential part of EHRs. Unlike their structured counterparts, free-text fields offer a nuanced and comprehensive perspective on individual patient cases, capturing a depth of clinical information that is often not found in structured data. That said, the use of unstructured data in urologic research is fraught with challenges [1], [2], [3]. Chief among them is the labor-intensive task of data extraction. Additionally, the lack of standardization in free-text entries, due to their subjective and individualistic nature, complicates data aggregation and comparison across different records or health care providers. Inconsistencies in the quality and completeness of data compound this challenge for researchers. These obstacles have created a need for sophisticated natural language processing (NLP) techniques for data abstraction, although such techniques carry the potential for misinterpretation or omission of vital information [4], [5], [6], [7].

Recent advancements in artificial intelligence technologies, particularly the introduction of large language models (LLMs) with zero-shot learning capabilities, offer a promising solution to the abstraction of unstructured health care data [8], [9], [10], [11], [12]. Zero-shot learning is a concept in artificial intelligence and machine learning in which a model is able to accurately classify data into categories that it did not encounter during its training phase. Thus, zero-shot learning makes it possible to forgo the highly technical and time-consuming work of first training a model for a given data abstraction task.
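To make the idea concrete, the following minimal Python sketch shows what a zero-shot abstraction request might look like: the prompt contains only instructions and the report text, with no labelled examples and no task-specific training. It is an illustration under stated assumptions, not the tool described in this paper. The function names, the variable names (`gleason_score`, `margin_status`), the JSON output convention, and the stand-in `fake_llm` callable are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def build_zero_shot_prompt(report_text: str, variables: List[str]) -> str:
    """Compose a prompt made of instructions only (no labelled examples)."""
    fields = "\n".join(f"- {name}" for name in variables)
    return (
        "Extract the following variables from the pathology report below.\n"
        "Return a JSON object with exactly these keys; "
        "use null when a value is not stated.\n"
        f"{fields}\n\nReport:\n{report_text}"
    )


def abstract_report(
    report_text: str,
    variables: List[str],
    llm: Callable[[str], str],
) -> Dict[str, object]:
    """Send the zero-shot prompt to `llm` and keep only the requested keys."""
    raw = llm(build_zero_shot_prompt(report_text, variables))
    parsed = json.loads(raw)
    # Missing keys become None so every report yields the same columns.
    return {name: parsed.get(name) for name in variables}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a fixed answer."""
    return '{"gleason_score": "3+4", "margin_status": null}'


result = abstract_report(
    "Placeholder report text.",
    ["gleason_score", "margin_status"],
    fake_llm,
)
```

In practice, `fake_llm` would be replaced by a call to an actual language model, and the returned JSON would need validation before use, since a model may return malformed or incomplete output.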

Herein, we describe the development of an LLM-based tool that utilizes zero-shot learning to abstract unstructured data contained within Portable Document Format (PDF) files. We hypothesize that this tool will enhance user time efficiency for data abstraction without a significant decrease in accuracy. To test this hypothesis, we compared our tool's performance with that of three humans tasked with abstracting a set of discrete variables from radical prostatectomy pathology reports.
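One simple way such a comparison could be scored is the fraction of abstracted values that match a reference abstraction, computed identically for the tool and for each human abstractor. The sketch below is an assumption made for illustration: this section does not specify the metric, and the function name, record layout, and toy values are hypothetical rather than study data.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def accuracy(
    abstracted: List[Record],
    reference: List[Record],
    variables: List[str],
) -> float:
    """Fraction of (report, variable) pairs where the value matches the reference."""
    if len(abstracted) != len(reference):
        raise ValueError("abstracted and reference must cover the same reports")
    total = len(abstracted) * len(variables)
    if total == 0:
        raise ValueError("nothing to compare")
    correct = sum(
        1
        for got, expected in zip(abstracted, reference)
        for name in variables
        if got.get(name) == expected.get(name)
    )
    return correct / total


# Toy values for illustration only; not data from the study.
variables = ["gleason_score", "margin_status"]
reference = [
    {"gleason_score": "3+4", "margin_status": "negative"},
    {"gleason_score": "4+3", "margin_status": "positive"},
]
tool_output = [
    {"gleason_score": "3+4", "margin_status": "negative"},
    {"gleason_score": "4+3", "margin_status": "negative"},
]
tool_accuracy = accuracy(tool_output, reference, variables)
```

Exact string matching is the strictest choice here; a real comparison would likely normalize values (for example, case and whitespace) before scoring.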
