Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses

Abstract

Background: With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in Natural Language Processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports. Objective: In this study, we aimed to determine the feasibility of using the NLP model for information extraction as an alternative approach to a time-consuming and operationally resource-intensive handcrafted rule-based tool. Therefore, we sought to develop and evaluate a deep learning-based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system. Methods: The NLP model, a hierarchical multi-label classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus included 85k unique laboratory reports annotated by eight Subject Matter Experts (SME). The model's performance stability and variation were analyzed across fine-grained and coarse-grained classes. Moreover, the model's generalizability was also evaluated internally and externally on various test sets. Results: The NLP model was trained several times with random initialization on the development corpus, and the results of the top ten best-performing models are presented in this paper. Overall, the NLP model performed well on internal, out-of-time (pre-COVID-19), and external (different laboratories) test sets with micro-averaged F1 scores >94% across all classes. Higher Precision and Recall scores with less variability were observed for the internal and pre-COVID-19 test sets. As expected, the model's performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model's performance (lowest F1-score of 57%) was noticeably lower in the detected cases. Conclusions: We demonstrated that deep learning-based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was a collaborative effort supported by the Vector Institute, an independent, not-for-profit corporation dedicated to research in the field of artificial intelligence (AI), and ICES, an independent, non-profit research organization that uses population-based health and social data to produce knowledge on a broad range of health care issues. Resources used in preparing this work were funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). Parts of this material are based on data and information compiled and provided by the Ontario Ministry of Health. This work was also supported by a SickKids-Canadian Institutes of Health Research New Investigator Grant in Child and Youth Health (NI19-1065). The analyses, conclusions, opinions, and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Projects that use data collected by ICES under section 45 of Ontario's Personal Health Information Protection Act (PHIPA), and use no other data, are exempt from research ethics board review. The use of the data in this project is authorized under section 45 and approved by ICES' Privacy and Legal Office. ICES is a prescribed entity under PHIPA. Section 45 of PHIPA authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information concerning the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The data underlying this work is held securely in the coded form at ICES and, therefore, cannot be shared publicly due to data privacy concerns and legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government). However, data access might be granted to those who meet prespecified criteria for confidential access, available at www.ices.on.ca/DAS (email das@ices.on.ca).

https://www.ices.on.ca/Data-and-Privacy/ICES-data

留言 (0)

沒有登入
gif