Improvement of intervention information detection for automated clinical literature screening during systematic review

Elsevier

Available online 26 August 2022, 104185

Journal of Biomedical Informatics

Highlights

We design an algorithm to automatically label interventions in PubMed abstracts.

Sentence-level confidence in negative labels is used to weight training instances in the loss function.

We significantly improved the performance of the intervention detection model.
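The first highlight describes automatically labeling interventions in PubMed abstracts using a clinical trial database. The sketch below is only a minimal illustration of that idea under simple assumptions: intervention names taken from a linked ClinicalTrials.gov record are matched case-insensitively against the tokenized abstract and converted to BIO tags. The regex tokenizer, the `B-INT`/`I-INT` tag names, and the function `label_interventions` are hypothetical. The authors' actual labeling algorithm is not described in this abstract and is not reproduced here.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def label_interventions(abstract: str, interventions: List[str]) -> List[Tuple[str, str]]:
    """Tag tokens of `abstract` that match any intervention name (distant supervision sketch)."""
    tokens = tokenize(abstract)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Match longer names first so "metformin extended release" wins over "metformin".
    for name in sorted(interventions, key=lambda n: len(tokenize(n)), reverse=True):
        pattern = [t.lower() for t in tokenize(name)]
        n = len(pattern)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Only label spans that no earlier (longer) match has claimed.
            if lowered[i:i + n] == pattern and all(tag == "O" for tag in tags[i:i + n]):
                tags[i] = "B-INT"
                for j in range(i + 1, i + n):
                    tags[j] = "I-INT"
    return list(zip(tokens, tags))


if __name__ == "__main__":
    print(label_interventions("Patients received metformin.", ["metformin"]))
    # [('Patients', 'O'), ('received', 'O'), ('metformin', 'B-INT'), ('.', 'O')]
```

Exact string matching is the simplest possible choice; it misses synonyms and abbreviations, which is one reason a real labeler would need more than the trial record alone.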

Abstract

Systematic literature review (SLR) is a crucial method that helps clinicians and policymakers make decisions amid a flood of new clinical studies. Because manual literature screening in SLR is highly laborious, its automation by natural language processing (NLP) has been welcomed. Although intervention is key information for literature screening, NLP models for detecting it in previous works have not shown adequate performance. In this work, we first design an algorithm for automatically constructing high-quality intervention labels using information retrieved from a clinical trial database. We then design another algorithm that improves the model's recall and F1 score by imposing adaptive weights on training instances in the loss function. The intervention detection model trained on the weighted datasets is tested on the Evidence-Based Medicine NLP (EBM-NLP) corpus and shows improvements of 9.7% in recall and 4.0% in F1 score over the previous state-of-the-art model on that corpus. The proposed algorithms can boost the automation of literature screening during SLR in the clinical domain.
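The adaptive weighting step can be illustrated with a weighted token-level loss. This is a minimal sketch under stated assumptions, not the authors' formulation: tokens carry binary labels (1 = intervention), every sentence containing at least one positive token gets weight 1.0, every all-negative sentence gets a precomputed confidence in [0, 1] that its negative labels are correct, and the loss is a weighted mean binary cross-entropy. How that confidence is estimated is not described in the abstract, and the functions `sentence_weights` and `weighted_bce` are hypothetical.

```python
import math
from typing import List, Sequence


def sentence_weights(labels: Sequence[Sequence[int]],
                     neg_confidence: Sequence[float]) -> List[float]:
    """Weight 1.0 for sentences with a positive (intervention) token; otherwise use the
    hypothetical confidence that the sentence's all-negative labels are correct."""
    return [1.0 if any(y == 1 for y in sent) else c
            for sent, c in zip(labels, neg_confidence)]


def weighted_bce(probs: Sequence[Sequence[float]],
                 labels: Sequence[Sequence[int]],
                 weights: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Weighted mean binary cross-entropy over tokens; each token inherits its sentence weight."""
    total, norm = 0.0, 0.0
    for p_sent, y_sent, w in zip(probs, labels, weights):
        for p, y in zip(p_sent, y_sent):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            norm += w
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    labels = [[0, 1, 0], [0, 0]]
    probs = [[0.1, 0.8, 0.2], [0.7, 0.1]]
    w = sentence_weights(labels, [0.9, 0.3])  # [1.0, 0.3]
    print(weighted_bce(probs, labels, w))
```

In this sketch, down-weighting low-confidence negative sentences means the model is penalized less for predicting an intervention where the automatic labels may have missed one, which is one plausible route to the recall gains reported above.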

Abbreviations

SLR

Systematic Literature Review

RCT

Randomized Controlled Trial

AUROC

Area Under the Receiver Operating Characteristic Curve

EBM-NLP

Evidence-Based Medicine Natural Language Processing

PICO

Population, Intervention, Comparator, Outcome

LSTM-CRF

Long Short-Term Memory-Conditional Random Field

BERT

Bidirectional Encoder Representations from Transformers

NER

Named Entity Recognition

MeSH

Medical Subject Headings

Keywords

Clinical literature screening

Systematic review

Natural language processing

Intervention information detection

Data and code availability

The datasets used in this study are all publicly available from the following sources: PubMed (https://pubmed.ncbi.nlm.nih.gov/), ClinicalTrials.gov (https://clinicaltrials.gov/), MeSH (https://www.ncbi.nlm.nih.gov/mesh/), and EBM-NLP (https://ebm-nlp.herokuapp.com/). The Python scripts for reproducing the results in this work, as well as the AL data, are available at Mendeley Data.

© 2022 The Author(s). Published by Elsevier Inc.
