Metastatic vs. Localized Disease As Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing

Abstract

Background: Extracting inclusion and exclusion criteria in a structured, automated fashion remains a challenge for developing better search functionality and for automating systematic reviews of randomized controlled trials in oncology. The question "Did this trial enroll patients with localized disease, metastatic disease, or both?" could be used to narrow down the number of potentially relevant trials when conducting a search.

Methods: A total of 600 trials from high-impact medical journals were classified according to whether they allowed the inclusion of patients with localized and/or metastatic disease. Of these, 500 trials were used to develop and validate three different models, and 100 trials were held out for testing.

Results: On the test set, a rule-based system using regular expressions achieved an F1 score of 0.72 (95% CI: 0.64 - 0.81) for predicting whether a trial allowed the inclusion of patients with localized disease and 0.77 (95% CI: 0.69 - 0.85) for metastatic disease. A transformer-based machine learning model achieved F1 scores of 0.97 (95% CI: 0.93 - 1.00) and 0.88 (95% CI: 0.82 - 0.94), respectively. The best performance was achieved by a combined approach in which the rule-based system was allowed to overrule the machine learning model, with F1 scores of 0.97 (95% CI: 0.94 - 1.00) and 0.89 (95% CI: 0.83 - 0.95), respectively.

Conclusion: Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible. Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine.
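
The combined approach described in the Results, in which a rule-based system may overrule the machine learning model, can be illustrated with a minimal Python sketch. The regular expressions, the function names `rule_verdict` and `combine`, the 0.5 threshold, and the choice to let a rule only assert inclusion (and otherwise abstain) are all assumptions made for illustration; the abstract does not specify the authors' actual rules or override logic.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not the authors' actual rules.
# The lookbehind keeps "non-metastatic" from counting as a metastatic mention.
METASTATIC_PATTERN = re.compile(
    r"(?<!non-)\b(?:metastatic|metastases|stage\s+IV)\b", re.IGNORECASE
)
LOCALIZED_PATTERN = re.compile(
    r"\b(?:localized|localised|early[- ]stage|non-?metastatic)\b", re.IGNORECASE
)


def rule_verdict(text: str, pattern: re.Pattern) -> Optional[bool]:
    """Return True if the pattern matches, otherwise None (the rule abstains)."""
    return True if pattern.search(text) else None


def combine(rule: Optional[bool], model_probability: float,
            threshold: float = 0.5) -> bool:
    """Let a firing rule overrule the model; otherwise threshold the model score."""
    if rule is not None:
        return rule
    return model_probability >= threshold


text = "Patients with non-metastatic, localized prostate cancer were enrolled."
# Assumed model outputs (probabilities) for the two labels of this trial.
localized = combine(rule_verdict(text, LOCALIZED_PATTERN), model_probability=0.2)
metastatic = combine(rule_verdict(text, METASTATIC_PATTERN), model_probability=0.2)
```

In this sketch the localized rule fires and overrides the low model score, while the metastatic rule abstains and the model's score decides. Each of the two labels is treated as its own binary classification, matching the separate F1 scores reported for localized and metastatic disease.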

Competing Interest Statement

P.W. has a patent application titled "Method for detection of neurological abnormalities" outside of the submitted work. The remaining authors declare no conflict of interest.

Funding Statement

No funding was received for this project.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study only used results from research published as journal articles.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
