Derivation of a natural language processing algorithm to identify febrile infants

Background

Diagnostic codes can retrospectively identify samples of febrile infants, but sensitivity is low, resulting in many febrile infants eluding detection. To ensure study samples are representative, an improved approach is needed.

Objective

To derive and internally validate a natural language processing algorithm to identify febrile infants and compare its performance to diagnostic codes.

Methods

This cross-sectional study included infants aged 0–90 days brought to one pediatric emergency department from January 2016 to December 2017. We aimed to identify infants with fever, defined as a documented temperature ≥38°C. We used clinical notes from 2017 to develop two rule-based algorithms to identify infants with fever and tested them on data from 2016. Using manual abstraction as the gold standard, we compared the performance of the two rule-based algorithms (Models 1, 2) to that of four previously published diagnostic code groups (Models 5–8) using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
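The abstract does not specify the rules used in Models 1 and 2. As a rough illustration only, a rule-based fever detector and its evaluation against a manually abstracted gold standard might be sketched as follows. The regular expression, the Fahrenheit handling, and all function names here are hypothetical assumptions, not the study's implementation.

```python
import re

FEVER_THRESHOLD_C = 38.0  # fever definition stated in the Methods

# Hypothetical rule: a 2-3 digit number, optional decimal part, optional
# degree sign, then a C or F unit. The study's actual rules are not given.
TEMP_PATTERN = re.compile(r"(\d{2,3}(?:\.\d+)?)\s*°?\s*([CF])\b", re.IGNORECASE)


def to_celsius(value, unit):
    """Convert a temperature reading to Celsius (assumes unit is C or F)."""
    if unit.upper() == "C":
        return value
    return (value - 32.0) * 5.0 / 9.0


def note_has_fever(note):
    """Return True if any temperature found in the note is >= 38 C."""
    for match in TEMP_PATTERN.finditer(note):
        temp_c = to_celsius(float(match.group(1)), match.group(2))
        # Small tolerance guards against floating-point error after conversion.
        if temp_c >= FEVER_THRESHOLD_C - 1e-9:
            return True
    return False


def sensitivity_specificity(predicted, gold):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up notes and gold-standard labels, for illustration only.
notes = [
    "Tmax 38.5 C at home per mother.",
    "Temp 101.2 F in triage.",
    "Afebrile, T 37.2C, feeding well.",
    "Fussy overnight, felt warm; no temperature taken.",
]
gold = [True, True, False, True]
predicted = [note_has_fever(n) for n in notes]
sens, spec = sensitivity_specificity(predicted, gold)
```

In this toy example the last note describes a tactile fever with no recorded value, so a purely numeric rule misses it. A real algorithm would likely need additional rules for such phrasing, as well as for negation and historical mentions.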

Results

In the test set (n = 1190 infants), 184 infants (15.5%) were febrile. The AUCs (0.92–0.95) and sensitivities (86%–92%) of Models 1 and 2 were significantly greater than those of Models 5–8 (0.67–0.74; 20%–74%), with similar specificities (93%–99%). In contrast to samples from Models 5–8, samples from Models 1 and 2 had characteristics similar to those of the gold standard, including fever prevalence, median age, and rates of bacterial infections, hospitalizations, and severe outcomes.

Conclusions

These findings suggest that rule-based algorithms can identify febrile infants accurately, with greater sensitivity than diagnostic codes and comparable specificity. If externally validated, rule-based algorithms may be important tools for creating representative study samples, thereby improving the generalizability of findings.
