Identifying erroneous height and weight values from adult electronic health records in the All of Us research program

Electronic health record (EHR) data are a valuable source for research, but they often contain errors that can compromise the reliability of clinical studies.[1], [2] Physical measurements such as height and weight are widely used in research, so their accuracy matters. Erroneous values are a major issue in EHR databases.[3], [4], [5], [6] In a study of 2 million pediatric patients, Daymont et al. found incorrect values in 4.5 % of height measurements and 3.8 % of weight measurements.[7] Manual review by a clinician is a reliable way to check for errors, but it can be costly and is itself subject to error.[5] Given the volume of data in EHR repositories, manually detecting every error is infeasible.

Ideally, errors in EHR height and weight measurements would be identified automatically. Several such algorithms exist, differing in the error types and target populations they address. Previous research has flagged height values that fall outside a preset plausible range and has detected weight errors by calculating relative differences between measurements over time.[8], [9], [10] Khan's approach addressed unit correction across multiple sites, handling inconsistent, missing, or incorrect units.[11] However, identifying erroneous inliers, that is, values that are plausible within the population but implausible within an individual's trajectory, remains challenging. Such inliers often go undetected by existing methods and can compromise data analysis.[12] Even after applying the World Health Organization (WHO) criteria for implausible values, Phan et al. found a persistent error rate of 3 %.[13] Daymont et al. combined extreme-outlier detection with adaptive inlier detection to identify errors in pediatric growth measurements.[7] This algorithm was recently extended to adult data.[14] However, because no reference chart exists for adults, the extension operated on raw measurement values, even though Daymont et al. demonstrated the benefit of using reference charts.[7]
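
The three kinds of checks described above (a fixed plausibility cutoff, a relative-change rule over time, and a trajectory-based inlier check) can be illustrated with a minimal Python sketch. All function names and thresholds (100 to 250 cm for height, a 30 % change in weight, a 5 cm deviation) are hypothetical and chosen only for illustration. The inlier check uses a leave-one-out median of the person's other values, a deliberately simple stand-in that is not the algorithm of Daymont et al.; measurements are assumed to be sorted by date.

```python
from statistics import median


def flag_cutoff(values, low, high):
    """Population-level check: flag values outside a fixed plausible range [low, high]."""
    return [not (low <= v <= high) for v in values]


def flag_relative_change(values, max_rel_change):
    """Flag a measurement whose relative difference from the previous one exceeds max_rel_change.

    Assumes values are sorted by measurement date and strictly positive.
    """
    flags = [False] * len(values)
    for i in range(1, len(values)):
        prev = values[i - 1]
        if abs(values[i] - prev) / prev > max_rel_change:
            flags[i] = True
    return flags


def flag_trajectory_inliers(values, max_dev):
    """Individual-level check: flag a value that departs from the median of the
    person's other values by more than max_dev (hypothetical leave-one-out rule)."""
    flags = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        flags.append(bool(others) and abs(v - median(others)) > max_dev)
    return flags


# Hypothetical adult height trajectory (cm): 160 is plausible for the population
# but inconsistent with this person's other measurements.
heights = [170.0, 171.0, 160.0, 170.0]
print(flag_cutoff(heights, 100.0, 250.0))      # population cutoff misses the error
print(flag_trajectory_inliers(heights, 5.0))   # trajectory check catches it
```

The height 160 cm passes the population cutoff but is flagged by the trajectory check, which is exactly the erroneous-inlier case described above. Note that the simple consecutive-difference rule also flags the correct value that follows an error (for weights 80, 81, 18, 80 kg it flags both 18 and the final 80), one reason trajectory-aware methods are preferred.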

The All of Us Research Program (All of Us) collects raw EHR physical measurements, including height and weight.[15] Although a rule-based cleaning algorithm was applied to the raw values to remove errors, it did not use individual profiles to identify potential inlier errors.[11] This paper extends Daymont et al.'s algorithm to detect erroneous inliers within individual trajectories in the adult All of Us population. Our contributions are twofold: (1) Whereas Daymont et al. used existing pediatric growth charts, we leveraged unbalanced longitudinal EHR data to establish a reference chart for adults. This broadens the method's applicability and may enable error detection for other physical measurements in EHRs, such as body mass index (BMI) and waist-to-hip ratio. (2) To better fit the statistical properties of our data, we extended the standard deviation score to accommodate an additional kurtosis parameter observed in the data. This enhancement allows better handling of extreme values and outliers, thereby improving the reliability of our model's results.[16]
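
Contribution (2) can be made concrete with a minimal sketch, under the assumption (not stated in the text above) that the kurtosis-extended score resembles the Box-Cox power exponential (BCPE) z-score used in GAMLSS-based growth references. Here mu, sigma, nu and tau stand for the median, variability, skewness (Box-Cox power) and kurtosis parameters of a hypothetical reference chart at the person's age, and all function names are illustrative. The value is Box-Cox transformed as in the LMS method, its tail probability under a unit-variance power exponential distribution with shape tau is computed from the regularized incomplete gamma function, and that probability is mapped back to a standard normal score. The truncation correction of the full BCPE distribution is ignored, which is assumed negligible when sigma * |nu| is small.

```python
import math
from statistics import NormalDist


def _lower_gamma_series(a, x):
    # Regularized lower incomplete gamma P(a, x) via its power series (used when x < a + 1).
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15 and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a, x):
    # Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (used when x >= a + 1).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def power_exponential_tail(t, tau):
    # P(T > |t|) for a unit-variance power exponential variable T with shape tau
    # (tau = 2 is the standard normal; tau < 2 has heavier tails).
    c = math.sqrt(2.0 ** (-2.0 / tau) * math.exp(math.lgamma(1.0 / tau) - math.lgamma(3.0 / tau)))
    a = 1.0 / tau
    x = 0.5 * (abs(t) / c) ** tau  # 0.5 * |t / c|^tau follows a Gamma(1 / tau, 1) law
    if x == 0.0:
        return 0.5
    if x < a + 1.0:
        return 0.5 * (1.0 - _lower_gamma_series(a, x))
    return 0.5 * _upper_gamma_cf(a, x)


def bcpe_sds(y, mu, sigma, nu, tau):
    # Box-Cox transform, identical to the LMS method (L = nu, M = mu, S = sigma).
    if abs(nu) < 1e-12:
        z = math.log(y / mu) / sigma
    else:
        z = ((y / mu) ** nu - 1.0) / (nu * sigma)
    # Map the tail probability to a standard normal score; clamp to avoid inv_cdf(0).
    tail = max(power_exponential_tail(z, tau), 1e-300)
    sds = -NormalDist().inv_cdf(tail)
    return math.copysign(sds, z)
```

For example, `bcpe_sds(83.72, 70.0, 0.1, 1.0, 2.0)` is about 1.96, the familiar LMS z-score, because tau = 2 reduces the power exponential to the standard normal. Lowering tau to 1.0 for the same measurement gives a smaller score, since a heavy-tailed reference treats a large deviation as less unusual.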
