To Improve a Prediction Model, Give it Time*

“Time reveals all,” the old saying goes. In the PICU, time invariably curbs the unease of uncertainty, although clinicians at the bedside often find that time moves too slowly or is lacking altogether when faced with important management decisions. Instead, diagnostic workups and treatments are guided by the available data and continuously adjusted as a patient’s course unfolds. This interplay between time and available data is readily apparent to any practicing clinician but has been incompletely captured in many of the existing assessment tools, decision rules, and population risk models. The Pediatric Risk of Mortality score (PRISM), Pediatric Index of Mortality (PIM), Pediatric Logistic Organ Dysfunction scores (PELOD), and pediatric Sequential Organ Failure Assessment (pSOFA), to name a few widely employed mortality models, all rely on data collected during discrete windows of time, most often defined by the initial portion of a PICU stay (1–4). Accordingly, these scores serve as static representations of evolving systems and are not designed to capture information that accrues beyond their time boundaries.

This construct of what we can call “traditional” severity of illness and mortality prediction models was intentional. Each score was developed in an era when data collection and curation were much more labor-intensive than what is now possible with modern information systems. Furthermore, the traditional models were never found to be widely applicable at the bedside. PRISM and PIM, for example, are best known as tools for adjusting the starting point of a PICU admission to facilitate single and multicenter, population-level benchmarking initiatives. PELOD was devised as an outcome, and its closely related derivative, pSOFA, can be used similarly. These scores have tremendous utility for assessing unit performance by way of standardized mortality ratios, for incorporation into multivariable models in observational datasets to adjust for cohort illness severity, and as outcomes corresponding to average treatment effects in clinical trials aiming to reduce the occurrence of multiple organ dysfunction. Such scores also have substantial limitations. Their use in individual, patient-level characterization or prognostication is ethically dubious, and real-world examples abound of the shortcomings of scores used in this manner.

In this issue of Pediatric Critical Care Medicine, Akhondi-Asi et al (5) aim to demonstrate that mortality prediction is improved by incorporating sequential measurements of pSOFA, compared with the calculation of admission day pSOFA alone. Using data from a single, freestanding children’s hospital, the authors calculated pSOFA scores daily for up to 30 days for 9146 patients previously admitted to the PICU. They then constructed joint models incorporating the daily pSOFA scores alongside other potentially meaningful indicators of illness state and severity, including PIM3 score, PICU day, and interactions between the included variables. Joint models are statistical models that incorporate 2 or more types of relevant data, such as longitudinal data (repeated measurements over time) and time-to-event data (survival data) (6). The approach offers several potential advantages, including the ability to better account for encounter-specific variations related to individual patient characteristics and improved representation of complex interactions between different data types, resulting in better predictive performance. Joint models were constructed in the present work with varied data windows until day 30, including data until days 1, 3, 5, etc. Model performance was assessed with an outcome of PICU day 30 survival using the dynamic area under the curve (AUC) metric. Dynamic AUC differs from the more common area under the receiver operating characteristic curve (AUROC) in that it assesses predictive performance over time and is intended for use when predictive performance may change over time. Akhondi-Asi et al (5) report that the prediction accuracy of joint models was improved by an average of 6.4% (interquartile range [IQR] 6.3%, 6.6%) compared with day 1 pSOFA by incorporating data for the first 3 days of PICU admission and improved by an average of 9.2% (IQR 9.0%, 9.5%) by incorporating the first 5 days of PICU data.
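For readers less familiar with the joint modeling framework, a generic shared-random-effects formulation of the kind described in the methodological literature (6) may help fix ideas. It is offered only as a sketch: the notation is assumed for illustration, and it is not necessarily the exact specification used by Akhondi-Asi et al (5). Here, y_i(t) stands for the observed longitudinal marker (for example, the daily pSOFA score) for patient i, m_i(t) for its underlying trajectory, b_i for patient-specific random effects, w_i for baseline covariates, and h_i(t) for the hazard of death.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim \mathcal{N}(0, \sigma^2) \\
m_i(t) &= x_i(t)^\top \beta + z_i(t)^\top b_i, \qquad b_i \sim \mathcal{N}(0, D) \\
h_i(t) &= h_0(t)\, \exp\{\gamma^\top w_i + \alpha\, m_i(t)\}
\end{aligned}
```

In this form, the association parameter α links the two submodels: it describes how the current level of the marker trajectory shifts the hazard, which is what allows each additional day of measurements to update the predicted risk.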

The authors posit that such models incorporating temporal data may prove useful at the bedside by incorporating time-dependent information that better represents the evolution of a patient’s condition or as an entry criterion to enrich clinical trials. The authors also offer that the improved predictive performance might allow the models to be incorporated into resource allocation schemes or to guide policy decisions. These are reasonable propositions, but the authors are careful to call for further validation and additional multicenter studies before deploying these models into the real world. It is worth heeding this caution. As increasingly sophisticated modeling strategies start to make their way into clinical practice, several recent lessons related to the application of the adult SOFA score should serve as reminders regarding the limitations of predictive models in decision-making that affects patients.

During the COVID pandemic, for example, the adult SOFA score was incorporated into resource rationing programs while, almost in parallel, a body of literature emerged highlighting intrinsic bias in the SOFA score (7). The application of SOFA for COVID-19 triage ultimately proved problematic for several reasons. First, the SOFA score had been contemporaneously validated in adult patients with suspected bacterial infection, and the statistical performance characteristics of the SOFA score in patients with COVID-19 had not been thoroughly assessed when it was reached for as a method of prognostication during the pandemic (8). Second, depending on the workflows in which SOFA was implemented, the boundaries of score calculation were also at potentially different timepoints in the context of an individual’s course of illness compared with the score’s original design. Third, some SOFA calculations relied on pulse oximetry to calculate Spo2/Fio2 ratios when Pao2/Fio2 ratios were unavailable; pulse oximetry has been the focus of increasing scrutiny as a source of race-related measurement bias (9,10). Fourth, SOFA’s place as a robust prediction model is predicated on its strong performance characteristics as measured by the AUROC, which does not necessarily translate to robust bedside predictive performance. AUROC is a well-established metric for assessing model discrimination in a population but does not reflect positive predictive value (PPV) as a tradeoff with sensitivity. The area under the precision-recall curve is the metric meant to convey PPV across the range of score thresholds. Fifth, the components of SOFA do not comprehensively apply to all disease processes or potential patient strata. Age, for example, is an important prognostic variable not considered in the SOFA score alone and may be more predictive of mortality than SOFA among patients with COVID-19 (11). One simulation noted that the use of SOFA score in pandemic resource triage for patients with respiratory failure led to reallocating ventilators to patients with a lower likelihood of survival (12).
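The gap between AUROC and PPV noted above can be made concrete with a small worked example. The sketch below uses an entirely hypothetical cohort (1,000 admissions with 3% mortality and an invented integer severity score; none of these numbers come from the cited studies) and plain Python. It computes AUROC as the probability that a randomly chosen nonsurvivor has a higher score than a randomly chosen survivor, then computes PPV and sensitivity at one assumed cutoff.

```python
# Hypothetical cohort, not real data: 1,000 admissions, 30 deaths (3% mortality).
# Keys are integer severity scores; values are patient counts at that score.
SURVIVORS = {0: 200, 1: 250, 2: 200, 3: 130, 4: 90, 5: 50, 6: 30, 7: 15, 8: 5}
DEATHS = {3: 2, 4: 3, 5: 5, 6: 7, 7: 6, 8: 5, 9: 2}


def auroc(deaths, survivors):
    """Probability that a random nonsurvivor outscores a random survivor (ties count half)."""
    concordant = 0.0
    for d_score, d_n in deaths.items():
        for s_score, s_n in survivors.items():
            if d_score > s_score:
                concordant += d_n * s_n
            elif d_score == s_score:
                concordant += 0.5 * d_n * s_n
    return concordant / (sum(deaths.values()) * sum(survivors.values()))


def ppv_and_sensitivity(deaths, survivors, threshold):
    """PPV and sensitivity when 'score >= threshold' is treated as predicting death."""
    true_pos = sum(n for s, n in deaths.items() if s >= threshold)
    false_pos = sum(n for s, n in survivors.items() if s >= threshold)
    ppv = true_pos / (true_pos + false_pos)
    sensitivity = true_pos / sum(deaths.values())
    return ppv, sensitivity


auc = auroc(DEATHS, SURVIVORS)
ppv, sens = ppv_and_sensitivity(DEATHS, SURVIVORS, threshold=5)  # cutoff is an assumption
print(f"AUROC={auc:.3f}  PPV={ppv:.2f}  sensitivity={sens:.2f}")
# prints: AUROC=0.943  PPV=0.20  sensitivity=0.83
```

In this invented cohort, a score that discriminates very well at the population level (AUROC 0.943) still yields a PPV of only 0.20 at the chosen cutoff: four of every five patients flagged as likely to die would survive. The result follows from the low outcome prevalence and is the kind of behavior that AUROC alone does not reveal.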

The work by Akhondi-Asi et al (5) represents an important step in addressing the shortcomings of traditional mortality models as they relate to bedside workflows. As clinical investigators become increasingly facile with curating and analyzing real-world data, one immediate question might be whether working within the framework of scores such as SOFA or pSOFA is unnecessarily confining. More information might be gained by assessing how the dynamic, temporal interactions amongst dozens or more unprocessed clinical variables coalesce in a model that better predicts outcomes. At the same time, maturing informatics infrastructure across the world can be expected to generate the temptation to believe that more advanced mathematical models, incorporated into the increasingly powerful computational capabilities of bedside technologies, will address the types of shortcomings exemplified by the adult SOFA during the pandemic. However, recent experiences with adult SOFA during the pandemic should serve as a forewarning of the dangers of supplanting human decision-making with mathematical calculations, not simply as evidence of the need for more advanced models. A careful examination of any model’s potential intrinsic biases must also be performed before deployment, statistical performance characteristics should be readily available to bedside clinicians, and mechanisms for explainability should be available to aid interpretation of model output (13).

The fundamental theorem of biomedical informatics puts forward that a person working with an information resource is better than the same person working alone (14). Advanced decision-support tools like predictive models incorporating different data types and temporal information have the potential to augment clinical decision-making and improve patient outcomes. However, models should never replace human decisions. The meticulous approach called for by Akhondi-Asi et al (5) should serve as a roadmap for others working on similar efforts. “In acute diseases it is not quite safe to prognosticate either death or recovery” cautioned Hippocrates more than 2 millennia ago (15). The field of medicine is hurtling toward a future of ever more ubiquitous advanced analytics and artificial intelligence. Modern clinicians and data scientists will do well if they keep Hippocrates’ prescient guidance in the forefront of their minds as they work to develop new approaches to deal with the uncertainty inherent to the practice of medicine.

1. Pollack MM, Ruttimann UE, Getson PR: Pediatric risk of mortality (PRISM) score. Crit Care Med. 1988; 16:1110–1116
2. Shann F, Pearson G, Slater A, et al.: Paediatric index of mortality (PIM): A mortality prediction model for children in intensive care. Intensive Care Med. 1997; 23:201–207
3. Leteurtre S, Martinot A, Duhamel A, et al.: Validation of the paediatric logistic organ dysfunction (PELOD) score: Prospective, observational, multicentre study. Lancet. 2003; 362:192–197
4. Matics TJ, Sanchez-Pinto LN: Adaptation and validation of a pediatric sequential organ failure assessment score and evaluation of the sepsis-3 definitions in critically ill children. JAMA Pediatr. 2017; 171:e172352
5. Akhondi-Asi A, Geva A, Burns JP, et al.: Dynamic prediction of mortality using longitudinally measured pediatric sequential organ failure assessment scores: A joint modeling approach. Pediatr Crit Care Med. 2024; 25:443–451
6. Ibrahim JG, Chu H, Chen LM: Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28:2796–2801
7. Ashana DC, Anesi GL, Liu VX, et al.: Equitably allocating resources during crises: Racial differences in mortality prediction models. Am J Respir Crit Care Med. 2021; 204:178–186
8. Raith EP, Udy AA, Bailey M, et al.; Australian and New Zealand Intensive Care Society (ANZICS) Centre for Outcomes and Resource Evaluation (CORE): Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. JAMA. 2017; 317:290–300
9. Sjoding MW, Dickson RP, Iwashyna TJ, et al.: Racial bias in pulse oximetry measurement. N Engl J Med. 2020; 383:2477–2478
10. Wong AKI, Charpignon M, Kim H, et al.: Analysis of discrepancies between pulse oximetry and arterial oxygen saturation measurements by race and ethnicity and association with organ dysfunction and mortality. JAMA Network Open. 2021; 4:e2131674
11. Raschke RA, Agarwal S, Rangan P, et al.: Discriminant accuracy of the SOFA score for determining the probable mortality of patients with COVID-19 pneumonia requiring mechanical ventilation. JAMA. 2021; 325:1469–1470
12. Walsh BC, Zhu J, Feng Y, et al.: Simulation of New York City’s ventilator allocation guideline during the spring 2020 COVID-19 surge. JAMA Network Open. 2023; 6:e2336736
13. Chen IY, Pierson E, Rose S, et al.: Ethical machine learning in healthcare. Ann Rev Biomed Data Sci. 2021; 4:123–144
14. Friedman CP: A “fundamental theorem” of biomedical informatics. J Am Med Inform Assoc. 2009; 16:169–170
15. The Internet Classics Archive | Aphorisms by Hippocrates. Available at: https://classics.mit.edu/Hippocrates/aphorisms.2.ii.html. Accessed February 4, 2024
