Dynamic early warning scores for predicting clinical deterioration in patients with respiratory disease

Early warning scores are used to detect deteriorating patients in acute hospital settings, and are usually calculated by assigning scores to a number of clinical observations such as heart rate and respiratory rate, and adding these to produce a composite score [1,2,3]. The National Early Warning Score-2 (NEWS-2) is used throughout the UK and internationally [4]. NEWS-2 is simple enough to be used with paper observation charts and calculated by hand, but this may result in useful diagnostic information being lost, as the detailed granularity and temporal trends in clinical observations are not accounted for. For instance, NEWS-2 has only two categories for inspired oxygen, whereas it has been shown that incorporating the percentage of inspired oxygen into scoring systems improves their accuracy [5]. The increasing use of electronic recording of clinical observations raises the possibility of using more sophisticated scoring systems that make full use of the information content of current and previous observations. There is increasing interest in using advanced statistical methods to train and validate novel scoring systems, making use of large datasets of clinical observations [6].

The majority of early warning scoring systems have been developed and validated to predict intensive care unit (ICU) admission, cardiac arrest or death [1,2,3]. However, the purpose of an early warning score is to detect patients who require urgent intervention in order to prevent these adverse outcomes, rather than simply to predict them. Few previous studies have developed and validated an early warning score specifically to detect treatable conditions such as sepsis and respiratory failure. We therefore defined a novel outcome of clinically significant deterioration (CSD) requiring urgent treatment, and used this to develop and validate a novel early warning score.

We developed and internally validated dynamic early warning scores (DEWS) using a retrospective database of clinical observations in patients admitted under the care of adult respiratory medicine services. We hypothesised that DEWS would provide superior predictive accuracy compared to NEWS-2 in patients with respiratory disease, with respect to (1) death or ICU admission, occurring within 24 h (D/ICU), and (2) clinically significant deterioration requiring urgent treatment, occurring within 4 h (CSD).

Data source

The study population comprised adult patients (age ≥ 18 years) admitted between 1st April 2015 and 31st December 2020 who were under the care of respiratory medicine at the time of death or discharge from hospital. The majority of patients had an acute or chronic respiratory diagnosis although some general medical patients were also included if they were cared for on a respiratory ward.

Clinical observations for adult in-patients at Nottingham University Hospitals NHS Trust (NUH) have been recorded electronically using a wireless workflow tracking system since April 2015 as part of routine clinical care. Clinical observations data were extracted from the system for the study population. The data comprised date and time-stamped measurements of heart rate, respiratory rate, systolic blood pressure, temperature, oxygen saturations, inspired oxygen flow rate or concentration (FiO2), and level of consciousness recorded on a five-point ACVPU scale (Alert, Confused, responds to Voice, responds to Pain, Unresponsive). The NEWS-2 score was calculated according to current guidelines [4]. Patients in whom at least one observation set was labelled as “O2 sats scale 2 (chronic respiratory disease)” were considered to have chronic respiratory disease, with target oxygen saturations of 88–92%. Oxygen saturation Scale 2 was used to calculate NEWS-2 in these patients; Scale 1 was used for all other patients. The timing of death or ICU admission was also extracted from the system.
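The patient-level rule for choosing the oxygen saturation scale can be illustrated with a minimal sketch. The field names (`patient_id`, `sats_scale`) are hypothetical and do not reflect the actual database schema. The scoring thresholds are transcribed from the published NEWS-2 chart and should be checked against the guideline [4] before any reuse.

```python
SCALE2_LABEL = "O2 sats scale 2 (chronic respiratory disease)"


def patients_on_scale2(observations):
    """Return IDs of patients with at least one observation set labelled Scale 2.

    `observations` is an iterable of dicts; the keys are hypothetical.
    """
    return {
        obs["patient_id"]
        for obs in observations
        if obs.get("sats_scale") == SCALE2_LABEL
    }


def spo2_score(spo2, on_oxygen, use_scale2):
    """NEWS-2 sub-score for oxygen saturation (percent) on Scale 1 or Scale 2."""
    if not use_scale2:
        # Scale 1: only low saturations are penalised.
        if spo2 <= 91:
            return 3
        if spo2 <= 93:
            return 2
        if spo2 <= 95:
            return 1
        return 0
    # Scale 2: target range 88-92%; high saturations on oxygen also score.
    if spo2 <= 83:
        return 3
    if spo2 <= 85:
        return 2
    if spo2 <= 87:
        return 1
    if spo2 <= 92 or not on_oxygen:
        return 0
    if spo2 <= 94:
        return 1
    if spo2 <= 96:
        return 2
    return 3
```

Under this sketch a saturation of 90% on room air scores 3 on Scale 1 but 0 on Scale 2, which is why the scale assignment matters for patients with chronic respiratory disease.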

Clinically significant deterioration (CSD) definition

A subset of 1100 admission episodes were annotated manually by a Consultant Physician and senior Specialty Registrar (SG and SF) with reference to the medical notes. Clinically significant deterioration was defined as a specific event requiring a change in treatment. In order to ensure consistency within and between the case annotators, the types of event and treatments given were recorded using a standardised list, as shown in Additional file 1: Table S1. The list of event types and treatments was drafted based on clinical experience and previously published literature [7,8,9,10,11], and was finalised following preliminary annotation of 50 cases by the lead investigator (SG). Ten cases were reviewed jointly by SG and SF in order to agree a consistent approach to annotation. To maximise the number of events available for analysis, the cases chosen for annotation were those with a maximum NEWS-2 score of ≥ 10, or in which death or ICU admission occurred. In addition, since events with a low heart rate were uncommon in the dataset, all admission episodes with a minimum heart rate of ≤ 40 were annotated, to ensure sufficient training examples for this rare but important condition.
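The eligibility criteria for annotation described above can be expressed as a simple filter. This is an illustrative sketch only: the episode structure and key names (`episode_id`, `news2_scores`, `heart_rates`, `died_or_icu`) are assumptions, and the sketch does not reproduce how the final 1100 episodes were drawn from the eligible pool.

```python
def eligible_for_annotation(episode):
    """Apply the three selection criteria to one admission episode.

    `episode` is a dict with hypothetical keys:
      news2_scores : list of NEWS-2 scores during the admission
      heart_rates  : list of heart rate measurements (beats per minute)
      died_or_icu  : True if death or ICU admission occurred
    """
    high_news = max(episode["news2_scores"]) >= 10
    low_heart_rate = min(episode["heart_rates"]) <= 40
    return high_news or episode["died_or_icu"] or low_heart_rate


def select_for_annotation(episodes):
    """Return the IDs of all episodes meeting at least one criterion."""
    return [ep["episode_id"] for ep in episodes if eligible_for_annotation(ep)]
```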

The dataset was anonymised prior to analysis by removing identifying information such as names, dates of birth and hospital identification numbers. The project was approved by the Nottingham 1 Research Ethics Committee (20/EM/0064) and the Confidentiality Advisory Group (20/CAG/0034).

Model development and validation

For the full dataset with the outcome of D/ICU, data from April 2015 to December 2019 were used for model training. Data from January to December 2020 were then extracted and used for validation. For the annotated dataset with the outcome of CSD, 829 randomly selected admission episodes were used for training and 271 for validation. Since the missing data rate was low (< 1% for each variable), the analysis was limited to complete cases and data imputation was not carried out. The first two observation sets from each admission episode were excluded from the analysis since (1) our primary aim was to detect de novo deterioration occurring during the admission rather than to stratify illness severity at the point of admission, and (2) a number of time series features included in the DEWS model required a minimum of three observation sets to calculate.
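The preparation steps for the full dataset can be sketched as follows. The variable names are hypothetical. The sketch assumes that the temporal split is made on the admission date and that the first two observation sets are dropped before the complete-case filter is applied; neither assumption is stated in the text.

```python
from datetime import datetime

# Training period: April 2015 to December 2019; validation: January to December 2020.
TRAIN_END = datetime(2020, 1, 1)

# Hypothetical names for the variables required in a complete observation set.
REQUIRED = (
    "heart_rate", "resp_rate", "systolic_bp", "temperature",
    "spo2", "oxygen_level", "acvpu",
)


def assign_split(admission_time):
    """Assign an admission episode to the training or validation set by date."""
    return "train" if admission_time < TRAIN_END else "validation"


def prepare_episode(observation_sets):
    """Order one episode by time, drop its first two sets, keep complete cases."""
    ordered = sorted(observation_sets, key=lambda obs: obs["time"])
    later = ordered[2:]  # the first two observation sets are excluded
    return [
        obs for obs in later
        if all(obs.get(name) is not None for name in REQUIRED)
    ]
```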

DEWS was developed using similar methodology to the previously published logistic early warning score (logEWS) [12] and Dynamic individual vital sign trajectory early warning score (DyniEWS) [13]. Since the level of inspired oxygen had mixed units of measurement (percentage inspired oxygen and flow rate in litres/minute) we created a new ordinal variable which encoded the level of inspired oxygen as None = 0, Low = 1, Low-moderate = 2, Moderate = 3, High = 4, and Very high = 5. Full details of this encoding are shown in Additional file 1: Table S2. Clinical observations with a U-shaped risk curve, in which both high and low values were associated with increased risk (heart rate, respiratory rate, systolic blood pressure and temperature) were split into separate variables for high and low values (see Additional file 1: Table S3). A number of time series features were extracted from the raw clinical observations data including: difference from the previous observation; average and standard deviation of the five (minimum of three) most recent observations; and categorisation of recent values into normal and stable, normal and unstable, outside normal range and stable, outside normal range and improving, or outside normal range and worsening. A total of 38 raw and engineered features were entered into logistic regression models, with L2 regularisation for feature selection, and tenfold stratified cross-validation. The output of the logistic regression models was the modelled probability of the outcome. All features were normalised to zero mean and unit variance prior to entry into the models. Separate DEWS were developed for the outcome of D/ICU in the full dataset and CSD in the annotated dataset. Further details of the DEWS models are given in the supplementary material (Additional file 1: Tables S2–S6).
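Some of the feature engineering steps can be illustrated with a short sketch using only the Python standard library. It is not the study code. The reference value used to split a U-shaped variable is a hypothetical parameter (the actual definitions are in Additional file 1: Table S3), and the use of the population rather than the sample standard deviation is an assumption.

```python
from statistics import mean, pstdev


def trend_features(values, window=5, min_obs=3):
    """Time series features for one vital sign within one admission episode.

    `values` is a chronological list of measurements. Features are produced
    from the third observation onward, using up to the five most recent values.
    """
    features = []
    for i in range(min_obs - 1, len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        features.append({
            "value": values[i],
            "diff_prev": values[i] - values[i - 1],  # change from previous observation
            "mean_recent": mean(recent),
            "sd_recent": pstdev(recent),  # population SD (assumption)
        })
    return features


def split_u_shaped(value, reference):
    """Split a U-shaped variable into separate 'high' and 'low' components.

    `reference` is a hypothetical mid-normal value, not taken from the paper.
    """
    return {"high": max(value - reference, 0), "low": max(reference - value, 0)}


def standardise(column):
    """Normalise a feature column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in column]
```

For example, a heart rate of 50 with a hypothetical reference of 75 would contribute 25 to the low component and 0 to the high component, allowing the model to fit separate coefficients for bradycardia and tachycardia.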

The primary metric of model accuracy was the area under the receiver operating characteristic curve (AUROC). The area under the precision-recall curve (AUPRC) was also calculated since this is considered to be a more informative metric in unbalanced datasets with a large majority of negative cases [14]. Precision-recall curves are helpful in these cases as they give an intuitive understanding of how the precision (also known as the positive predictive value, the probability that a positive test result is a true positive) relates to the recall (or sensitivity) at different cut-points. Area under the curve values and 95% confidence intervals were calculated using 500 bootstrap samples. The sensitivity and specificity of NEWS-2 and DEWS were compared at cut-points corresponding to NEWS-2 scores of 5 and 7, since these are the key thresholds for an urgent or emergency response in current guidelines [4].
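The evaluation metrics can be sketched as follows. This is a minimal illustration rather than the study code. It assumes a percentile bootstrap that resamples individual observation sets (the resampling unit is not stated in the text), and it requires the input to contain both positive and negative cases.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, seed=0):
    """Approximate 95% percentile confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking one of the classes
            estimates.append(auroc(ys, [scores[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


def sens_spec(labels, scores, cutpoint):
    """Sensitivity and specificity when scores >= cutpoint are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutpoint)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutpoint)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutpoint)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutpoint)
    return tp / (tp + fn), tn / (tn + fp)
```

For NEWS-2 the cut-points would be 5 and 7. For a probability-valued score such as DEWS, a corresponding cut-point on the probability scale would first need to be chosen, for example by matching the sensitivity or specificity of NEWS-2 at that threshold.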

Sample size calculation

We used a previously published method [15] to calculate the required sample size for comparing the AUROC of two diagnostic tests, in order to determine how many cases needed to be manually annotated. The AUROC of NEWS for predicting in-hospital death or unplanned ICU admission is approximately 0.8 [5]. Assuming that the novel algorithm would improve this to 0.85, we calculated that 463 observation sets positive for the outcome would be needed in the validation dataset to detect this difference with 80% power. We estimated that this would be achieved if 250 admission episodes were included in the validation dataset. It is usually recommended that the training dataset is 2–4 times the size of the validation dataset, so we planned to annotate a further 750 admission episodes for the training dataset.
