Development of machine learning models for the detection of surgical site infections following total hip and knee arthroplasty: a multicenter cohort study

In this population-based multicenter cohort study, we observed a modest reduction in the incidence of SSIs following total hip and knee arthroplasty over the study period, in contrast to the findings reported in the existing literature. The incidence of SSIs varied substantially across hospitals. We developed and evaluated nine machine learning models to identify SSIs from patient charts; the model developed using both structured data and unstructured data (nursing notes) achieved the best performance. Applying these models could reduce the chart-review workload of traditional infection prevention and control (IPC) surveillance programs.

Surveillance and reporting of SSIs are critically important for preventing and controlling healthcare-associated infections. Factors such as the data quality of different surveillance programs, postsurgical follow-up processes, and imperfect diagnostic criteria may contribute to the discordant SSI incidence reported in the literature [22]. In our study, the SSI rates for TKA and THA were 0.52% and 0.5%, respectively. By comparison, the CDC reported rates of 0.65% for TKA and 0.4% for THA, and the ECDC reported rates of 0.6% and 1.2%, respectively [4, 23]. Our TKA rate was slightly lower than both the CDC and ECDC rates, while our THA rate was lower than the ECDC rate but slightly higher than the CDC rate [2, 24, 25]. This finding is consistent with previously published studies [26]. The observed decrease in SSI incidence over the study period may have resulted from the uniform provincial surveillance that the Alberta Health Services IPC program initiated in March 2012 [27].

The detection of SSIs in large population-based cohorts is shifting from relying solely on combinations of ICD codes to using both structured and unstructured patient data, leveraging machine learning techniques [11]. Clinical notes often contain valuable unstructured diagnostic text and records of important clinical events, and they have been shown to substantially improve the performance of machine learning models. For example, Bucher et al. developed a natural language processing approach that uses clinical notes to automate SSI surveillance [28]; in external validation, their model achieved a sensitivity of 0.79 and an ROC AUC of 0.852. In our study, the optimal model achieved a sensitivity of 83.9% (95% CI 66.3–94.6%), an ROC AUC of 0.906 (95% CI 0.835–0.978), a PR AUC of 0.637 (95% CI 0.528–0.746), and an F1 score of 0.79. Adding nursing notes to model development improved overall performance, increasing the F1 score from 0.699 to 0.788 and the PR AUC from 0.52 to 0.64. Because the baseline PR AUC of a random classifier equals the prevalence of SSI, which is low, this improvement is substantial.
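The point about the PR AUC baseline can be checked with a small pure-Python sketch of the metrics reported above. It is illustrative only: the toy labels and scores are made up, PR AUC is computed here as step-wise average precision (one common estimator; the study's exact estimator is not specified in this section), and the random-scoring demo is an assumption chosen to show that an uninformative classifier keeps an ROC AUC near 0.5 while its PR AUC drops to roughly the prevalence.

```python
"""Pure-Python sketch of the evaluation metrics discussed above.

Illustrative only: the toy labels and scores in the demo are made up, not study data.
"""
import random
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def sensitivity(y_true, y_pred) -> float:
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def f1_score(y_true, y_pred) -> float:
    # Harmonic mean of precision and sensitivity, written as 2TP / (2TP + FP + FN).
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores) -> float:
    # Mann-Whitney formulation with tie-averaged ranks:
    # AUC = (sum of positive ranks - n_pos(n_pos + 1)/2) / (n_pos * n_neg).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores) -> float:
    # Step-wise PR AUC: mean of the precision at the rank of each positive case.
    # Tied scores are broken by sort order, a simplification.
    ranked = sorted(zip(scores, y_true), key=lambda x: -x[0])
    tp, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            tp += 1
            total += tp / k
    return total / sum(y_true)


if __name__ == "__main__":
    # Toy example: 3 positives among 8 charts, scored by some model.
    y = [1, 0, 1, 0, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.4]
    pred = [1 if v >= 0.5 else 0 for v in s]
    print(sensitivity(y, pred), f1_score(y, pred), roc_auc(y, s), average_precision(y, s))

    # Random scores at 1% prevalence: ROC AUC stays near 0.5,
    # while PR AUC falls to roughly the prevalence.
    rng = random.Random(0)
    y_rand = [1] * 1_000 + [0] * 99_000
    s_rand = [rng.random() for _ in y_rand]
    print(round(roc_auc(y_rand, s_rand), 3), round(average_precision(y_rand, s_rand), 4))
```

This asymmetry is why a PR AUC gain from 0.52 to 0.64 is large for a rare outcome: the floor is the prevalence, not 0.5.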

Our study suggests that a standardized structure for nursing notes in the EMR could improve the accuracy of SSI detection models. For example, notes could describe the observed evidence of SSI (e.g., positive intraoperative cultures, purulent drainage, or positive blood cultures) and state an explicit conclusion; the consistent presence of such descriptions would make it considerably easier for models to identify SSIs from text patterns.
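To make this suggestion concrete, the sketch below assumes a hypothetical note template with two fixed fields, `SSI EVIDENCE` and `SSI ASSESSMENT`; neither the field names nor the parsing rules come from the study or from any EMR standard. The point is only that evidence recorded in a predictable slot can be extracted reliably and passed to a model as features, rather than inferred from free-text patterns.

```python
"""Sketch: parsing a hypothetical structured section of a nursing note.

The field names (SSI EVIDENCE, SSI ASSESSMENT) and the accepted values are
illustrative assumptions, not an existing EMR standard or the study's method.
"""
from typing import Dict, List

POSITIVE_ASSESSMENTS = {"suspected", "confirmed"}


def parse_structured_note(note: str) -> Dict[str, object]:
    """Extract SSI evidence items and the nurse's conclusion from "FIELD: value" lines."""
    fields: Dict[str, str] = {}
    for line in note.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only lines containing a colon are treated as fields
            fields[key.strip().upper()] = value.strip()
    evidence: List[str] = [
        item.strip().lower()
        for item in fields.get("SSI EVIDENCE", "").split(";")
        if item.strip() and item.strip().lower() != "none"
    ]
    assessment = fields.get("SSI ASSESSMENT", "").lower()
    return {
        "evidence": evidence,
        "assessment": assessment,
        # Flag the chart for review if any evidence or a positive conclusion is recorded.
        "flag": bool(evidence) or assessment in POSITIVE_ASSESSMENTS,
    }


if __name__ == "__main__":
    note = (
        "Dressing changed, wound edges red.\n"
        "SSI EVIDENCE: purulent drainage; positive intraoperative culture\n"
        "SSI ASSESSMENT: Suspected"
    )
    print(parse_structured_note(note))
```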

Our findings demonstrate that accurate machine learning models can be developed using administrative and EMR text data. The three sets of models developed in this study could be readily translated into surveillance programs. For example, they could serve as an initial screening tool for patient charts, locating the most likely SSIs or excluding negative cases, which would save time and cost and make large population-based surveillance feasible. The models could also be applied in clinical practice to support quality improvement initiatives locally, nationally, or internationally. We believe these models could substantially decrease the workload of SSI surveillance; quantifying the extent of this reduction is a valuable direction for future research.
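One way such a screening step might work is to choose a score threshold on a labelled validation set that keeps sensitivity at a chosen target, then measure what share of charts reviewers would still need to read. The sketch below assumes hypothetical per-chart model scores and reference labels; the function names, the target sensitivity, and the toy data are illustrative and not part of the study.

```python
"""Hypothetical sketch: SSI model scores as a first-pass chart screen.

Names, the target sensitivity, and the toy data are illustrative assumptions.
"""
from typing import List, Tuple


def screen_at(threshold: float, y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return (sensitivity, fraction of charts flagged for manual review)."""
    flagged = [t for s, t in zip(scores, y_true) if s >= threshold]
    return sum(flagged) / sum(y_true), len(flagged) / len(scores)


def threshold_for_sensitivity(y_true: List[int], scores: List[float],
                              target: float) -> Tuple[float, float, float]:
    """Highest threshold whose flagged set reaches `target` sensitivity.

    Returns (threshold, achieved sensitivity, review fraction). Lower thresholds
    catch more SSIs but send more charts to reviewers.
    """
    for thr in sorted(set(scores), reverse=True):
        sens, review_fraction = screen_at(thr, y_true, scores)
        if sens >= target:
            return thr, sens, review_fraction
    raise ValueError("target sensitivity is not reachable with these scores")


if __name__ == "__main__":
    # Toy validation set: 3 confirmed SSIs among 10 charts.
    y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    s = [0.95, 0.40, 0.80, 0.10, 0.05, 0.60, 0.30, 0.20, 0.15, 0.02]
    thr, sens, frac = threshold_for_sensitivity(y, s, target=1.0)
    print(f"threshold={thr}, sensitivity={sens:.2f}, charts to review={frac:.0%}")
```

On the toy data, a threshold of 0.6 catches all three SSIs while sending 3 of 10 charts to review. The review fraction is only a rough proxy for workload; real estimates would need the study's own validation data.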

The generalizability of our models to other hospitals is a critical consideration. While the models demonstrated promising results in our specific setting, their applicability to other healthcare facilities may vary. The success of the models largely depends on the availability and quality of data in each hospital's EMR system. Therefore, rigorous validation and customization are strongly recommended before deploying our models in other settings, to ensure their accuracy and effectiveness in each hospital's own context.
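Because SSI incidence varies across hospitals, one simple customization step is to shift a model's predicted probabilities to the local baseline incidence. The sketch below applies the standard Bayes prior correction under a stated assumption of pure label shift (the feature distributions within SSI and non-SSI cases are the same at both sites; only the incidence differs). It is offered as one possible technique, not a method used in the study, and the incidences in the demo are illustrative values.

```python
"""Sketch: shifting predicted SSI probabilities to a new hospital's incidence.

Assumes pure label shift (same feature distributions within SSI and non-SSI cases,
different baseline incidence). Standard Bayes prior correction, not the study's method.
"""


def adjust_to_local_prevalence(p: float, source_prev: float, target_prev: float) -> float:
    """Map a probability calibrated at `source_prev` to one calibrated at `target_prev`.

    Requires p and both prevalences to lie strictly between 0 and 1.
    """
    if not (0 < p < 1 and 0 < source_prev < 1 and 0 < target_prev < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # Bayes' rule on the odds scale: posterior odds scale with the prior odds.
    prior_ratio = (target_prev / (1 - target_prev)) / (source_prev / (1 - source_prev))
    odds = p / (1 - p) * prior_ratio
    return odds / (1 + odds)


if __name__ == "__main__":
    # Illustrative incidences: model calibrated where SSI incidence is 0.5%,
    # deployed at a site where it is 1.0%.
    for p in (0.05, 0.30, 0.80):
        print(p, round(adjust_to_local_prevalence(p, 0.005, 0.010), 3))
```

If the label-shift assumption fails (for example, because documentation practices differ between sites), this correction is not enough, and local revalidation remains necessary.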

Finally, although our model has shown promise, there is room for improvement, particularly in precision and reliability. For instance, more advanced representations of the data, such as language models and embeddings for text, could be particularly beneficial. Transformer-based models such as BERT and GPT have shown a remarkable ability to capture nuanced context in text and can convert text into high-dimensional vectors, or embeddings, that encode semantic meaning. Incorporating these techniques into our models is an important direction for future work to improve our ability to detect SSIs.

Limitations

Our study had several limitations. First, the reported incidence rates of SSIs were calculated using 90 days of follow-up, as the literature suggests that most SSIs occur within the first 3 months following surgery [7, 14, 29]. Different follow-up windows may produce discordant SSI incidence rates. Shorter follow-up windows (e.g., 30 or 60 days) may improve model precision, but at the cost of sensitivity; researchers need to choose the cut-off according to their research objectives. Second, the imbalanced data may make it difficult for models to capture the text patterns of SSI cases. We employed random oversampling during model training to improve the performance of the machine learning classification models on the imbalanced datasets. Third, we included only nursing notes for model development, as they contain the most clinical detail about daily patient care and are documented in all patient records. Other clinical notes, such as diagnostic reports, surgery-related reports, and discharge summaries, were not included in this study. Incorporating those notes may enhance the sensitivity of the developed models, but would likely substantially reduce both the positive predictive value and overall performance. Lastly, the performance of models using clinical notes from the EMR database depends on the quality of nursing documentation. Human error, diverse documentation practices, and the adequacy of healthcare professionals' EMR training can influence the accuracy and reliability of the results.
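Random oversampling, as mentioned in the second limitation, simply duplicates randomly drawn minority-class examples until the classes are balanced. The pure-Python sketch below is an illustration under assumed toy data and seed, not the study's implementation; libraries such as imbalanced-learn provide a `RandomOverSampler` class for the same purpose.

```python
"""Sketch of random oversampling for an imbalanced SSI training set.

Pure Python for illustration; the seed and toy data are assumptions.
"""
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(X: Sequence[T], y: Sequence[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly drawn minority-class rows until the classes are balanced.

    Apply to the training split only; validation and test data should keep the
    real SSI incidence so that metrics such as PR AUC remain meaningful.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot oversample a class with no examples")
    # Sample minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    notes = [f"note {i}" for i in range(10)]
    labels = [1] + [0] * 9  # 1 SSI case among 10 training charts
    X_bal, y_bal = random_oversample(notes, labels)
    print(len(y_bal), sum(y_bal))
```

Because the duplicated rows carry no new information, oversampling can encourage overfitting to the few SSI notes available, and a model trained on balanced data will tend to overstate SSI probabilities unless they are recalibrated to the true incidence.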
