Evaluating the Quality of Postpartum Hemorrhage Nursing Care Plans Generated by Artificial Intelligence Models

The nursing process enables the collection of data related to current or potential health problems of individuals, families, and communities; the establishment of nursing diagnoses; the provision of nursing care; and the evaluation of care outcomes.1,2 According to the World Health Organization (WHO), the nursing process involves utilizing a scientific problem-solving method to provide the best benefit to the patient.3 The primary objective of nursing diagnoses is to address care needs, requiring a critical perspective to form clinical judgments about human responses to health and illness. Nursing diagnoses enable nurses to organize care and determine care needs and priorities of an individual or a community.1,2 The most widely recognized classification of nursing diagnoses is the North American Nursing Diagnosis Association International (NANDA-I).4 Identifying nursing diagnoses aids in organizing nursing care of individuals, as well as prioritizing nursing care needs for various population groups.5

In our constantly evolving modern age of technology, nurses need to adopt contemporary technologies and innovations to effectively sustain their nursing processes, improve their professional development, and provide high-quality care to patients.6,7 In this context, artificial intelligence (AI), which has made significant recent advancements, may support nursing practices by offering real-time decision support, minimizing time needed for administrative tasks, and streamlining the efficient management of patient data and care.8,9 Furthermore, AI-based support systems can assist in making clinical decisions, thereby contributing to the betterment of evidence-based nursing care.10-12

AI is the simulation of human intelligence in machines, created through human programming or content, designed to execute tasks that typically require human cognitive abilities, including problem-solving, decision-making, and pattern recognition.13,14 In recent years, major technology companies have developed large language models (LLMs), AI systems designed to understand natural language and generate meaningful new text. LLMs possess various functional capabilities, such as engaging in conversation, performing language translation, responding to follow-up questions, recognizing errors, and debunking baseless theories.15-17 Another crucial facet of AI is robotic process automation, which encompasses the use of software robots to automate repetitive and routine tasks, as well as physical robots, machines programmable to perform tasks in the physical world.16-18 The most popular AI models are OpenAI’s ChatGPT and Google Bard.17-19 There is an evolving portfolio of research on AI models for applications in medicine and health care.20,21 Among medical AI models (eg, MedQA, MedicalGPT, Med-PaLM), Med-PaLM demonstrates the best clinical knowledge performance.21

The potential applications of AI in various nursing domains such as education22,23 and nursing care6,7,24 are currently under intensive research. The health care sector is a transformational industry that requires the evaluation of the best methods for integrating AI.6,7 It is anticipated that nursing professionals can benefit from AI models such as GPT-4 and Med-PaLM to interpret patient data, enhance decision-making, and facilitate communication.24-26 However, there are generally 2 opposing opinions on the utilization of AI in nursing care services. The first is that, as AI models continue to develop, nurses can more easily access and analyze patient data, enabling them to navigate the complex and constantly changing health care environment more effectively; this has the potential to enhance the overall quality and effectiveness of patient care.24-27 The other is that, because nursing care plans rely on a significant amount of sensitive patient data, AI may not ensure patient data privacy and security, potentially giving rise to ethical concerns.27-29

Therefore, considering the rapid advancements in health care technology, it is crucial to comprehensively evaluate the potential contributions and outcomes of AI in nursing, particularly from a clinical perspective. AI models may provide significant benefit in developing nursing care plans; however, it is important to recognize that the application of AI in nursing could also impact clinical practice and education.

Woodnutt et al30 assessed the quality of a mental health nursing care plan generated by ChatGPT based on the authors’ clinical experience and National Institute for Health and Care Excellence (NICE) guidelines. The study concluded that the care plan created by ChatGPT was inaccurate, and its utilization could potentially endanger patients.30 This is an important issue given the critical nature of care for this patient population; more rigorous evaluation of AI could therefore help improve patient safety and quality of care.

To further evaluate care plans developed by AI, a scenario was devised focusing on postpartum hemorrhage (PPH). There are various obstetric emergencies during pregnancy, including heavy bleeding, preeclampsia, eclampsia, and PPH. In this study, the selection of a nursing care plan for PPH, an obstetric emergency, is of paramount importance as PPH is a leading cause of maternal morbidity and mortality worldwide.31,32 PPH can be prevented with effective and timely intervention and care; however, according to WHO data, PPH is responsible for one-fourth of all maternal deaths worldwide.33 To our knowledge, there is currently no study evaluating care plans created by AI models for the management of PPH. Therefore, this study was conducted to evaluate the quality of obstetrics and gynecological nursing care plans created by AI models for the management of PPH.

METHODS

Study design

We used a cross-sectional exploratory evaluation design for this study.34 Three publicly accessible AI models were evaluated: GPT-4 (utilizing ChatGPT),35 LaMDA (utilizing Bard),36 and Med-PaLM (utilizing Google Cloud).37 An expert panel was created to (1) develop a care plan based on an imaginary patient scenario designed by the author and (2) evaluate the care plans generated by the 3 AI models. The expert panel consisted of 3 clinicians (2 employed in the delivery room [E.B., B.K.]; 1 employed in the postpartum ward [E.D.]) and 3 academicians in the field of obstetrics and gynecological nursing (A.K., E.O., S.K.).

Scenario and care plan development

An obstetric emergency scenario for PPH was devised by the author involving an imaginary patient, aged 43 years, multiparous, with a systolic blood pressure of 60 mm Hg at 25 minutes postpartum, and experiencing continuous dark red vaginal bleeding. Based on a collective consensus, the expert panel established a care plan for the imaginary patient using the NANDA-I nursing diagnoses between August 28, 2023, and September 8, 2023. The information related to the imaginary patient was input by the author into ChatGPT (September 12, 2023), Bard (September 13, 2023), and Med-PaLM (September 14, 2023) in a new and single session, without any prior conversation. Immediately after, the AI models were tasked with creating a nursing care plan incorporating NANDA-I nursing diagnoses. The care plans created by the expert panel and AI models are provided in Supplemental Digital Content, Table (available at: https://links.lww.com/JNCQ/B220).

GPT-4

GPT-4 is an advanced AI language model developed by OpenAI, released on March 14, 2023, and accessed through the ChatGPT interface. It offers superior search functions, corrects spelling errors, and provides support in literature reviews. It is an enhanced version of the model behind ChatGPT, with advanced language comprehension, reasoning, and decision-making capabilities. Using GPT-4 requires a paid subscription to an OpenAI account.17-19,35

LaMDA

LaMDA, accessed through Bard (released by Google on March 21, 2023), is trained on a diverse data set encompassing extensive text and code, including books, articles, and various other content formats. It can generate text, perform language translation, create various forms of creative content, and provide informative responses. However, it cannot perform functions such as creating articles or correcting spelling errors. Bard is free to use.17-19,36

Med-PaLM

Med-PaLM, a medical AI model developed by Google Research and DeepMind, released on January 6, 2023, and accessed through Google Cloud, is notable for answering posed questions while offering access to openly available sources. In comparisons with other medical AI models, this model not only responded accurately to multiple-choice and open-ended questions but also provided the reasoning behind its answers. Med-PaLM is free to use.21,37

Evaluation

The expert panel compared the care plans generated by the AI models with the care plan they had created between September 17, 2023, and September 24, 2023. Differences in care plans were meticulously documented, including specific areas of deviation, variations in recommended interventions, and differences in the overall structure and content. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) scale38 was used to assess the quality of responses in each section of the care plan (nursing assessment, nursing diagnoses, nursing interventions, nursing evaluation). GRADE provides a systematic approach to making clinical practice recommendations and is the most widely adopted tool for grading the quality of evidence. Using the GRADE scale, the quality of the care plans was scored from 0 to 4, with 0 = no response (the system refused to provide any information), 1 = inaccurate response (the response did not reflect any facts relevant to the corresponding question), 2 = clinically inaccurate response (the response included facts about the corresponding question but was not clinically relevant), 3 = partially clinically accurate response (the response was accurate and clinically relevant, yet it introduced some risk of misinterpretation and misunderstanding), and 4 = mostly clinically accurate response (the response was accurate and clinically relevant, with minimal risk of misinterpretation and misunderstanding). Phi et al39 reported an internal consistency (Cronbach α) coefficient of 0.80 for the GRADE scale. In this study, the internal consistency coefficient was 0.86.
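The internal consistency coefficient reported above can be computed directly from a matrix of the raters' section scores. The following sketch applies the standard Cronbach α formula to a hypothetical 6-rater × 4-section GRADE score matrix; the panel's individual ratings are not published, so the data shown are illustrative only.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rater rows (n_raters x n_items)."""
    k = len(ratings[0])                       # number of items (care plan sections)
    cols = list(zip(*ratings))                # per-item score columns
    item_vars = sum(variance(c) for c in cols)            # sum of item sample variances
    total_var = variance([sum(row) for row in ratings])   # variance of rater totals
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 6-rater x 4-section GRADE scores (0-4), for illustration only
scores = [
    [4, 3, 4, 4],
    [4, 3, 4, 4],
    [3, 2, 3, 3],
    [4, 3, 4, 4],
    [3, 3, 3, 3],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(scores), 2))  # → 0.95
```

An α of 0.80 or higher, as in both Phi et al39 and the present study, is conventionally taken to indicate good internal consistency.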

Analysis

Fleiss’ κ coefficient, a measure of interrater reliability, was calculated to assess the consistency of ratings across raters.40 The Shapiro-Wilk test41 was conducted to assess the normality of evaluator data, and the Levene test42 was performed to evaluate the equality of variances. Subsequently, the Kruskal-Wallis test43 was conducted to compare variations in quality ratings among the 3 groups. Significant differences between group pairs were examined using the post hoc Dunn test,44 with Bonferroni correction for multiple comparisons. Data were analyzed with SPSS v26.0 (IBM Corp, Armonk, New York).45 Statistical significance was set at P < .05. As no patient data were used, ethical review was not required.
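For readers who want to see the mechanics of the main comparison, the Kruskal-Wallis H statistic (tie-corrected, as most statistical packages compute it) can be sketched in pure Python. The groups below are the 4 per-section GRADE ratings for each model from Table 1; because the panel's individual ratings are not published, this sketch will not reproduce the χ² value reported in the Results and is illustrative only.

```python
from itertools import chain

def rank_with_ties(values):
    """Assign 1-based ranks, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                 # average of tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    data = list(chain.from_iterable(groups))
    n = len(data)
    ranks = rank_with_ties(data)
    h, start = 0.0, 0
    for g in groups:                          # sum of (rank-sum^2 / group size)
        r = ranks[start:start + len(g)]
        start += len(g)
        h += sum(r) ** 2 / len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}                               # tally ties for the correction factor
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / correction

# Per-section GRADE ratings from Table 1 (illustrative groups, not raw panel data)
med_palm = [4, 3, 4, 4]
lamda = [3, 2, 3, 0]
gpt4 = [0, 4, 4, 3]
print(round(kruskal_wallis_h(med_palm, lamda, gpt4), 2))  # → 4.35
```

The resulting H is compared against a χ² distribution with k − 1 = 2 degrees of freedom; significant pairwise differences are then probed with the Dunn test under a Bonferroni-adjusted α.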

RESULTS

Overall quality ratings differed between the 3 AI models: Med-PaLM, mean = 3.75 (SD = 0.26); GPT-4, mean = 2.75 (SD = 0.19); and LaMDA, mean = 2.0 (SD = 0.14) (Table 1).

Table 1. Average Quality Ratings of Sections in Postpartum Hemorrhage Nursing Care Plans Created by AI Models Using NANDA-I Nursing Diagnoses, Based on the Clinical Scenario

Sections of Care Plan    Average Quality Ratings (a)
                         Med-PaLM       LaMDA      GPT-4
Nursing Assessment       4              3          0
Nursing Diagnoses        3              2          4
Nursing Interventions    4              3          4
Nursing Evaluation       4              0          3
Overall mean (SD)        3.75 (0.26)    2 (0.14)   2.75 (0.19)
Minimum-Maximum          3-4            0-4        0-4

Abbreviation: AI, artificial intelligence.

(a) GRADE (Grading of Recommendations Assessment, Development, and Evaluation): 0 = no response (the system refused to provide any information); 1 = inaccurate response (the system response did not reflect any facts relevant to the corresponding question); 2 = clinically inaccurate response (the system response included facts about the corresponding question but was not clinically relevant); 3 = partially clinically accurate response (the system response was accurate and clinically relevant, yet it introduced some risks in terms of misinterpretations and misunderstanding); 4 = mostly clinically accurate response (the system response was accurate and clinically relevant, and risk was minimal for misinterpretations and misunderstanding).

Significant differences were observed in the quality rating score distribution among the 3 models (χ²(2) = 18.1; P = .001). Med-PaLM exhibited superior quality, characterized by greater clinical accuracy per GRADE ratings, compared with LaMDA (Z = 4.354; P = .0001) and GPT-4 (Z = 3.126; P = .029). In addition, a significant difference in the quality of care plans was observed between LaMDA and GPT-4, with GPT-4 having higher ratings (Z = 4.208; P = .001) (Table 2).

Table 2. Comparison of AI Models in Terms of Differences in Quality Assessments

AI Models            Dunn’s Test, Z Value    P Value
Med-PaLM vs LaMDA    4.354                   .0001
Med-PaLM vs GPT-4    3.126                   .029
LaMDA vs GPT-4       4.208                   .001

Abbreviation: AI, artificial intelligence.

The expert panel showed perfect agreement for Med-PaLM (κ = 0.97; 95% CI, 0.85-1.17) and near-perfect agreement for LaMDA and GPT-4 (κ = 0.86; 95% CI, 0.65-1.06). The data were found to be non-normally distributed (P < .05) for each section of the care plans.

DISCUSSION

This is the first study to evaluate the quality of obstetrics and gynecological nursing care plans created by 3 AI models in the management of PPH. Our study reveals noteworthy variations in the quality of care plans generated by 3 prominent AI models, finding that Med-PaLM excelled in quality and clinical accuracy. In addition, a significant difference was observed in the distribution of scores among sections of the care plan. These results suggest that Med-PaLM provided higher-quality responses based on compatibility with the nursing care plan created by the expert panel utilizing NANDA-I nursing diagnoses.

The nursing assessment and nursing interventions sections of the care plan created by LaMDA demonstrated partial clinical accuracy. However, the nursing diagnoses section provided a medical diagnosis (PPH) instead of a nursing diagnosis, and the nursing evaluation section was left unanswered. These responses received the lowest-quality score in our assessment.

The nursing diagnoses and nursing interventions sections of the nursing care plan created by GPT-4 demonstrated clinical accuracy. The nursing evaluation section was partially clinically accurate. However, the overall score was lowered by the unanswered nursing assessment section. The study conducted by Woodnutt et al,30 which evaluated a mental health nursing care plan generated by ChatGPT using the authors’ clinical experiences and NICE guidelines, concluded that the produced care plan was erroneous, and its utilization could potentially jeopardize patient care. This is an important concern when using AI models in health care, as these models are only as accurate as the developers, programs, and content that underpin them. Inputting more health care-specific data sets will likely help these programs evolve; at present, however, they may be inaccurate, with potential for harmful errors.

To ensure a comprehensive approach, future studies must measure acceptability and appropriateness of AI models among stakeholders, including nurses, midwives, researchers, and developers. Moreover, studies suggest that the use of many AI models such as ChatGPT in nursing care services may pose ethical risks26,28-30; this issue can potentially be addressed through the anonymization of patient data.46,47

Currently, AI models may not be able to provide a comprehensive and adequate nursing care plan, including nursing diagnoses and interventions. AI may be a useful tool in the future as the technology evolves and further evidence is created. Consideration for future research should include the financial return-on-investment aspect once the technology is optimized. In addition, medical AI models should be used to support care plan development, not to replace nurses.

While there are no specific AI models designed to create nursing care plans, the implementation of robust data security measures, transparent and unbiased algorithms, and clear accountability can significantly enhance the development of nursing care plans. In addition, as AI models evolve and are designed with consideration for these limitations, they have the potential to revolutionize the preparation of nursing care plans in the future, improve patient care outcomes, and ensure the delivery of the best possible care to patients.

Limitations

Despite promising results, it is crucial to acknowledge certain limitations. Our study’s focus on a specific scenario and the absence of real patient data may constrain the generalizability of the findings.

CONCLUSIONS

This study is a pioneering effort to evaluate nursing care plans created by AI models in the management of PPH. Notably, Med-PaLM emerged as a promising AI model, showcasing superior quality and clinical accuracy. These findings pave the way for future research, underscoring the potential of AI to enhance health care practices and ultimately contribute to improved patient outcomes.

REFERENCES

1. Urden LD, Stacy KM, Lough ME. Priorities in Critical Care Nursing. 7th ed. Elsevier Health Sciences Publications; 2015.
2. Chang YY, Chao LF, Xiao X, Chien NH. Effects of a simulation-based nursing process educational program: a mixed-methods study. Nurse Educ Pract. 2021;56:103188. doi:10.1016/j.nepr.2021.103188
3. World Health Organization. Roles and responsibilities of government chief nursing and midwifery officers: a capacity-building manual. Published 2022. Accessed September 10, 2023. http://www.who.int/hrh/nursing_midwifery/15178_gcnmo.pdf?ua=1
4. NANDA International. The structure and development of syndrome diagnoses. Published 2021. Accessed September 6, 2023. https://nanda.org/publications-resources/resources/position-statement
5. Herdman HT, Kamitsuru S, Takáo Lopes C. NANDA International Nursing Diagnoses: Definitions and Classification, 2021-2023. 12th ed. Thieme; 2021.
6. Seibert K, Domhoff D, Bruch D, et al. Application scenarios for artificial intelligence in nursing care: rapid review. J Med Internet Res. 2021;23(11):e26522. doi:10.2196/26522
7. Martinez Ortigosa A, Martinez Granados A, Gil Hernández E, Rodriguez Arrastia M, Ropero Padilla C, Roman P. Applications of artificial intelligence in nursing care: a systematic review. J Nurs Manag. 2023;2023:12. doi:10.1155/2023/3219127
8. Hwang GJ, Chang PY, Tseng WY, Chou CA, Wu CH, Tu YF. Research trends in artificial intelligence associated nursing activities based on a review of academic studies published from 2001 to 2020. Comput Inform Nurs. 2022;40(12):814-824. doi:10.1097/CIN.0000000000000897
9. O’Connor S, Yan Y, Thilo FJS, Felzmann H, Dowding D, Lee JJ. Artificial intelligence in nursing and midwifery: a systematic review. J Clin Nurs. 2023;32(13/14):2951-2968. doi:10.1111/jocn.16478
10. Clancy TR. Artificial intelligence and nursing: the future is now. J Nurs Adm. 2020;50(3):125-127. doi:10.1097/NNA.0000000000000855
11. Lee D, Yoon SN. Application of artificial intelligence-based technologies in the healthcare industry: opportunities and challenges. Int J Environ Res Public Health. 2021;18(1):271. doi:10.3390/ijerph18010271
12. Jiang F, Jiang Y, Zhi H, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2(4):230-243. doi:10.1136/svn-2017-000101
13. Zhou B, Yang G, Shi Z, Ma S. Natural language processing for smart healthcare. IEEE Rev Biomed Eng. 2024;17:4-18. doi:10.1109/RBME.2022.3210270
14. Floridi L, Chiriatti M. GPT-3: its nature, scope, limits, and consequences. Minds Mach. 2020;30:681-694. doi:10.1007/s11023-020-09548-1
15. OpenAI. Conversational AI model for medical inquiries. ChatGPT. Published 2023. Accessed September 10, 2023. https://www.openai.com/chatgpt
16. Peters V, Baumgartner M, Froese S, et al. Risk and potential of ChatGPT in scientific publishing. J Inherit Metab Dis. 2023;46(6):1005-1006. doi:10.1002/jimd.12666
17. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930-1940. doi:10.1038/s41591-023-02448-8
18. Koski E, Murphy J. AI in healthcare. Stud Health Technol Inform. 2021;284:295-299. doi:10.3233/SHTI210726
19. McGrow K. Artificial intelligence: essentials for nursing. Nursing. 2019;49(9):46-49. doi:10.1097/01.NURSE.0000577716.57052.8d
20. Ben-Abacha A, Agichtein E, Pinter Y, Demner-Fushman D. Overview of the medical question answering task at TREC 2017 LiveQA. Published 2017. Accessed September 11, 2023. https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf?ref=https://githubhelp.com
21. Singhal K, Azizi S, Tu T, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172-180. doi:10.1038/s41586-023-06291-2
22. Abdulai AF, Hung L. Will ChatGPT undermine ethical values in nursing education, research, and practice? Nurs Inq. 2023;30(3):e12556. doi:10.1111/nin.12556
23. Veldhuis LI, Woittiez NJC, Nanayakkara PWB, Ludikhuize J. Artificial intelligence for the prediction of inhospital clinical deterioration: a systematic review. Crit Care Explor. 2022;4(9):e0744. doi:10.1097/CCE.0000000000000744
24. Nashwan AJ, Abujaber AA. Harnessing large language models in nursing care planning: opportunities, challenges, and ethical considerations. Cureus. 2023;15(6):e40542. doi:10.7759/cureus.40542
25. Shang Z. A concept analysis on the use of artificial intelligence in nursing. Cureus. 2021;13(5):e14857. doi:10.7759/cureus.14857
26. Liu J, Wang C, Liu S. Utility of ChatGPT in clinical practice. J Med Internet Res. 2023;25:e48568. doi:10.2196/48568
27. von Gerich H, Moen H, Block LJ, et al. Artificial intelligence-based technologies in nursing: a scoping literature review of the evidence. Int J Nurs Stud. 2022;127:104153. doi:10.1016/j.ijnurstu.2021.104153
28. Stokes F, Palmer A. Artificial intelligence and robotics in nursing: ethics of caring as a guide to dividing tasks between AI and humans. Nurs Philos. 2020;21(4):e12306. doi:10.1111/nup.12306
29. Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023;11(6):887. doi:10.3390/healthcare11060887
30. Woodnutt S, Allen C, Snowden J, et al. Could artificial intelligence write mental health nursing care plans? J Psychiatr Ment Health Nurs. 2024;31(1):79-86. doi:10.1111/jpm.12965
31. Assis IdC, Govêia CS, Miranda DB, Ferreira RS, Riccio LG. Analysis of the efficacy of prophylactic tranexamic acid in preventing postpartum bleeding: systematic review with meta-analysis of randomized clinical trials. Braz J Anesthesiol. 2023;73(4):467-476. doi:10.1016/j.bjane.2022.08.002
32. World Health Organization. Maternal mortality. Published 2023. Accessed December 20, 2023. https://www.who.int/news-room/fact-sheets/detail/maternal-mortality
33. World Health Organization. Updated WHO recommendation on tranexamic acid for the treatment of postpartum haemorrhage: highlights and key messages from the World Health Organization’s 2017 Global Recommendation. Published 2017. Accessed September 10, 2023. https://www.who.int/publications/i/item/WHO-RHR-17.21
34. Setia MS. Methodology series module 3: cross-sectional studies. Indian J Dermatol. 2016;61(3):261-264. doi:10.4103/0019-5154.182410
35. OpenAI. GPT-4. Published 2023. Accessed September 12, 2023. https://openai.com/product/gpt-4
36. Collins E, Ghahramani Z. LaMDA: our breakthrough conversation technology. The Keyword. May 18, 2021. Accessed September 13, 2023. https://blog.google/technology/ai/lamda
38. Guyatt GH, Oxman AD, Kunz R, et al; GRADE Working Group. What is “quality of evidence” and why is it important to clinicians? BMJ. 2008;336(7651):995-998. doi:10.1136/bmj.39490.551019.BE
39. Phi L, Ajaj R, Ramchandani MH, et al. Expanding the Grading of Recommendations Assessment, Development, and Evaluation (Ex-GRADE) for evidence-based clinical recommendations: validation study. Open Dent J. 2012;6:31-40. doi:10.2174/1874210601206010031
40. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76(5):378-382. doi:10.1037/h0031619
41. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;52(3):591-611. doi:10.2307/2333709
42. Levene H. Robust tests for equality of variances. In: Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press; 1960:278-292.
43. Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Am Stat Assoc. 1952;47(260):583-621. doi:10.1080/01621459.1952.10483441
44. Dunn OJ. Multiple comparisons using rank sums. Technometrics. 1964;6(3):241-252. doi:10.1080/00401706.1964.10490181
45. IBM Corp. IBM SPSS Statistics for Windows. Version 26.0. IBM Corp; 2019.
46. Jin H, Luo Y, Li P, Mathew J. A review of secure and privacy-preserving medical data sharing. IEEE Access. 2019;7:61656-61669. doi:10.1109/ACCESS.2019.2916503
47. Vovk O, Piho G, Ross P. Methods and tools for healthcare data anonymization: a literature review. Int J General Syst. 2023;52(3):326-342. doi:10.1080/03081079.2023.2173749
