Hematopathology is a crucial field in medical diagnosis, primarily focusing on the microscopic examination of blood samples. This process involves the identification and classification of blood cells, including red blood cells, white blood cells, and platelets. The characteristics of these cells are vital for accurately diagnosing various blood-related conditions such as leukemia, anemia, infections, and coagulation disorders. By observing the morphology, size, quantity, and distribution of cells, doctors can obtain critical health information that is significant for determining the nature, progression, and treatment strategies for diseases. Traditionally, hematopathology analysis has relied on professionals visually examining blood samples under a microscope to identify and classify cells based on their appearance. This process requires extensive expertise, experience, and a high level of attention to detail, as identifying many diseases depends on accurately recognizing subtle cellular changes. However, this approach may be limited by subjectivity and potential errors in manual operations, particularly in high-volume or complex cases.
With the continuous advancement of artificial intelligence (AI) technology, generative AI has begun to demonstrate unique value in medical diagnosis. ChatGPT-4, the latest generation of the generative pretrained transformer (GPT) series of large language models developed by OpenAI, marks a significant breakthrough in natural language processing (NLP). Its key features encompass advanced language understanding and generation capabilities, multilingual support, deep text comprehension, and widespread applications in domains such as text summarization, language translation, educational assistance, and content creation.1 Furthermore, it demonstrates improved accuracy and reliability, fosters more natural and human-like interactions, and exhibits proficiency in recognizing and interpreting complex medical images, including hematological morphology images.
Generative AI, which is based on deep-learning technology, can generate, interpret, and analyze large amounts of data. In hematological morphology, AI systems can learn to recognize different cell types, thereby assisting or accelerating the diagnostic process of diseases. ChatGPT-4, with its advanced AI model, can process and analyze data extracted from images, provide rapid analytical results, reduce human error, and handle large datasets. Furthermore, AI models can identify subtle abnormalities that traditional methods may overlook in certain situations.
Currently, no existing research explores the application of ChatGPT-4 in laboratory microscopic examinations. Therefore, the primary purpose of this study was to evaluate the performance of ChatGPT-4 in hematology recognition, primarily to determine whether these advanced techniques can provide accuracy comparable to or higher than that of traditional manual methods. Through this comparison, we explored whether ChatGPT could be used in the future to improve the hematological diagnostic process and enhance overall diagnostic efficiency, and whether AI assistance could reduce diagnostic errors caused by human error, subjective judgment, or other factors. Additionally, we aimed to understand whether ChatGPT can provide more consistent and standardized hematological assessments, particularly in resource-limited rural areas. Furthermore, we assessed the potential application of ChatGPT in medical education and professional training, particularly for improving the hematology-identification skills of medical students and professionals, thereby guiding future clinical practice and teaching.
2. METHODS

2.1. Background
ChatGPT-4, developed by OpenAI, is an autoregressive language model that was released on March 14, 2023. It has demonstrated significant potential for processing and analyzing large volumes of complex data, and its precise text analysis and generation capabilities make it a powerful auxiliary tool in medical image analysis.2 Our research aims to explore the application of ChatGPT-4 in hematological morphological recognition, particularly in identifying and classifying various cell types in blood samples. By utilizing ChatGPT-4 to automatically recognize blood cell morphology, we hope to enhance the efficiency and accuracy of diagnosis. The outcomes of this research offer valuable insights for future studies and clinical applications of hematological morphology.
2.2. Data source
The data were sourced from peripheral blood smears in the American Society of Hematology (ASH) Image Bank,3 a comprehensive public collection of high-quality, peer-reviewed hematological images. The images are presented in digital, case-based formats for reference or instruction, including pictures and case studies, which helps ensure data quality. Our sample consisted of 38 JPEG images depicting 44 features. These images cover three types of cells: red blood cells (RBCs), white blood cells (WBCs), and platelets, and include normal blood cells, immature cells, polymorphic cells related to anemia, and inclusion bodies. Table 1 provides information regarding the types and numbers of blood cell images.
Table 1 - Types of blood cell images (n = number of images)
WBC (n): Neutrophil 2; Basophil 2; Eosinophil 2; Monocyte 2; Band neutrophil 2; Lymphocyte 1; Metamyelocyte 1; Myeloblast 1; Promonocyte 2; Myelocyte 1; Promyelocyte 1; Reactive lymphocyte 1; Hypersegmented neutrophil 1; Large granule lymphocyte 2; Pelger-Huët anomaly 1; May-Hegglin anomaly 1; Reed-Sternberg cell 1
RBC (n): Red cell 1; Target cell 2; Burr cell 1; Spur cell 1; Tear-drop cell 1; Sickle cell 1; NRBC 1; Spherocyte (HS) 1; Pyropoikilocytosis 1; Acanthocyte 1
Platelet (n): Platelet 2; Giant platelet 2; Platelet clumping 1
Inclusion body (n): Auer body 1; Howell-Jolly body 1; Cabot ring 1; Döhle bodies 1
ASH = American Society of Hematology; RBC = red blood cell; WBC = white blood cell.
2.3. Study design
This study used blood cell images from the ASH Image Bank and identified them using both traditional manual methods and ChatGPT-4-assisted recognition. We then compared the results of these two methods with the ASH standard classification to evaluate their performance.
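To make this comparison workflow concrete, the short Python sketch below tallies how often each method's label agrees with the ASH reference classification. It is only a minimal illustration: the image identifiers, example labels, and accuracy helper are hypothetical and are not the tooling used in this study.

# Minimal illustrative sketch of the comparison described above (not the authors'
# actual tooling). Image identifiers and labels are hypothetical placeholders;
# in the study, the reference labels come from the ASH Image Bank classification.
from typing import Dict

ash_reference: Dict[str, str] = {"img_01": "neutrophil", "img_02": "target cell"}
chatgpt4_labels: Dict[str, str] = {"img_01": "neutrophil", "img_02": "spherocyte"}
manual_labels: Dict[str, str] = {"img_01": "neutrophil", "img_02": "target cell"}

def accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Return the fraction of images whose label matches the ASH reference."""
    correct = sum(predicted[image_id] == reference[image_id] for image_id in reference)
    return correct / len(reference)

print(f"ChatGPT-4 accuracy: {accuracy(chatgpt4_labels, ash_reference):.0%}")
print(f"Manual accuracy:    {accuracy(manual_labels, ash_reference):.0%}")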
2.4. Study analysis
The accuracy of manual recognition and ChatGPT-4-assisted recognition was analyzed using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA). We used the chi-square goodness-of-fit test to assess whether there was a significant difference between human and ChatGPT-4 recognition capabilities.
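As a rough illustration of this statistical step, the following Python sketch runs a chi-square test on the correct/incorrect counts of the two methods with SciPy. It is an assumption-laden example rather than the authors' Excel workflow: the 2 x 2 contingency arrangement is one common way to frame such a comparison and may not reproduce the exact p values reported in the Results section.

# Illustrative sketch of a chi-square comparison of the two methods' correct and
# incorrect counts using SciPy rather than Excel. The counts are the overall
# totals from Table 2 and are used only as an example.
from scipy.stats import chi2_contingency

total_questions = 44
correct_chatgpt4 = 26   # ChatGPT-4 correct answers (Table 2, total row)
correct_manual = 29     # manual correct answers (Table 2, total row)

# Rows = method, columns = correct / incorrect
contingency_table = [
    [correct_chatgpt4, total_questions - correct_chatgpt4],
    [correct_manual, total_questions - correct_manual],
]

chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")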
3. RESULTS
The results of the study showed that of the 11 questions about normal WBCs, ChatGPT-4 answered 7 correctly (64%), whereas manual identification answered all 11 correctly (100%), with a p value of 0.002, indicating that manual identification was more effective. Among the 13 questions on identifying abnormal WBCs, the accuracies of ChatGPT-4 and manual identification were 38% and 46%, respectively, with a p value of 0.58, indicating that the difference between the two was not statistically significant. The single question on normal RBCs was answered correctly by both groups (100%). Among the 10 questions about abnormal RBCs, ChatGPT-4 answered 7 correctly (70%), whereas manual identification answered 6 correctly (60%), with a p value of 0.52, indicating no significant difference in performance. Regarding platelets, ChatGPT-4 and manual identification did not differ significantly for either normal or abnormal platelets, although manual identification performed slightly better at identifying abnormal platelets. Regarding inclusion bodies, ChatGPT-4 correctly identified three of four (75%), whereas manual identification correctly identified only one (25%); the p value was 0.02, indicating that ChatGPT-4 performed significantly better than manual identification.
Among the 44 questions, the overall correctness rate was 69% for ChatGPT-4 and 71% for manual identification. The overall p value was 0.34, indicating no apparent performance difference between ChatGPT-4 and manual identification across all questions. These results show that, in most categories, the performance of ChatGPT-4 is comparable to that of human identification; however, specific categories (such as normal WBCs and inclusion bodies) showed significant differences in performance. Table 2 compares the accuracy rates of traditional manual and ChatGPT-4-assisted recognition. Fig. 1 visually displays the accuracy rates of both methods, whereas Fig. 2 shows the numbers of correct answers of the manual and ChatGPT-4 methods compared with the expected answers (ASH).
Table 2 - Comparison of GPT-4 and manual recognition accuracy
Category: no. of questions; ChatGPT-4 correct, n (%); manual correct, n (%); p
WBC, normal: 11; 7 (64); 11 (100); 0.002
WBC, abnormal: 13; 5 (38); 6 (46); 0.58
RBC, normal: 1; 1 (100); 1 (100); —
RBC, abnormal: 10; 7 (70); 6 (60); 0.52
Platelet, normal: 2; 2 (100); 2 (100); —
Platelet, abnormal: 3; 1 (33); 2 (67); 0.22
Inclusion bodies: 4; 3 (75); 1 (25); 0.02
Total: 44; 26 (69); 29 (71); 0.34
Comparison of accuracy between manual and ChatGPT-4. PLT = platelet; RBC = red blood cell; WBC = white blood cell.
Fig. 2: Number of correct answers of the manual and ChatGPT-4 methods compared with the expected answers (ASH). ASH = American Society of Hematology; PLT = platelet; RBC = red blood cell; WBC = white blood cell.
Furthermore, the correct identification rate for rare cells, such as the Pelger-Huët anomaly, the May-Hegglin anomaly, and Reed-Sternberg cells, was 17.8% for manual identification and 66.7% for ChatGPT-4 (p = 0.026), underscoring the potential value of ChatGPT-4 in clinical diagnosis. Fig. 3 illustrates the correct results of ChatGPT-4 and manual recognition for the various types of blood cells.
Fig. 3: Number of correct answers in ChatGPT-4 and manual analysis. PLT = platelet; RBC = red blood cell; WBC = white blood cell.
4. DISCUSSION
The application of ChatGPT-4 extends beyond assisting in the identification of blood cell morphology. It can also provide important information regarding the cause of a disease, signs indicating potential conditions, and the potential harm to patients; such information helps medical professionals with further diagnosis and with planning investigations and healthcare. Table 3 shows a detailed annotation of blood cells produced by ChatGPT-4. At present, the ability of generative AI ChatGPT-4 to identify blood cells is slightly lower than that of professional technicians, with average correct answer rates of 69% (ChatGPT-4) and 71% (manual). However, ChatGPT-4 provides additional information for the identification of abnormal, rare blood cells and inclusion bodies. When the question is phrased with additional clues, such as "These cells have blue points inside," "This case is a congenital autosomal dominant anomaly," "This blood smear image displays a variety of RBC morphologies," and "This cell is seen in Classical Hodgkin lymphoma," ChatGPT-4 consistently provides accurate answers, indicating an excellent level of performance.
Table 3: Blood cell annotations of ChatGPT-4 (as an example)
GPT-4's accuracy in identifying normal blood cells was slightly lower than that of traditional manual identification; however, it performed better in identifying abnormal blood cells and rare inclusions. This shows that, although GPT-4 has advantages, it still requires improvement. The performance of GPT-4 is limited by its training data, which includes information only up to April 2023; moreover, no academic papers on using GPT-4 to assist blood cell recognition are currently available. Although GPT-4 performs well in natural language tasks, its functions in image analysis and medical image recognition require additional data sources and training. We hope this study inspires further research to expand and diversify the training dataset with more types of blood cell images, particularly those showing rare or abnormal conditions, to improve the recognition ability of GPT-4. In short, although GPT-4 shows potential in blood cell recognition, reaching expert-level accuracy requires additional training and optimization.
Findings from other studies in the literature support and expand on our observations. Handa et al's4 application of ChatGPT to processing various biomedical images, including capsule endoscopy and magnetic resonance imaging (MRI), underscores the significant potential of AI in medical image analysis. Similarly, Wu et al's5 exploration of GPT-4V across multiple human body systems and clinical imaging modes highlights the proficiency of AI in differentiating imaging modalities and anatomy, yet also points out the challenges in disease diagnosis and report generation. The use of deep convolutional neural networks (CNNs) by Shekar et al6 to detect malaria pathogens with high accuracy further exemplifies the strides AI has made in specific diagnostic areas. Additionally, the study by Xing et al,7 which showcases the benefits of AI-assisted classification in leukocyte identification, particularly for junior technicians, and the reduction in classification time, strongly indicates the potential of AI to enhance diagnostic efficiency and accuracy.
In our study, hematology samples from the public ASH image database were used as an initial test of the applicability of GPT-4 to blood morphology identification. The results show that GPT-4's performance in this field fluctuates considerably, suggesting that its performance is largely determined by its training data. Although GPT-4 performs well in processing language tasks, it requires more data and deeper training in the fields of image analysis and medical image recognition. In addition, when testing actual cases, we encountered technical challenges, such as resolution limitations of the camera system and variability in the manual staining process, which need to be overcome in future research.
REFERENCES
1. OpenAI. GPT-4. 2023. Available at: https://openai.com/product/gpt-4. Accessed December 5, 2023.
2. Wikipedia. GPT-4. 2023. Available at: https://zh.wikipedia.org/zh-tw/GPT-4. Accessed December 5, 2023.
3. American Society of Hematology. ASH Image Bank. Available at: https://imagebank.hematology.org/. Accessed December 8, 2023.
4. Handa P, Chhabra D, Goel N, Krishnan S. Exploring the role of ChatGPT in medical image analysis. Biomed Signal Process Control. 2023;86:105292.
5. Wu C, Lei J, Zheng Q, Zhao W, Lin W, Zhang X, et al. Can GPT-4V(ision) serve medical applications? Case studies on GPT-4V for multimodal medical diagnosis. arXiv:2310.09909. 2023.
6. Shekar G, Revathy S, Goud EK. Malaria detection using deep learning. In: 2020 4th International Conference on Trends in Electronics and Informatics. Tirunelveli, India: Department of Information Technology, Sathyabama Institute of Science and Technology; 2020:746–50.
7. Xing Y, Liu X, Dai J, Ge X, Wang Q, Hu Z, et al. Artificial intelligence of digital morphology analyzers improves the efficiency of manual leukocyte differentiation of peripheral blood. BMC Med Inform Decis Mak. 2023;23:50.