Evaluating the Performance of Artificial Intelligence in Generating Differential Diagnoses for Infectious Diseases Cases: A Comparative Study of Large Language Models

Abstract

Background: Artificial intelligence (AI) has the potential to transform healthcare, including infectious disease diagnostics. This study assesses the ability of three large language models (LLMs), GPT-4, Llama 3, and Gemini 1.5, to generate differential diagnoses. It compares their outputs with those of medical experts to evaluate AI's potential to augment clinical decision-making.

Methods: The three LLMs were evaluated on 50 simulated infectious disease cases. The cases were diverse and complex and reflected common clinical scenarios. Each included a detailed history, symptoms, laboratory results, and imaging findings. Each model received standardized case information and produced a differential diagnosis list, which was compared with a reference list created by medical experts. The Jaccard index measured the similarity between lists, and Kendall's Tau measured agreement in ranking order. Findings were summarized as means, standard deviations, and combined p-values.

Results: The mean numbers of differential diagnoses generated by GPT-4, Llama 3, and Gemini 1.5 were 6.22, 5.06, and 10.02, respectively. These differed significantly from the number generated by the medical experts (p < 0.001). The mean Jaccard indices were 0.3, 0.21, and 0.24, and the mean Kendall's Tau values were 0.4, 0.7, and 0.33, respectively. The combined p-values were 1, 1, and 0.979, respectively, indicating no significant association between the differential diagnoses generated by the LLMs and those of the medical experts.

Conclusion: GPT-4, Llama 3, and Gemini 1.5 vary in effectiveness, but none aligns significantly with expert-level diagnostic accuracy. This underscores the need for further development and refinement. The findings highlight the importance of rigorous validation, ethical considerations, and seamless integration into clinical workflows so that AI tools effectively enhance healthcare delivery and patient outcomes.
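The abstract names its comparison metrics but does not give formulas. The following is a minimal sketch, under stated assumptions, of how one model list might be scored against an expert list. It is not the authors' code.

* The Jaccard index is taken as |A ∩ B| / |A ∪ B| over exact diagnosis strings. Real use would need synonym normalization.
* Kendall's Tau (tau-a) is computed only over diagnoses present in both lists. This is an assumption.
* "Combined p-values" is illustrated with Fisher's method. The abstract does not say which combination method was used, so this too is an assumption.

The diagnosis names in the usage example are hypothetical and are not drawn from the study's cases.

```python
import math
from itertools import combinations


def jaccard_index(model_dx, expert_dx):
    """Jaccard index |A ∩ B| / |A ∪ B| between two differential-diagnosis lists."""
    a, b = set(model_dx), set(expert_dx)
    if not a and not b:
        return 1.0  # convention for two empty lists (an assumption)
    return len(a & b) / len(a | b)


def kendall_tau_shared(model_dx, expert_dx):
    """Kendall's Tau (tau-a) over diagnoses that appear in both ranked lists.

    Assumes each list has no duplicate entries. Returns None when fewer
    than two diagnoses are shared, because the rank correlation is then undefined.
    """
    model_set = set(model_dx)
    shared = [dx for dx in expert_dx if dx in model_set]
    if len(shared) < 2:
        return None
    model_rank = {dx: i for i, dx in enumerate(model_dx)}
    expert_rank = {dx: i for i, dx in enumerate(expert_dx)}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        # Same sign of rank difference in both lists -> concordant pair.
        s = (model_rank[x] - model_rank[y]) * (expert_rank[x] - expert_rank[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.

    Each p must lie in (0, 1]. For even degrees of freedom, the chi-square
    survival function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!, built incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical example lists (not from the study).
expert = ["malaria", "dengue", "typhoid fever", "leptospirosis"]
model = ["dengue", "malaria", "typhoid fever", "chikungunya"]
print(jaccard_index(model, expert))       # 3 shared / 5 in union
print(kendall_tau_shared(model, expert))  # 2 concordant, 1 discordant pair
```

Restricting Kendall's Tau to shared diagnoses keeps the statistic defined when the lists differ in content. However, it ignores diagnoses that only one side listed, so it should be read alongside the Jaccard index rather than on its own.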

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available from the authors upon reasonable request.
