Large Language Models (LLMs) such as ChatGPT and Gemini are gaining momentum in healthcare for their diagnostic potential. However, their real-world applicability in specialized medical fields like neurology remains inadequately explored. Whether these tools can be used in everyday diagnostic practice depends on how well they support the clinician in assessing the patient, understanding the possible diagnosis, and designing the diagnostic pathway. To this end, in this study we (1) examined the available literature on the evaluation of LLMs in neurological diagnosis to understand whether the methodologies applied were adequate for translating LLMs into everyday practice, and (2) designed and performed an experiment to evaluate the diagnostic accuracy and clinical recommendations of ChatGPT-3.5 and Gemini compared with neurologists, using real-world clinical cases presented according to everyday diagnostic practice. Within the vast literature on LLM applications in neurology, only 24 studies reported experiences using LLMs in clinical neurology. The reported experiments showed heterogeneous prompt engineering and input formats. At present, while responses to structured prompts are well documented, studies using real-world clinical scenarios and everyday workflows and practice are lacking. We therefore conducted a real-world experiment using a cohort of 28 anonymized patient records from the neurology department of the ASST Santi Paolo e Carlo Hospital (Milan, Italy). Cases were presented to ChatGPT-3.5 and Gemini in a way that replicated typical clinical workflows. Diagnostic accuracy and appropriateness of recommended diagnostic tests were assessed against discharge diagnoses and neurologists' performance. Neurologists achieved a diagnostic accuracy of 75%, outperforming ChatGPT-3.5 (54%) and Gemini (46%).
Both LLMs exhibited difficulties in nuanced clinical reasoning and over-prescribed diagnostic tests in 17–25% of cases. Despite their ability to generate structured recommendations, they struggled with complex or ambiguous presentations and required additional prompts in some cases. We conclude that LLMs have potential as supportive tools in neurology but currently lack the depth required for nuanced clinical decision-making. The findings emphasize the need for further refinement of LLMs and for evaluation methodologies that reflect the complexities of real-world neurology practice.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The present study qualifies as a non-interventional observational study and does not constitute a clinical trial. The study has been approved by the IRB of the University of Milan with the approval number 123/24, All. 3 CE 10/12/24. Patients provided consent for the processing of their personal data at the time of hospital admission under protocol ast_daz_502_ed00, in accordance with privacy regulations (D.Lgs. 101/2018, implementing EU GDPR 2016/679). This consent permits research on clinical data. The data have been anonymized prior to processing, ensuring that they cannot be traced back to any specific patient.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present work are contained in the manuscript.