Comparing the dental knowledge of large language models

Introduction With the advancement of artificial intelligence, large language models (LLMs) have emerged as a technology capable of generating human-like text across various domains. They hold vast potential in the dental field, with possible applications in clinical dentistry, dental administration, and student and patient education. However, the successful integration of LLMs into dentistry relies on the dental knowledge of the models used, as inaccuracies can lead to significant risks in patient care and education.

Aims We are the first to compare the dental knowledge of different LLMs by testing the accuracy of their responses to Integrated National Board Dental Examination (INBDE) questions.

Methods We included closed-source and open-source models and analysed responses to both ‘patient box' style board questions and more traditional, text-based, multiple-choice questions.

Results For the entire INBDE question bank, ChatGPT-4 had the highest dental knowledge, with an accuracy of 75.88%, followed by Claude-2.1 with 66.38% and then Mistral-Medium at 54.77%. There was a statistically significant difference in performance across all models.
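The significance test reported above can be illustrated with a chi-square test of homogeneity on a models-by-outcome contingency table. The counts below are hypothetical (100 questions per model, invented to mirror the reported accuracy ordering) and are not the study's actual data; the function and variable names are likewise illustrative.

```python
# Sketch of a chi-square test of homogeneity for comparing model accuracies.
# NOTE: the counts are hypothetical, not the study's data.

def chi_square_homogeneity(correct, totals):
    """Chi-square statistic for a models x (correct/incorrect) table."""
    incorrect = [t - c for c, t in zip(correct, totals)]
    grand = sum(totals)
    # Expected counts under the null hypothesis of equal accuracy
    exp_correct = [sum(correct) * t / grand for t in totals]
    exp_incorrect = [sum(incorrect) * t / grand for t in totals]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(correct, exp_correct))
    chi2 += sum((o - e) ** 2 / e for o, e in zip(incorrect, exp_incorrect))
    return chi2

correct = [76, 66, 55]    # hypothetical correct answers per model
totals = [100, 100, 100]  # hypothetical questions attempted per model
chi2 = chi_square_homogeneity(correct, totals)
# df = (3 models - 1) * (2 outcomes - 1) = 2; critical value at alpha = 0.05 is 5.991
print(f"chi2 = {chi2:.2f}, significant at 0.05: {chi2 > 5.991}")
```

With these illustrative counts the statistic exceeds the critical value, so the null hypothesis of equal accuracy across models would be rejected, consistent with the significant difference the study reports.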

Conclusion Our results highlight the high potential of LLM integration into the dental field, the importance of model selection when developing new technologies, and the limitations that must be overcome before unsupervised clinical integration can be adopted.
