Assessment of ChatGPT’s adherence to ETA-thyroid nodule management guideline over two different time intervals 14 days apart: in binary and multiple-choice queries

Objective

Artificial intelligence (AI) has significant potential in healthcare, particularly in providing decision support in specialized domains such as thyroid nodule management. This study assesses how closely ChatGPT-v4, an advanced AI model, aligns with the 2023 European Thyroid Association (ETA) guidelines.

Methods

The study used a structured questionnaire of 100 questions, divided into true/false and multiple-choice formats, reflecting real-world clinical scenarios in thyroid nodule management. The questions covered diagnostic criteria, treatment options, follow-up protocols, and patient counseling. ChatGPT's responses were evaluated for accuracy, consistency, and comprehensiveness using a six-point Likert scale. The assessment was performed initially and repeated after 14 days.

Results

In the binary queries, the AI model corrected some initially incorrect responses but regressed on others. Of the 11 initially non-compliant responses, 8 remained unchanged and 3 were rectified. Conversely, 6 initially compliant answers became non-compliant after 14 days. In the multiple-choice queries, the AI's performance was more consistent. A majority of the responses, 43 (86% of the total), were initially correct and remained correct on re-assessment. However, 4 responses that were initially incorrect remained unchanged, and 3 initially correct responses became non-compliant over time.
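The transition counts above can be tallied into compliance totals at each time point. The following is a minimal sketch, not the authors' analysis. It assumes that each format contained 50 questions (inferred from 43 being 86% of the multiple-choice total, and from 100 questions overall) and that no multiple-choice response changed from incorrect to correct (inferred from 43 + 4 + 3 = 50). The `Transitions` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transitions:
    """Counts of responses by compliance status at baseline and day 14."""
    stayed_compliant: int      # compliant -> compliant
    rectified: int             # non-compliant -> compliant
    regressed: int             # compliant -> non-compliant
    stayed_non_compliant: int  # non-compliant -> non-compliant

    @property
    def total(self) -> int:
        return (self.stayed_compliant + self.rectified
                + self.regressed + self.stayed_non_compliant)

    @property
    def compliant_baseline(self) -> int:
        return self.stayed_compliant + self.regressed

    @property
    def compliant_day14(self) -> int:
        return self.stayed_compliant + self.rectified


ASSUMED_PER_FORMAT = 50  # assumption: 100 questions split evenly

# Binary: 8 unchanged non-compliant, 3 rectified, 6 regressed (from the text);
# the stayed-compliant count is derived from the assumed total.
binary = Transitions(
    stayed_compliant=ASSUMED_PER_FORMAT - 8 - 3 - 6,
    rectified=3,
    regressed=6,
    stayed_non_compliant=8,
)

# Multiple choice: 43 stayed correct, 4 stayed incorrect, 3 regressed (from the
# text); rectified=0 is an inference, not a reported figure.
multiple_choice = Transitions(
    stayed_compliant=43,
    rectified=0,
    regressed=3,
    stayed_non_compliant=4,
)

for name, t in (("binary", binary), ("multiple choice", multiple_choice)):
    print(f"{name}: {t.compliant_baseline}/{t.total} compliant at baseline, "
          f"{t.compliant_day14}/{t.total} at day 14")
```

Under these assumptions, binary compliance moves from 39 to 36 of 50 and multiple-choice compliance from 46 to 43 of 50; these totals are derived here and are not figures reported in the abstract.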

Conclusion

ChatGPT showed potential as a clinical support tool in thyroid nodule management, although its performance differed between binary and multiple-choice questions.

Clinical trial registration

N/A
