Authors’ Reply: “Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications”

We appreciate the thoughtful commentary titled “Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications” [] and welcome the opportunity to clarify and expand upon our research findings [] regarding GPT-4’s cognitive evaluation using the Bloom taxonomy.

First, we acknowledge the confusion surrounding the use of the term “difficulty” in our manuscript. Traditionally in educational testing, “difficulty” is quantified as the proportion of examinees who answer a question correctly []; thus, a rating of 1 indicates an extremely easy question (100% correct responses), and a rating of 0 indicates an extremely difficult question (0% correct responses). Throughout the manuscript, we used “difficulty” in this measurement-scale sense.
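As a brief worked illustration (with hypothetical numbers, not drawn from our data), the difficulty index of a question is

difficulty = (number of correct responses) / (number of examinees)

so a question answered correctly by 80 of 100 examinees has a difficulty of 0.80 (relatively easy), whereas one answered correctly by only 20 of 100 has a difficulty of 0.20 (relatively hard).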

Consequently, “higher difficulty” means a value higher on the scale and thus an easier question. The same applies to comparative terms: because “lower” means less easy (ie, closer to 0 on the scale from 0 to 1), it indicates that the questions answered correctly were easier than those answered incorrectly. Although our use of the measurement “difficulty” is correct, on reflection, we agree that we could have been clearer, and we apologize for any confusion.

Second, the commentary on GPT-4’s approach to “memory” tasks adds a valuable dimension to our discussion. We agree that GPT-4 “remembers” through technical and programmatic means, highlighting the critical difference between GPT-4’s architecture and human cognitive processes, a distinction that was central to our study.

However, GPT-4’s material selection is far more complex than a flat-file database with simple question-to-answer mapping (unless the exam questions had been part of its training data, which is not applicable in our case). Generative tools like GPT-4 have distinct strengths and weaknesses. For example, they may perform relatively poorly on pure memory-recall problems but excel in topics requiring subtlety and nuance. This is demonstrated by GPT-4’s high performance on soft-skill questions from the USMLE (United States Medical Licensing Examination) and AMBOSS []. Part of our study went further by using the Bloom taxonomy as a framework for tracing the logical process of GPT-4’s explanations (not its answers) and determining the stages at which its errors occurred.

This discussion underscores a critical point: the complexity of assessing artificial intelligence and the processes underlying the output of models like GPT-4. This methodology allows us to critically examine where GPT-4’s responses fall within a spectrum of cognitive tasks, from simple recall to more complex analytical and evaluative processes.

Third, while it is quite true that many questions in medical qualifying exams are simple memory-type questions, we see this as a weakness rather than an ideal to aim for. Our understanding is that medical schools are trying to move away from such questions, although this remains an area for further research.

Again, we thank the author for the thoughtful critique of our paper and the resultant continued discussion, which underscores the importance of ongoing dialogue and research into artificial intelligence’s cognitive processes and how they parallel and diverge from human cognition.

Conflicts of Interest: None declared.

Edited by T Leung; this is a non–peer-reviewed article. Submitted 26.02.24; accepted 04.04.24; published 16.04.24.

©Anne Herrmann-Werner, Teresa Festl-Wietek, Friederike Holderried, Lea Herschbach, Jan Griewatz, Ken Masters, Stephan Zipfel, Moritz Mahling. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
