Comparing Scoring Consistency of Large Language Models with Faculty for Formative Assessments in Medical Education


