Comparing Scoring Consistency of Large Language Models with Faculty for Formative Assessments in Medical Education

Liaison Committee on Medical Education. Standards, Publications and Notification Forms. Available at https://lcme.org/publications/. Accessed on 24 Feb 2024.

Papanagnou D, Corliss S, Richards JB, Artino AR Jr, Schwartzstein R. Progression of self-directed learning in health professions education: Clarifying terms and processes. Acad Med. 2024;99(2):236. https://doi.org/10.1097/ACM.0000000000005191.

Van Wijk EV, Janse RJ, Ruijter BN, et al. Use of very short answer questions compared to multiple choice questions in undergraduate medical students: An external validation study. PLoS One. 2023;18(7):e0288558. https://doi.org/10.1371/journal.pone.0288558.

Magliano JP, Graesser AC. Computer-based assessment of student-constructed responses. Behav Res Methods. 2012;44(3):608-621. https://doi.org/10.3758/s13428-012-0211-3.

Hauer KE, Boscardin C, Brenner JM, van Schaik SM, Papp KK. Twelve tips for assessing medical knowledge with open-ended questions: Designing constructed response examinations in medical education. Med Teach. 2019;42(8):880-885. https://doi.org/10.1080/0142159x.2019.1629404.

González-Calatayud V, Prendes-Espinosa P, Roig-Vila R. Artificial intelligence for student assessment: A systematic review. Appl Sci. 2021;11(12):5467. https://doi.org/10.3390/app11125467.

Chen YK, Wrenn JO, Xu H, et al. Automated assessment of medical students’ clinical exposures according to AAMC geriatric competencies. AMIA Annu Symp Proc. 2014;2014:375-384.

Spickard A, Ridinger H, Wrenn J, et al. Automatic scoring of medical students’ clinical notes to monitor learning in the workplace. Med Teach. 2013;36(1):68-72. https://doi.org/10.3109/0142159x.2013.849801.

Mirchi N, Bissonnette V, Yilmaz R, Ledwos N, Winkler-Schwartz A, Del Maestro RF. The Virtual Operative Assistant: An explainable artificial intelligence tool for simulation-based training in surgery and medicine. Pławiak P, ed. PLoS One. 2020;15(2):e0229596. https://doi.org/10.1371/journal.pone.0229596.

Saplacan D, Herstad J, Pajalic Z. Feedback from digital systems used in higher education: An inquiry into triggered emotions - two universal design-oriented solutions for a better user experience. In: Transforming Our World through Design, Diversity and Education: Proceedings of Universal Design and Higher Education in Transformation Congress 2018. Vol 256. IOS Press; 2018:421-430.

Shanahan M. Talking About Large Language Models. arXiv (Cornell University). Published online December 7, 2022. https://doi.org/10.48550/arxiv.2212.03551.

Gardner J, O’Leary M, Yuan L. Artificial intelligence in educational assessment: “Breakthrough? Or buncombe and ballyhoo?” J Comput Assist Learn. 2021;37(5):1207-1216. https://doi.org/10.1111/jcal.12577.

Nur M, Arief Ramadhan, Hendric L. Automatic essay exam scoring system: a systematic literature review. Procedia Comput Sci. 2023;216:531-538. https://doi.org/10.1016/j.procs.2022.12.166.

Hussein MA, Hassan H, Nassef M. Automated language essay scoring systems: a literature review. PeerJ Comput Sci. 2019;5:e208. https://doi.org/10.7717/peerj-cs.208.

Altmäe S, Sola-Leyva A, Salumets A. Artificial intelligence in scientific writing: a friend or a foe? Reproductive Biomedicine Online. Published online April 1, 2023. https://doi.org/10.1016/j.rbmo.2023.04.009.

OpenAI. ChatGPT: optimizing language models for dialogue. OpenAI. Published November 30, 2022. https://openai.com/blog/chatgpt/. Accessed 1 Feb 2023.

Martineau K. What is generative AI? IBM Research Blog. Published February 9, 2021. https://research.ibm.com/blog/what-is-generative-AI.

Kasneci E, Sessler K, Küchemann S, et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn Individ Differ. 2023;103:102274. https://doi.org/10.1016/j.lindif.2023.102274.

Mackenzie SC, Sainsbury CAR, Wake DJ. Diabetes and artificial intelligence beyond the closed loop: a review of the landscape, promise and challenges. Diabetologia. 2024;67(2):223-235. https://doi.org/10.1007/s00125-023-06038-8.

Sanmarchi F, Bucci AF, Nuzzolese AG, et al. A step-by-step researcher’s guide to the use of an AI-based transformer in epidemiology: an exploratory analysis of ChatGPT using the STROBE checklist for observational studies. J Public Health. Published online May 26, 2023. https://doi.org/10.1007/s10389-023-01936-y.

Grabb D. ChatGPT in medical education: A paradigm shift or a dangerous tool? Acad Psychiatry. 2023;47(4):439-440. https://doi.org/10.1007/s40596-023-01791-9.

Lee H. The Rise of ChatGPT: Exploring its potential in medical education. Anat Sci Educ. Published online March 14, 2023. https://doi.org/10.1002/ase.2270.

Mohammad B, Turjana Supti, Mahmood Alzubaidi, et al. The pros and cons of using ChatGPT in medical education: A scoping review. Published online June 29, 2023. https://doi.org/10.3233/shti230580.

Denny JC, Spickard A, Speltz PJ, Porier R, Rosenstiel DE, Powers JS. Using natural language processing to provide personalized learning opportunities from trainee clinical notes. J Biomed Inform. 2015;56:292-299. https://doi.org/10.1016/j.jbi.2015.06.004.

Yudkowsky R, Park YS, Downing SM. Assessment in Health Professions Education. New York, NY: Routledge; 2020.

Seguin A, Haynes RB, Carballo S, Iorio A, Perrier A, Agoritsas T. Translating clinical questions by physicians into searchable queries: Analytical survey study. JMIR Med Educ. 2020;6(1):e16777. https://doi.org/10.2196/16777.

Core EPA Publications and Presentations. AAMC. Available at https://www.aamc.org/what-we-do/mission-areas/medical-education/cbme/core-epas/publications. Accessed 24 Feb 2024.

Park YS, Hyderi A, Bordage G, Xing K, Yudkowsky R. Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format. Adv Health Sci Educ Theory Pract. 2016;21(4):761-773. https://doi.org/10.1007/s10459-015-9664-3.

Prompting AI chatbots. Available at: https://cte.ku.edu/prompting-ai-chatbots. Accessed 10 November 2023.

Yudkowsky R, Hyderi A, Holden J, et al. Can nonclinician raters be trained to assess clinical reasoning in postencounter patient notes? Acad Med. 2019;94:S21-S27. https://doi.org/10.1097/acm.0000000000002904.

StataCorp. 2023. Stata Statistical Software: Release 18. College Station, TX: StataCorp LLC.

Temsah O, Khan SA, Yazan Chaiah, et al. Overview of early ChatGPT’s presence in medical literature: Insights from a hybrid literature review by ChatGPT and Human Experts. Cureus. Published online April 8, 2023. https://doi.org/10.7759/cureus.37281.

Friedman CP. A “fundamental theorem” of biomedical informatics. J Am Med Inform Assoc. 2009;16(2):169-170.

Brenner J, Fulton TB, Kruidering M, et al. What have we learned about constructed response short-answer questions from students and faculty? A multi-institutional study. Med Teach. Published online September 9, 2023:1-10. https://doi.org/10.1080/0142159x.2023.2249209.

McNamara DS, Crossley SA, Roscoe RD, Allen LK, Dai J. A hierarchical classification approach to automated essay scoring. Assess Writ. 2015;23:35-59. https://doi.org/10.1016/j.asw.2014.09.002.

Shermis MD, Burstein JC. Automated Essay Scoring. Routledge; 2003:71-86.

McNamara DS, Crossley SA, McCarthy PM. The linguistic features of quality writing. Writ Commun. 2010;27:57-86.

Sallam M. ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare. 2023;11(6):887. https://doi.org/10.3390/healthcare11060887.

Mearian L. How to train your chatbot through prompt engineering. Computerworld. Published March 21, 2023. https://www.computerworld.com/article/3691253/how-to-train-your-chatbot-through-prompt-engineering.html. Accessed 24 Feb 2024.

How ChatGPT Can Help with Grading. Available at https://blog.tcea.org/chatgpt-grading/. Accessed 24 Feb 2024.

Atlas S. Chatbot Prompting: A guide for students, educators, and an AI-augmented workforce. Stephen Atlas (Independently published). 2023.

Ramesh D, Sanampudi SK. An automated essay scoring system: a systematic literature review. Artif Intell Rev. Published online September 23, 2021. https://doi.org/10.1007/s10462-021-10068-2.

Somoye FL. Is ChatGPT free? In short - yes. PC Guide. Published February 24, 2023. https://www.pcguide.com/apps/chat-gpt-free/. Accessed 24 Feb 2024.

JOURNAL OF GENERAL INTERNAL MEDICINE
