Artificial intelligence (AI) seeks to develop machines capable of responding in ways similar to human intelligence. Its scope encompasses robotics, language recognition, image recognition, natural language processing, and expert systems. In recent years, the integration of AI into medical education (AIME) has significantly transformed traditional teaching and learning methodologies [,]. AI can enhance clinical reasoning training, facilitate adaptive learning, support the construction of innovative medical teaching platforms, and enable critical care simulation teaching. For instance, SimMan 3G PLUS, an AI-powered simulation mannequin, allows medical students to practice their clinical skills and teamwork in a fully controlled and immersive environment. Augmented reality and virtual reality technologies enable students to gain valuable hands-on experience by recreating real-life situations without compromising patient safety []. In other cases, AI helps tailor educational content to individual students’ needs and abilities, which not only fosters academic success but also cultivates critical thinking [,]. AI applications have become an important tool for both educators and students.
As AI continues to evolve and advance, its role in promoting medical education is expected to grow. Although many publications have explored AI’s application in medical research, they are limited to specific clinical questions in cardiology, dentistry, oncology, etc [,]. No bibliometric analysis has yet assessed AI in medical education, which is essential for building the medical talent pool. Therefore, a comprehensive analysis of the current scientific literature is imperative for the continued advancement of AIME.
Bibliometrics uses mathematical and statistical methodologies to evaluate scholarly productivity and publication patterns within a specific discipline. We used bibliometrics to analyze the contemporary academic landscape, research priorities, and emerging trends in AIME in the 21st century. We revealed the knowledge clusters, scientific social patterns, and evolutionary nuances from an objective standpoint. This work not only simplifies navigation for researchers across disciplines but also gives valuable guidance to newcomers in the field, directing them toward promising avenues for future research.
The research adhered to the step-by-step guidelines of bibliometric analysis and followed the reporting guideline of bibliometric reviews in biomedical literature [].
Data Collection
The Web of Science Core Collection database was selected as the primary tool to identify relevant publications for this study. This database is well known for its exceptional bibliometric research capabilities across over 190 subject areas and offers manually checked literature retrieval. A 2-step process was used to construct an effective retrieval strategy. First, we extracted search terms from relevant systematic reviews and meta-analyses. Second, these search terms were meticulously reviewed and refined by the authors (TW and RL) to ensure adequate coverage of all relevant research topics. The final search strategy was formulated based on this collaborative assessment: TS=(“artificial intelligence” or AI or “convolutional neural network” or “recurrent neural network” or “long short-term neural memory” or “machine learning” or “genetic algorithm” or “evolutionary algorithm” or “artificial neural network” or “support vector machine” or “fuzzy logic” or “random forest” or “deep learning” or “natural language processing” or “speech recognition” or “computer vision” or “smart robot” or “video recognition” or “image recognition”) and TS=(medical education or medicine education). Data collection took place on October 1, 2024, with publication years restricted to 2000 through 2024. No language restrictions were applied during the search process. Full records of the retrieved papers were exported in plain text format.
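For readers who wish to reproduce or adapt the search, the following minimal sketch shows one way the query string could be assembled from term lists. The terms are copied from the strategy above; the helper names (`topic_clause`, `build_query`) are illustrative assumptions and are not part of any Web of Science tooling.

```python
# Terms copied verbatim from the reported search strategy.
AI_TERMS = [
    '"artificial intelligence"', "AI", '"convolutional neural network"',
    '"recurrent neural network"', '"long short-term neural memory"',
    '"machine learning"', '"genetic algorithm"', '"evolutionary algorithm"',
    '"artificial neural network"', '"support vector machine"', '"fuzzy logic"',
    '"random forest"', '"deep learning"', '"natural language processing"',
    '"speech recognition"', '"computer vision"', '"smart robot"',
    '"video recognition"', '"image recognition"',
]
EDUCATION_TERMS = ["medical education", "medicine education"]


def topic_clause(terms):
    """Join terms with 'or' inside a topic (TS) field tag."""
    return "TS=(" + " or ".join(terms) + ")"


def build_query(ai_terms, education_terms):
    """Combine the AI clause and the education clause with 'and'."""
    return topic_clause(ai_terms) + " and " + topic_clause(education_terms)


QUERY = build_query(AI_TERMS, EDUCATION_TERMS)
print(QUERY)
```

Keeping the terms in lists makes it easier to review, extend, or audit the strategy than editing one long string by hand.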
Selection of Eligible Studies
Because database searches may return results that do not fully match the intended criteria, manual screening was necessary to identify the final set of included publications. Two authors (TW and RL) independently screened the titles and abstracts of the retrieved studies against the following exclusion criteria: (1) publications unrelated to AI (for example, papers in which “AI” referred to artificial insemination) and (2) publications unrelated to medical education, such as those on engineering curricula or patient education. Notably, most of the retrieved studies applied AI technologies to clinical questions and research, and only a minority pertained to medical education. Any discrepancies between the authors’ selections were resolved through a consensus meeting to reach a binding decision.
Data Cleaning and Preprocessing
Records in the Web of Science Core Collection database undergo rigorous selection and adhere to meticulous curation standards, which partly supports the quality of our data []. To further ensure accuracy and consistency, we performed a thorough data cleaning process. Duplicate papers were identified through digital object identifiers and study titles and then removed. The biblioshiny package in RStudio was used to standardize the names of authors and institutions, and variations in author names were consolidated. Because an author may be affiliated with different institutions over time, only the affiliation at the time of publication was retained. Synonyms, aliases, and singular or plural forms, such as “AR,” “augmented reality,” and “augmented realities,” were merged using a thesaurus file. The data cleaning was conducted manually by two authors (TW and RL).
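The two cleaning steps described above (duplicate removal by digital object identifier and title, and keyword merging with a thesaurus) can be sketched as follows. This is a minimal illustration, not the authors' actual workflow, which relied on biblioshiny and manual checking. The record fields, the sample DOIs, and the thesaurus entries are hypothetical.

```python
import re

# Hypothetical thesaurus: variant form -> preferred form.
THESAURUS = {
    "ar": "augmented reality",
    "augmented realities": "augmented reality",
}


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record; drop later ones sharing a DOI or a normalized title."""
    seen_dois, seen_titles, unique = set(), set(), []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        title = normalize_title(record.get("title") or "")
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(record)
    return unique


def normalize_keyword(keyword, thesaurus=THESAURUS):
    """Map a keyword to its preferred form, defaulting to the lowercased keyword."""
    key = keyword.strip().lower()
    return thesaurus.get(key, key)


# Hypothetical records for illustration only.
RECORDS = [
    {"doi": "10.0000/example.1", "title": "AI in Medical Education: A Review"},
    {"doi": "10.0000/EXAMPLE.1", "title": "AI in medical education - a review"},
    {"doi": "", "title": "AI in Medical Education: a review."},
    {"doi": "10.0000/example.2", "title": "Virtual reality for surgical training"},
]
CLEANED = deduplicate(RECORDS)
print(len(CLEANED), [normalize_keyword(k) for k in ["AR", "Augmented Realities"]])
```

Automated matching of this kind can miss near-duplicates with substantially different titles, which is one reason a manual check remains useful.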
Data Analysis and Visualization
To present the knowledge structure and emerging research trends, VOSviewer (version 1.6.18; Leiden University), InCites (Clarivate), InteractiVenn (Universidade de São Paulo), and CiteSpace (Drexel University) were used. InCites is a research evaluation tool developed by Clarivate Analytics, and the citation impact, H-index, journal impact factor, international collaborations, study influence, and immediacy index were obtained directly from InCites. The citation impact is calculated by dividing the total number of citations by the number of publications. The H-index is the largest number H such that H papers published by an author have each been cited at least H times, thereby serving as a gauge of both scholarly productivity and influence. The journal impact factor was based on the 2023 Journal Citation Reports. International collaborations were assessed to reveal the extent of cooperation among coauthors from different nations, highlighting the global reach and collaborative nature of the research topic. Study influence measures the mean influence of a study within the first 5 years after publication, providing a view of its longer-term impact. In contrast, the immediacy index represents the average frequency with which a study is cited in the year of its publication, offering insight into its immediate reception in the academic community. The VOSviewer software was used to cluster countries, institutions, journals, and keywords. The CiteSpace software was used to identify keyword clusters and citation burst times. The specific parameter settings for analysis in VOSviewer and CiteSpace are provided in Tables 1 and 2. The latent semantic indexing and log-likelihood ratio algorithms were used for literature clustering. The InteractiVenn tool was used to identify the specific and overlapping journals in 4 categories: count, citation, top 10% papers, and cooperative work.
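Two of the indicators above are simple enough to compute by hand, and a short sketch may help readers interpret the reported values. The code below computes an H-index from a list of per-paper citation counts and a citation impact as citations per document. In this study the values were taken from InCites rather than computed this way; the per-paper counts in the example are hypothetical, whereas the citation and document totals are those reported for Harvard University in Table 3.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citation_impact(total_citations, n_documents):
    """Return the mean number of citations per document."""
    return total_citations / n_documents


# Hypothetical per-paper citation counts for one author.
print(h_index([10, 8, 5, 4, 3]))

# Totals reported in Table 3 for Harvard University (1085 citations, 70 documents).
print(round(citation_impact(1085, 70), 2))
```

The citations-per-document reading reproduces the citation impact column of Table 3 (for example, 1085/70 = 15.50), which supports interpreting the indicator this way.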
To compare the frequencies of keywords in the AI and medical education fields, the keywords were classified according to their basic definitions and lists of professional terminology.
Table 1. Information for clustering using VOSviewer.

| Item | Type | Algorithm | Normalization method | Minimum document or occurrence^a |
|---|---|---|---|---|
| Country | Coauthorship | Majorization | Full counting | 12 |
| Organization | Coauthorship | Majorization | Full counting | 15 |
| Journal | Citation | Majorization | N/A^b | 12 |
| Keyword | Co-occurrence | Majorization | Full counting | 33 |

^a We set no limit on the minimum citation.
^b Not applicable.
Table 2. Information for keyword clustering using CiteSpace.

| Setting | Value |
|---|---|
| Time slicing | From January 2000 to October 2024, one year per slice |
| Text processing | Title, abstract, author keywords, and keywords plus |
| Node type | Keywords |
| Links | Strength: cosine; scope: within slices |
| Selection criteria | k=7 |

Ethical Considerations
The institutional review board of Tongji Hospital deemed ethical approval unnecessary for this study.
The search initially retrieved a total of 7534 publications. The accompanying flowchart details the publication retrieval and selection process. After title and abstract screening, 2775 publications remained for bibliometric analysis. The number of publications grew relatively slowly from 2000 to 2017 (A). However, the number of annual papers has grown exponentially since 2018, indicating flourishing development in this field. The citation counts also rose consistently, surpassing 10,000 citations in 2024 (B). These documents encompassed original studies (n=1769, 63.75%), proceeding papers (n=467, 16.83%), reviews (n=237, 8.54%), and meeting abstracts (n=67, 2.41%). Proceeding papers typically contain the latest research findings, methods, and techniques within a scientific field, and their high percentage indicates that AIME is receiving much attention and developing rapidly. The research areas were primarily Education Scientific Disciplines (n=415), Medicine General Internal (n=334), Health Care Sciences Services (n=323), Engineering Biomedical (n=264), and Surgery (n=255). This suggests that AIME is an interdisciplinary topic that needs support from specialized educators, clinical doctors, hospital administrators, and engineers.
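The document-type percentages above can be checked with a few lines of arithmetic. The sketch below recomputes each share from the reported counts and the 2775 included publications; the variable names are illustrative. The remainder that it computes corresponds to document types that are not itemized in the text.

```python
TOTAL_PUBLICATIONS = 2775

# Counts reported in the text for the four itemized document types.
DOCUMENT_TYPES = {
    "original studies": 1769,
    "proceeding papers": 467,
    "reviews": 237,
    "meeting abstracts": 67,
}


def share(count, total):
    """Return count as a percentage of total, rounded to 2 decimals."""
    return round(100 * count / total, 2)


SHARES = {name: share(n, TOTAL_PUBLICATIONS) for name, n in DOCUMENT_TYPES.items()}
UNLISTED = TOTAL_PUBLICATIONS - sum(DOCUMENT_TYPES.values())

for name, pct in SHARES.items():
    print(f"{name}: {pct}%")
print(f"not itemized: {UNLISTED} ({share(UNLISTED, TOTAL_PUBLICATIONS)}%)")
```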
In terms of national performance, 117 countries participated in the global discourse. The majority of research publications originated from Europe, North America, and Asia. The United States was the leading contributor, with the highest productivity (n=851 publications) and recognition (n=11,598 citations) as well as the most international and domestic collaborations. Although Chinese researchers published more papers than scholars from the United Kingdom (274 vs 227), their citations were lower (3302 vs 3823). This may be due to limited innovation, restricted dissemination, and language barriers. The national cluster map revealed a diverse regional distribution pattern facilitated by these collaborative relationships. There was strong collaboration among numerous countries, with particularly close ties between the United States, China, the United Kingdom, and India.
Altogether, 2119 institutions have participated in AIME studies. The top 10 institutions publishing the most papers are presented in Table 3. Among these, 4 institutions have published over 50 papers. Specifically, Harvard University was the most prolific institution (n=70) and had the most international collaborations (n=35), while the University of London had the highest citation impact (23.29). The collaborative connections showed six clusters of institutions publishing more than 40 papers. Stanford University, Johns Hopkins University, National University of Singapore, Mayo Clinic, University of Arizona, and University of Toronto were representative institutions in their respective clusters. Interestingly, regarding the impact relative to the world, the Alibaba Group had the highest score (value=61.43), followed by SUNY Downstate Health Sciences University (value=32.17) and Bukovinian State Medical University (value=25.14).
Table 3. Top 10 productive institutions of artificial intelligence in the medical education field.

| Institution | Documents | Citations | Citation impact | International collaborations | H-index | Impact relative to world |
|---|---|---|---|---|---|---|
| Harvard University | 70 | 1085 | 15.50 | 35 | 17 | 1.25 |
| University of California System | 60 | 861 | 14.35 | 17 | 16 | 1.16 |
| University of London | 52 | 1211 | 23.29 | 27 | 17 | 1.88 |
| Johns Hopkins University | 51 | 1003 | 19.67 | 21 | 17 | 1.59 |
| Stanford University | 48 | 834 | 17.38 | 22 | 16 | 1.40 |
| Harvard Medical School | 47 | 588 | 12.51 | 25 | 13 | 1.01 |
| Mayo Clinic | 44 | 401 | 9.11 | 12 | 10 | 0.74 |
| National University of Singapore | 43 | 433 | 10.07 | 24 | 12 | 0.81 |
| University of Toronto | 39 | 624 | 16.00 | 14 | 14 | 1.29 |
| University of Michigan | 39 | 308 | 7.90 | 19 | 9 | 0.64 |

Cureus (n=78), JMIR Medical Education (n=72), Medical Teacher (n=54), and BMC Medical Education (n=54) ranked as the top 4 most productive journals among the 1217 involved journals (A). JMIR Medical Education published the most high-impact papers (n=48), whereas Cureus, ranked second, published only 26. This is reasonable because JMIR Medical Education lies in the Q1 quartile and likely receives more high-quality submissions, whereas Cureus is in the Q3 quartile. Cureus, JMIR Medical Education, Medical Teacher, and International Journal of Computer Assisted Radiology and Surgery were the overlapping journals in the Venn diagram (B). It is worth highlighting that Cureus and JMIR Medical Education only began to publish AIME-associated papers in 2020 (C). Their open access model has helped them attract considerable attention in recent years. The publishers were diverse among the top 10 publication sources. More than half of these journals (7/10) published more than 10 papers ranked in the top 10%. Notably, a significant proportion of the publications (n=1460, 52.61%) were open access, with gold open access accounting for 923 (33.26%) of all publications.
Keywords of Research Hot Spots
The study identified 7773 keywords from the titles and abstracts of the research materials, reflecting the central themes, areas of interest, and potential future developments within the discipline. The heat map highlighted high-frequency keywords, including performance (n=139), education (n=123), artificial intelligence (n=118), and model (n=103; A). We next classified keywords into AI-associated and medical education-associated categories. Among AI-associated keywords, model (n=103), system (n=82), virtual reality (n=51), recognition (n=29), and algorithm (n=27) were popular topics, while the education-associated focus was on surgery (n=100), skill (n=90), simulation (n=75), classification (n=68), and diagnosis (n=47). The subsequent analysis produced eight distinct keyword clusters using the log-likelihood ratio and latent semantic indexing algorithms (B). Both algorithms identified #0 surgical training, #3 medical education, #6 medical student, and #7 imaging processing as hub keywords. This suggests a critical role for AI applications in surgical operations and clinical image information. CiteSpace burst detection reflects research trends and innovative directions (C). Notably, the timing of keyword bursts revealed that the research focus shifted from imaging processing (2000), augmented reality (2013), and virtual reality (2016) to decision-making (2020) and model (2021). Bursts for keywords such as mortality and robotic surgery persisted into 2023, suggesting ongoing recognition of and interest in these areas. Digital health involves the use of health apps, wearable equipment, and communication tools, which have been combined with AI. These devices not only provide personalized educational experiences and support mental health but also play a significant role in clinical teaching, disease prevention, and health promotion.
The number of AIME publications has grown exponentially in recent years, from approximately 50 in 2017 to approximately 600 in 2023. This surge has been further bolstered by the development of generative AI tools and supportive national-level policies. Here, we discuss the applications of AIME according to the keywords.
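To give a sense of scale for the growth described above, the approximate endpoints can be converted into a compound annual growth rate. The sketch below assumes a constant yearly growth factor between 2017 and 2023, which is a simplification, and it uses the rounded publication counts quoted in the text rather than exact yearly data.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate linking start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Approximate annual publication counts quoted in the text.
RATE = compound_annual_growth_rate(50, 600, 2023 - 2017)
print(f"{RATE:.1%} per year")
```

Under these assumptions, annual output grew by roughly half each year over the 6-year span.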
Machine Learning
Machine learning (ML), a subset of AI, is a technology that autonomously discerns underlying patterns and relationships without explicit human instructions. For instance, after analyzing a large collection of cat and dog images, ML can identify distinguishing features and subsequently differentiate between a cat and a dog in new photographs. The growing prominence of ML in medical education is evidenced by the surge in related topics, the approval of ML-based products, and the proliferation of entrepreneurial initiatives. ML-centric technologies have already been applied in clinical decision support systems, teaching tools, simulation, and training. An ML-based clinical decision support system could effectively integrate general information about patients with HIV, including demographics, medical history, CD4+ lymphocyte count, viral load, genotypic data, and treatment history, to recommend an optimal combination antiretroviral therapy []. Other advanced ML algorithms could provide more personalized advice on the appropriate dosage or duration of treatment []. In this context, it is vital to educate the next generation of medical professionals with adequate ML knowledge, enabling them to incorporate the outputs of ML tools into clinical decision-making and become part of this emerging data science revolution []. In the future, inexperienced medical students will use evidence-based learning models, like IBM’s Watson Oncology system, as ordinary tools to interpret clinical data, make informed decisions, and recommend cancer treatments in close agreement with multidisciplinary teams [].
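The cat-versus-dog illustration above can be made concrete with a deliberately tiny example. The sketch below is a nearest-centroid classifier that learns from labeled examples and then labels new ones. It is a toy under stated assumptions: instead of images, each animal is described by two made-up numeric features (body weight in kg and shoulder height in cm), and all values are hypothetical. Real systems learn features from images with far more elaborate models.

```python
def centroid(points):
    """Return the mean point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(examples):
    """Learn one centroid per label from a dict of label -> feature tuples."""
    return {label: centroid(points) for label, points in examples.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], features))


# Hypothetical training data: (body weight in kg, shoulder height in cm).
TRAINING = {
    "cat": [(4.0, 25.0), (4.5, 23.0), (3.8, 24.0)],
    "dog": [(20.0, 55.0), (30.0, 60.0), (12.0, 40.0)],
}
MODEL = fit(TRAINING)
print(predict(MODEL, (5.0, 26.0)), predict(MODEL, (25.0, 58.0)))
```

The point of the example is that no rule such as "dogs are taller" is written by hand; the decision boundary follows from the training examples alone.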
Medical students require educational feedback to understand their performance and identify areas for improvement []. Traditional educational evaluation after surgical training falls short of providing timely, adequate, and objective assessment. A human rater is usually required to review video recordings and give a written or oral evaluation. Such feedback is subjective, time-consuming, and limited to visual observation. With the help of automated ML-based AI assessments, the surgeon’s performance can be captured objectively and in a less resource-intensive way than human grading []. The motion, force, and vibration of robotic instruments are recorded and assessed according to recognized structured metrics, such as the Global Evaluative Assessment of Robotic Skill or the Objective Structured Assessment of Technical Skill [].
Deep Learning
Deep learning (DL), a subset of ML, is often used to tackle complex tasks such as visual recognition, speech recognition, and natural language processing. This is achieved through the use of advanced architectures like convolutional neural networks, deep belief networks, and stacked auto-encoder networks. DL has been applied across undergraduate medical education, postgraduate education, and continuing education. A notable example is its use during medical retina rotations, where residents may not be fully trained because of limited time or access to complete their learning objectives. To address this, a DL-based model was developed to analyze a vast collection of retina images sourced from three public datasets, subsequently creating a comprehensive dataset for residents. These images were then distributed to each resident to aid in diagnosis, differential diagnosis, and therapeutic planning. The allocation system is tailored to each resident’s case history, academic level, and performance, ensuring that those struggling with specific cases receive additional exposure to similar retinal conditions []. This approach enables AI models to identify the residents who will derive the most benefit from particular clinical cases, thereby significantly enhancing individualized ophthalmology education.
Additionally, DL technologies have been used to predict the difficulty of medical licensing examination questions, promoting more accurate assessments of examination difficulty. However, although DL models can effectively differentiate between cats and dogs, they do so by analyzing potentially hundreds or thousands of variables. The complexity and opacity of these variables often render them incomprehensible to humans. Consequently, there is an urgent need for improved interpretability methods in future DL applications to enhance understanding and transparency.
Natural Language Processing
Natural language processing plays a crucial role in smart health because of its ability to analyze and comprehend human language. ChatGPT, developed by OpenAI, is a landmark natural language processing application. In AIME, GPT-4 has been used for teaching cases, student analysis, creative writing, personalized learning guidance, and psychological support. ChatGPT has been shown to be effective in generating surgical procedure summaries, and less experienced residents cannot distinguish its performance from that of a board-certified surgeon []. In other cases, ChatGPT has been applied to teaching the skill of breaking bad news [], reasoning-based multiple-choice questions [], and qualifying examinations, including the Situational Judgement Test for final-year medical students in the United Kingdom []. These capabilities have revolutionized medical education and contributed to the overall improvement of health care delivery.
Segmentation
Image segmentation is a crucial AI technology in the radiology and pathology fields to accurately identify and delineate regions of interest, such as tumor lesions, ischemic tissues, and subcellular structures []. Traditional manual segmentation is not only time-consuming for physicians to learn and practice but also results in measurement variability that heavily depends on the observer’s experience. In contrast, computer-assisted segmentation methods reduce the subjectivity and variability inherent in manual approaches, decrease processing time, and require minimal training. The use of image segmentation algorithms has led to significant improvements in the sensitivity and efficiency of detecting pulmonary nodules. Furthermore, these algorithms have been shown to enhance learning interest, bolster self-directed learning capabilities, sharpen problem-solving skills, and foster innovative thinking among medical trainees [].
Effective surgical education for young surgeons presents significant challenges in practice. A segmentation-based system can identify key anatomical structures such as arteries, lymph nodes, and nerves during rectal cancer surgeries. Studies have shown the positive educational impact of AI-assisted videos in surgical training []. Some other AI technologies can facilitate surgical navigation and detect adverse events []. Taken together, real-time object segmentation is expected to play a major role in surgical education.
Comparison to Prior Work
AI has been increasingly used to enhance diagnostic accuracy, improve patient care, and facilitate the development of new treatments across various medical fields. In cardiology, for example, AI-powered algorithms are used to analyze electrocardiograms and detect arrhythmias, while in dentistry, AI-assisted diagnostic tools have been developed to identify oral pathologies [,]. Similarly, in oncology, AI-driven platforms are used to predict tumor growth and response to treatment []. Rather than concentrating on specific clinical questions, this study adopts a broader perspective to investigate the role of AI in medical education as a whole. High-quality medical education is the key to improving the level of medical care. Through our bibliometric analysis, research trends and hot topics in this field were identified, offering insights into how AI is transforming medical education globally.
Future Direction
AIME has evolved remarkably from an initial phase of enthusiasm to the current phase of acceptance []. Although AI technologies are expected to revolutionize medical education, the related ethical considerations and challenges should be carefully examined. One major concern is whether AI might diminish the competence of medical students by increasing their reliance on external tools. Additionally, there is apprehension about the possibility of AI completely replacing medical educators. Issues such as bias, hallucinations, and uncertainties have further contributed to hesitancy in the acceptance and practical application of AI in the medical field []. A balanced approach is necessary to ensure sustainable implementation and to find practical ways to incorporate AI into curricula. As deeper investigation is conducted, AI will become an integral part of medical education, supporting personal and professional growth alongside technological adoption.
Since the release of ChatGPT, AI-generated content (AIGC) has emerged as an innovative educational tool with significant potential. AIGC technologies have the capability to reshape pedagogical practices through various applications, such as virtual patient construction, automated question bank generation, and 3D anatomical simulations [,]. These technologies may also help address the uneven distribution of educational resources and facilitate the updating of outdated knowledge. Exploring the application of AIGC in medical education represents both a transformation and expansion of existing teaching models, as well as a forward-looking exploration of future cultivation patterns.
AIME introduces new demands for educational accreditation systems and training models. While traditional medical education accreditation emphasizes faculty strength, physical facilities, and curriculum design, AI enriches these standards by incorporating digital classes, digital resources, and dynamic monitoring. In the context of postgraduate education, AIME necessitates a re-evaluation of training duration, content, and assessment methods. It is crucial for policy makers and practitioners to explore how AI can be reasonably integrated into residents’ learning, ensuring a balance that prevents over-reliance on technology and maintains the importance of clinical practical experience.
Limitations
This study has several limitations. First, the bibliometric analysis was conducted exclusively using the Web of Science Core Collection database. This singular focus may introduce bias into the results, as it does not account for data from other significant sources. Future research should incorporate additional databases, such as Scopus and Google Scholar, to provide a more comprehensive analysis. Second, while citation frequency is often used as a measure of academic influence, it does not necessarily equate to positive evaluations. Citations can also be made for criticism or rebuttal. Therefore, supplementing bibliometric analysis with content analysis would offer a more accurate reflection of the literature’s true impact. Finally, bibliometric analysis inherently concentrates on published research, which may not capture the most current research trends and developments.
Conclusions
This study delves into the current landscape of AI applications in medical education, encompassing the geographical distribution of research efforts, recognition of pivotal researchers, identification of key research trends, and exploration of emerging domains. There is a burgeoning interest in AIME and an expanding comprehension of its potential impact. The United States has emerged as a leader in this field, with many institutions standing out as prolific organizations. As the demand for more personalized and effective medical education grows, there is a pressing need for large-scale, rigorously designed studies to provide empirical evidence of AI’s effectiveness and safety.
This work is financially supported by grants from the Teaching and Research Project of the Second Clinical College of Tongji Medical College (TJXYJ2023016 and TJSZ2024016). During the preparation of this work, the authors used Acadwrite [] to improve readability, grammar, and language style. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.
All data generated or analyzed during this study are included in this published article.
RL designed the study and wrote the initial draft of the manuscript. TW supervised the study, contributed to manuscript review and editing, and provided the funding. All authors contributed to the methodology and data visualization. All authors have read and agreed to the published version of the manuscript.
Conflicts of Interest
None declared.
Edited by T de Azevedo Cardoso; submitted 29.06.24; peer-reviewed by W Yang, W Qi, Y Jiang; comments to author 19.09.24; revised version received 04.11.24; accepted 26.11.24; published 30.01.25.
©Rui Li, Tong Wu. Originally published in the Interactive Journal of Medical Research (https://www.i-jmr.org/), 30.01.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.i-jmr.org/, as well as this copyright and license information must be included.