The emergence of pre-trained generative transformers, notably ChatGPT, has profoundly influenced medical education globally. The integration of artificial intelligence (AI) to elevate the quality of education has garnered significant attention,1 2 with students readily embracing ChatGPT in their learning endeavours.3 A multitude of studies has been initiated to investigate the potential applications of ChatGPT in medical education,4 5 predominantly focusing on case teaching, examinations, personalised learning and decision-making processes.6–13 In these scenarios, which typically have structured formats such as defined teaching syllabi,14 systematic case analysis15 16 and established examination question banks,17 ChatGPT primarily functions as an advanced search engine, drawing on its vast knowledge base and logical precision, rather than as an AI educator.
However, medical education also encompasses various non-standardised scenarios, such as career planning education and counselling18 and mental health support, which lack rigid educational frameworks and are susceptible to the influence of students’ backgrounds and educators’ skill levels.19 In these contexts, the explorative and innovative capabilities of ChatGPT could be harnessed more effectively to improve teaching quality and efficiency. The capability of ChatGPT to navigate these non-standardised educational scenarios, and the appropriate methodologies for applying it in such settings, remain largely unexplored.
Therefore, the purpose of this study is to explore how ChatGPT-4o can be applied in non-standardised scenarios of medical education to improve the quality of education. Specifically, the approach involves introducing ChatGPT-4o into career development counselling for medical students, evaluating its role in this context and examining the differences between ChatGPT-4o and human counsellors. While numerous studies have already assessed ChatGPT’s capabilities in standardised medical examinations, its potential in non-standardised scenarios within medical education has not yet been fully investigated, which constitutes the innovative aspect of this study.
Research hypotheses
ChatGPT-4o, with its extensive knowledge base, rigorous logic and innovative capabilities, can independently conduct career planning consultations for medical students. It is expected to be more effective than human educators in providing data-based and research-based factual information, while human educators excel in offering personalised advice and emotional support. The collaborative model of ChatGPT-4o and human educators (such as joint consultations) is hypothesised to be more effective than using either alone, better assisting students in their career planning.
Research questions
(1) How does ChatGPT-4o perform in terms of the accuracy and appropriateness of its advice in career planning consultations for medical students, and how does it compare with human career planning consultations? (2) In career planning consultations for medical students, how do ChatGPT-4o and human educators differ in providing comprehensive information, personalised advice and empathy? (3) What are the differences in satisfaction, decision-making confidence, practicality of planning and knowledge gain among medical students who receive consultations solely from human educators, solely from ChatGPT-4o, or from both combined? (4) Given the needs of medical students and the effectiveness of the consultations, what are the most suitable roles for ChatGPT-4o in career planning consultations for medical students, such as information provider, auxiliary tool or decision supporter?
Methodology and design
This study is led by West China Hospital of Sichuan University and employs a prospective mixed-methods approach,20 consisting of two parts. The first part is an assessment of ChatGPT-4o’s career planning capability: the Delphi method is used to identify key career planning questions, and the effectiveness of independent consultations by ChatGPT-4o and by human educators is compared using data collected through surveys and interviews. The main evaluation indicators are student satisfaction with the consultation and decision-making confidence. The second part explores collaborative models between ChatGPT-4o, human educators and medical students: different consultation combinations (ChatGPT-4o only, human educators only, joint consultation, etc) are examined through team dynamics analysis and the Career Decision Self-Efficacy Scale—Short Form (CDSE-SF)21 22 to identify the best methods of collaboration between ChatGPT-4o and humans.
Public involvement statement
Participants include graduate students and faculty from the West China School of Medicine of Sichuan University. Graduate students consist of doctoral and master’s students, excluding international students due to significant differences in career planning. Faculty includes graduate supervisors and student counsellors from the college. Participants are recruited by sending invitation letters to students and tutors, with subsequent recruitment through snowball sampling. The invitation letters include an informed consent form, which participants are required to agree to before participation. Participants are primarily involved in the implementation phase of the study. Their participation includes face-to-face interviews, completing questionnaires and using the ChatGPT programme as part of the study. Feedback from participants will be integrated through their direct involvement in these activities. After the study concludes, the researchers will share the final results with participants in the form of a published paper. Although participants were not involved in the initial design or planning stages of the study, their contributions during the implementation will provide valuable insights into the practical application of ChatGPT in career counselling.
The first part: ChatGPT-4o career planning capability assessment
The study subjects include ChatGPT-4o, medical graduate students, graduate supervisors and counsellors. The sample consists of 20 medical graduate students and five teachers, including students from various grades and academic backgrounds and teachers who serve as graduate supervisors and counsellors. These participants are recruited to ensure diversity and representativeness in the sample. The Delphi method is used to identify key career planning questions.23 24 Students and teachers engage in in-depth discussions to identify common career planning questions; participants anonymously submit questions, and discussions continue until a consensus is reached, finalising 10 representative career planning questions. The study constructs a virtual student whose background information is compiled into a Word document along with the 10 questions, and the project staff use this document for text-based consultations with both the teachers and ChatGPT-4o. Question answering and anonymisation: the finalised 10 questions are answered by ChatGPT-4o, graduate supervisors and counsellors, and the answers are anonymised by an independent team, creating three sets of answers: Set A, Set B and Set C. Answer scoring: the anonymised answer sets are distributed to graduate students, who score them on criteria including comprehensiveness, accuracy, practicality, empathy, depth of knowledge, respect for privacy, listening skills, adaptability, problem-solving, and counselling and teaching abilities. Results analysis: based on the graduate students’ scores, the responses from ChatGPT-4o, graduate supervisors and counsellors are analysed and compared to identify strengths and weaknesses, evaluating ChatGPT-4o’s capability in career planning consultation for medical students. Control group: human teachers serve as a control group to assess ChatGPT-4o’s effectiveness compared with traditional consultation methods.
Outcome measures: the primary outcome measure is the graduate students’ scoring of the answer sets.
The second part: exploring suitable collaborative methods between ChatGPT-4o and human educators
In the second part of the study, we explore suitable collaborative methods between ChatGPT-4o and human educators. The study subjects and sample size remain consistent with the first part. Four career planning consultation models are explored: (1) students first consult ChatGPT-4o and then discuss the results with human educators; (2) students first consult human educators and then use ChatGPT-4o to refine the career planning advice; (3) human educators use ChatGPT-4o for preparatory consultation before advising students; and (4) tripartite group consultations involving human educators, students and ChatGPT-4o. Students are assigned to different groups, each receiving one of the consultation models. With the assistance of the project staff, students and teachers interact with ChatGPT-4o in text format. To assess the effectiveness of these models, the Career Decision Self-Efficacy Scale–Short Form (CDSE-SF) is used to measure students’ career decision self-efficacy before and after consultation, with differences in scores compared across groups. Additionally, team dynamics are observed and analysed to understand the interaction and communication patterns in the different models. An internal control is applied, using pre- and post-consultation comparisons for each student to evaluate consultation effectiveness. Intervention measures vary by experimental group, providing students with different forms of career planning consultation. The primary outcome measure is the difference in CDSE-SF scores before and after consultation, while qualitative data collected during the consultation process are analysed using a framework analysis to evaluate collaboration effectiveness and model performance.
Data collection process
In the initial phase of the study, we implement the Delphi method through a series of in-person meetings, in which a focused discussion with experts from relevant fields identifies the 10 most significant questions in career planning and constructs the background profile of a virtual student who will present these questions to university educators. These questions and the student’s background information are compiled into a Word document and emailed to the participating educators, who fill in their answers and return the document via email. For ChatGPT-4o’s responses, the researchers input the background information and questions into the ChatGPT-4o system and then carefully format the received responses into a similar Word document.
The evaluation of these responses is conducted through an anonymous scoring system. The anonymised answer sets are distributed to the participating students via email, who then assess and score the responses based on predefined criteria. The researchers collect these scores and compile them into an Excel file for comprehensive analysis.
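As a minimal illustration of this compilation step, the collected scores could be aggregated per answer set and criterion before statistical analysis. The column names, file layout and score values below are assumptions for the sketch, not the study’s actual scoring template:

```python
import pandas as pd

# Hypothetical long-format records: one row per (student, answer set, criterion).
records = [
    {"student": "S01", "answer_set": "A", "criterion": "accuracy", "score": 8},
    {"student": "S01", "answer_set": "B", "criterion": "accuracy", "score": 7},
    {"student": "S01", "answer_set": "C", "criterion": "accuracy", "score": 6},
    {"student": "S02", "answer_set": "A", "criterion": "empathy", "score": 5},
    {"student": "S02", "answer_set": "B", "criterion": "empathy", "score": 9},
    {"student": "S02", "answer_set": "C", "criterion": "empathy", "score": 8},
]
df = pd.DataFrame(records)

# Mean and SD of scores for each answer set and criterion.
summary = df.groupby(["answer_set", "criterion"])["score"].agg(["mean", "std"])
print(summary)

# In practice, the summary table would be exported for the SPSS analysis,
# e.g. summary.to_excel("score_summary.xlsx")
```

Aggregating in long format like this keeps each anonymised answer set (A, B, C) comparable across all scoring criteria in a single table.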
In the second phase of the study, the researchers organise career planning consultation sessions, which include both students and human educators, along with ChatGPT-4o. These sessions are recorded for detailed analysis, and the dialogues with ChatGPT-4o are also preserved for further examination. Students complete electronic questionnaires before and after the consultation to measure any changes in their perspectives and understanding. After the conclusion of the consultations, the researchers compile all the recorded materials and transcribe them into Word documents, preparing them for the subsequent stages of analysis. Since this study is being conducted in China, all the materials mentioned earlier are in Chinese, and the interactions between the staff and ChatGPT will also be conducted in Chinese.
Data analysis
Quantitative data analysis
In this study, the collected quantitative data will be coded and organised in Excel, and statistical analysis will be performed using SPSS (version 27.0). For the first part of the study, in which students score the responses from ChatGPT-4o, graduate supervisors and counsellors, means and SDs will be calculated to describe the distribution of scores for each group. A one-way analysis of variance (ANOVA) will be used to compare the three groups (ChatGPT-4o, graduate supervisors and counsellors). If the ANOVA detects significant differences, post hoc tests (eg, LSD or Tukey) will be conducted for pairwise comparisons to determine specific group differences. For the second part of the study, exploring the effectiveness of the different career planning consultation models, changes in students’ CDSE-SF scores before and after consultation will be analysed using paired-sample t-tests to assess the impact of each consultation model. Additionally, independent-sample t-tests will be conducted to compare changes in CDSE-SF scores between the consultation model groups to evaluate their relative effectiveness.
Qualitative data analysis
For qualitative data analysis, the collected data will be organised into Word documents following a standardised format and then coded. A framework analysis approach will be employed to analyse the feedback provided by participants during the consultation process and the observed team interactions. This method will help identify key themes, team dynamics and the effectiveness of collaboration under the different consultation models. Where there are discrepancies or disagreements in the data, group discussions will be held to reach a consensus and ensure the accuracy of the analysis.
Control and comparison
In the first part of the study, human educators will serve as the control group, allowing the effectiveness of ChatGPT-4o to be compared with traditional career counselling methods. The analysis will focus on comparing the scores assigned by students to the responses from each group. In the second part, pre- and post-consultation changes in CDSE-SF scores will serve as an internal control, helping to evaluate the effectiveness of each consultation model.
Presentation of results
The quantitative results will be presented as means, SDs and significance levels (p values), with statistical significance set at p<0.05 for all analyses. The qualitative findings will be presented in the form of themes and frameworks,25 highlighting differences between the collaboration models and their impact on career planning consultation effectiveness. All results will be interpreted in the context of the study’s objectives, providing a comprehensive discussion of the potential and limitations of ChatGPT-4o in medical students’ career planning consultations. Through these methods, this study will comprehensively evaluate the effectiveness of collaboration between ChatGPT-4o and human educators and explore its potential applications in career planning consultations.
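The planned quantitative tests can be sketched as follows. The protocol specifies SPSS, so this Python/SciPy version is only an illustrative equivalent, and all score values are simulated placeholders rather than study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Part 1: hypothetical ratings (0-10) from 20 students for each answer set.
chatgpt = rng.normal(8.0, 1.0, 20)
supervisors = rng.normal(7.5, 1.0, 20)
counsellors = rng.normal(7.0, 1.0, 20)

# One-way ANOVA across the three answer sets.
f_stat, p_anova = stats.f_oneway(chatgpt, supervisors, counsellors)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# If significant, Tukey's HSD for pairwise post hoc comparisons.
if p_anova < 0.05:
    print(stats.tukey_hsd(chatgpt, supervisors, counsellors))

# Part 2: paired-sample t-test on hypothetical pre/post CDSE-SF scores
# within one consultation model group.
pre = rng.normal(85, 10, 20)
post = pre + rng.normal(5, 5, 20)  # simulated improvement
t_paired, p_paired = stats.ttest_rel(post, pre)
print(f"Paired t-test: t = {t_paired:.2f}, p = {p_paired:.4f}")

# Between-group comparison of change scores (independent-sample t-test).
delta_a = post - pre
delta_b = rng.normal(3, 5, 20)  # simulated change scores for another model
t_ind, p_ind = stats.ttest_ind(delta_a, delta_b)
print(f"Independent t-test: t = {t_ind:.2f}, p = {p_ind:.4f}")
```

The same test sequence (omnibus ANOVA, post hoc pairwise comparisons, then paired and independent t-tests on change scores) mirrors the analysis plan described above.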
Discussion
ChatGPT, with its powerful functionality and simple operation,26 has greatly advanced the development of AI and has had a significant impact on medical education, sparking academic interest in its application in this field. At present, research on the application of ChatGPT in education focuses primarily on standardised scenarios such as examinations and teaching assistance.27 These scenarios are relatively structured: examinations rely on large question banks, and answers to teaching-related queries can often be found online, yielding results similar to those produced by large language models through search engines. However, there are many non-standardised scenarios in education, such as career planning for medical students, where each student’s situation is unique, making it difficult to find answers online that fully match an individual student’s needs. Currently, career planning for medical students is handled by university teachers, who communicate with students to understand their situation and then use their professional knowledge to provide personalised advice. These non-standardised scenarios are where language models should ideally demonstrate their potential, yet there is a noticeable gap in research on applying ChatGPT to career planning education for medical students. This study aims to contribute to understanding how ChatGPT-4o can be effectively applied in educational areas that lack a fixed model, such as career planning consultation and mental health counselling, thereby advancing the development of these educational fields.
Using ChatGPT in non-standardised educational scenarios also presents certain challenges, such as issues related to student privacy, data security, error rates in large language models and legal concerns.28 The capabilities of large language models depend on the datasets used to train them. At present, much of the data used for training large models is in English, and this data is generated by English-speaking populations. However, there are significant cultural differences between people who speak different languages, and it is worth considering whether large language models trained on English data are applicable to other cultural groups.
The successful completion of this study will elucidate the capabilities of ChatGPT-4o in handling educational scenarios like career planning consultations, thereby expanding its application in the realm of medical education. It will enable human educators, such as graduate supervisors and counsellors, to understand the respective strengths of ChatGPT-4o and human educators in these contexts. For instance, ChatGPT-4o’s extensive knowledge base can provide systematically integrated consultation advice based on various educational theories and student-specific situations, while it may fall short in areas such as empathy and interpreting body language, where human educators excel.
Furthermore, this study will offer practical recommendations on how human educators can use ChatGPT-4o in career planning consultations for medical students. This guidance will empower graduate supervisors and counsellors to engage with ChatGPT-4o effectively and accurately in consultations with students, sparing them the time and effort of working out how best to leverage it themselves.
Strengths and limitations
In this study, we innovatively explore the use of ChatGPT-4o in the career development counselling of medical students, pioneering its application in a field where AI’s role is still emerging. The mixed-methods approach enriches our design, offering a comprehensive view of both quantitative outcomes and qualitative experiences. However, the study’s focus on a single educational institution may limit the generalisability of our results to other settings. In addition, the rapidly evolving nature of AI technology and the complexities inherent in human–AI interaction within educational environments make it challenging to capture the full scope of these dynamics, potentially affecting the long-term applicability and interpretation of our findings.