Integrating large language models into medical education: A commentary on opportunities, challenges, and future directions

Authors

  • Leo Morjaria, Michael G. DeGroote School of Medicine, McMaster University
  • Levi Burns, Michael G. DeGroote School of Medicine, McMaster University
  • Keyna Bracken, Michael G. DeGroote School of Medicine, McMaster University; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University
  • Anthony J. Levinson, Michael G. DeGroote School of Medicine, McMaster University
  • Quang N. Ngo, Michael G. DeGroote School of Medicine, McMaster University; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University
  • Mark Lee, McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University
  • Matthew Sibbald, Michael G. DeGroote School of Medicine, McMaster University; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University

DOI:

https://doi.org/10.15173/mumj.v21i1.3674

Keywords:

ChatGPT, Medical Education, Artificial Intelligence, Large Language Models

Abstract

Shortly following its release in November 2022, OpenAI’s ChatGPT attracted widespread attention in the medical education community for its ability to perform at or near the passing threshold on the United States Medical Licensing Examination (USMLE). Amid considerable excitement surrounding artificial intelligence (AI) and its potential to revolutionize medical training, this commentary explores both the opportunities and challenges posed by incorporating large language models (LLMs) such as ChatGPT into medical education. To evaluate ChatGPT’s impact in the context of problem-based learning (PBL) medical education, we conducted two initial studies. The first study assessed ChatGPT’s performance on concept application exercises (CAEs)—short-answer assessments used in our program to gauge student progress. After establishing ChatGPT’s performance on CAEs, our second study evaluated ChatGPT’s ability to effectively grade student-generated responses. Our results reveal that ChatGPT not only outperforms students who are marginally passing but also grades these assessments with promising alignment to human grading practices. Our team’s future research plans include examining ChatGPT’s ability to provide effective feedback, generate discerning assessment questions, create realistic training scenarios, and support continuing professional development. Although we are optimistic about future applications of LLMs, we emphasize the need for an AI-assisted approach that employs human oversight to mitigate the inherent risks associated with LLMs, such as bias perpetuation, inaccuracies, over-reliance, and potential misuse. Ultimately, through the thoughtful and evidence-based implementation of these new tools, we believe AI can be harnessed to augment rather than undermine the quality and effectiveness of medical education.

Published

2025-03-25