Neurosurgery residents outperform ChatGPT in answering board examination-like questions
15 Oct 2025

Large language models such as ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents.
A literature search was performed following PRISMA guidelines, covering the period from ChatGPT’s inception (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with those of neurosurgery residents. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha at 0.05.
After screening, six studies were selected for qualitative and quantitative analysis. The accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents’ accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the other two had moderate risk. The pooled result favored neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar in the subgroup analysis of studies that used the Self-assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removing the highest-weighted study shifted the results toward better performance by ChatGPT.
Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education.
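For readers unfamiliar with the statistics mentioned above, the following is a minimal sketch of how a fixed-effects pooled estimate, its 95% confidence interval, and the I² heterogeneity statistic are commonly computed with inverse-variance weighting, plus a sensitivity check that drops the highest-weighted study. It is an illustration under stated assumptions, not the authors' procedure: the summary here does not say which effect measure or software the study used, and the function names and all input numbers are hypothetical, not data from the reviewed studies.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effects pooling (illustrative sketch).

    effects    -- per-study effect estimates (e.g. log odds ratios)
    std_errors -- their standard errors
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # 95% confidence interval under a normal approximation.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    # Cochran's Q and the I^2 statistic (percentage of variability
    # attributable to heterogeneity rather than chance).
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "ci": ci, "q": q, "i2": i2, "weights": weights}


def drop_highest_weight(effects, std_errors):
    """Sensitivity analysis: re-pool after removing the heaviest study."""
    weights = [1.0 / se ** 2 for se in std_errors]
    heaviest = weights.index(max(weights))
    kept_effects = [y for i, y in enumerate(effects) if i != heaviest]
    kept_ses = [s for i, s in enumerate(std_errors) if i != heaviest]
    return fixed_effect_pool(kept_effects, kept_ses)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    demo_effects = [-0.60, 0.25, -0.10, 0.40]
    demo_ses = [0.05, 0.20, 0.15, 0.25]
    print(fixed_effect_pool(demo_effects, demo_ses))
    print(drop_highest_weight(demo_effects, demo_ses))
```

Because a fixed-effects model weights each study by its precision, one large study can dominate the pooled estimate. That is why removing the highest-weighted study, as in the sensitivity analysis described above, can change the direction of the result.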
Authors: Edgar Dominic A. Bongco, Sean Kendrich N. Cua, Mary Angeline Luz U. Hernandez, Juan Silvestre G. Pascual and Kathleen Joy O. Khu (Division of Neurosurgery, Department of Neurosciences, College of Medicine and Philippine General Hospital, University of the Philippines Manila)
Read full article: https://link.springer.com/article/10.1007/s10143-024-03144-y
Photo by Viktor Duks from Pexels