Neurosurgery residents outperform ChatGPT in answering board examination-like questions

15 Oct 2025

Large language models such as ChatGPT have been used across various areas of medical education. This study reviewed the literature comparing ChatGPT’s performance on neurosurgery board examination-like questions with that of neurosurgery residents.

A literature search following PRISMA guidelines covered the period from ChatGPT’s launch (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared its results with those of neurosurgery residents. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were calculated using a fixed-effects model, with alpha set at 0.05.

After screening, six studies were included in the qualitative and quantitative analysis. ChatGPT’s accuracy ranged from 50.4% to 78.8%, compared with 58.3% to 73.7% for residents. Risk of bias was low in four of the six studies and moderate in the remaining two. The pooled results favored neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). A subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions yielded similar findings. However, in the sensitivity analysis, removing the highest-weighted study shifted the results toward better performance by ChatGPT.
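As a rough illustration of the statistics named above, the Python sketch below pools per-study effect estimates with inverse-variance weights (one common fixed-effect approach), reports a 95% confidence interval and two-sided p-value at alpha = 0.05, computes Cochran's Q and the I² heterogeneity statistic, and re-pools with each study left out in turn as a simple sensitivity check. It assumes each study has already been reduced to an effect estimate (for example, a log odds ratio of answering correctly, residents versus ChatGPT) and its variance. The function names and toy inputs are hypothetical; this is not the authors' analysis code or data, and the review may have used a different effect measure or pooling method (such as Mantel-Haenszel).

```python
import math
from statistics import NormalDist


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooling (illustrative sketch).

    effects:   per-study effect estimates (e.g. hypothetical log odds ratios)
    variances: their sampling variances (must all be positive)
    Returns (pooled, (ci_low, ci_high), p_value, q, i2_percent).
    """
    weights = [1.0 / v for v in variances]          # more precise studies weigh more
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)                   # standard error of the pooled estimate

    z_crit = NormalDist().inv_cdf(0.975)            # two-sided 95% critical value (alpha = 0.05)
    ci = (pooled - z_crit * se, pooled + z_crit * se)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(pooled / se)))  # z-test of pooled effect = 0

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity, floored at 0.
    i2 = (max(0.0, (q - df) / q) * 100.0) if q > 0 else 0.0
    return pooled, ci, p_value, q, i2


def leave_one_out(effects, variances):
    """Sensitivity check: pooled estimate recomputed with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(fixed_effect(rest_y, rest_v)[0])
    return results


if __name__ == "__main__":
    # Hypothetical toy inputs, NOT data from the reviewed studies.
    effects = [0.0, 2.0]
    variances = [0.25, 0.25]
    pooled, (low, high), p, q, i2 = fixed_effect(effects, variances)
    print(f"pooled={pooled:.2f}, 95% CI=({low:.2f}, {high:.2f}), p={p:.4f}, Q={q:.2f}, I^2={i2:.1f}%")
    print("leave-one-out pooled estimates:", leave_one_out(effects, variances))
```

Because fixed-effect weights are proportional to 1/variance, a single large, precise study can dominate the pooled estimate, which is one reason dropping the highest-weighted study can move the result noticeably.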

Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies were highly heterogeneous. ChatGPT needs further improvement before it can become a useful and reliable supplementary tool for delivering neurosurgical education.

Authors: Edgar Dominic A. Bongco, Sean Kendrich N. Cua, Mary Angeline Luz U. Hernandez, Juan Silvestre G. Pascual and Kathleen Joy O. Khu (Division of Neurosurgery, Department of Neurosciences, College of Medicine and Philippine General Hospital, University of the Philippines Manila)

Read full article: https://link.springer.com/article/10.1007/s10143-024-03144-y

Photo by Viktor Duks from Pexels