GPT-4 performance comparable with physicians on official medical board residency examinations. Model performance near or above official passing rate in all medical specialties tested

cyu@sh.itjust.works · 2 years ago

theluddite@lemmy.ml · 2 years ago

Researchers reduced [the task] to producing a plausible corpus of text, and then published the not-so-shocking results that the thing that is good at generating plausible text did a good job generating plausible text.

From the OP, buried deep in the methodology:

Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.

Yet here's their conclusion:

The advancement from GPT-3.5 to GPT-4 marks a critical milestone in which LLMs achieved physician-level performance. These findings underscore the potential maturity of LLM technology, urging the medical community to explore its widespread applications.

It's literally always the same. They reduce a task until ChatGPT can do it, then report in the headline that it can do it, with the caveats buried way later in the text.

GPT versus Resident Physicians — A Benchmark Based on Official Board Scores | NEJM AI