RadLE in the wild

Community response

Posts and threads fetched live from X and Reddit — same sources as the homepage section, shown in full so you can browse every highlight.

On X

English-language posts.

Demis Hassabis
@demishassabis · Nov 20
@rohanpaul_ai awesome to see!
Rohan Paul
@rohanpaul_ai · Nov 20
Wow. Gemini 3.0 on Radiology's Last Exam
The first time a general-purpose model has beaten radiology residents with 51% accuracy. Radiology trainees are at 45%.
The main significance is that a general model has finally reached a level where it can compete with early-stage human training on a specialized medical exam.
Congratulations to @GoogleDeepMind team. @GeminiApp
Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳
@DrDatta_AIIMS · Nov 20
🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️
🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning benchmark in healthcare?” So we ran it! And here you go. 👇
➡️ Gemini 3.0 Pro on RadLE v1:
✅ 51% accuracy; first time a general-purpose model has beaten radiology residents
✅ Radiology residents: 45%
✅ Board-certified radiologists: ~83%
✅ Shows clean step-by-step reasoning in some tough cases (appendix localization, mimics ruled out, etc.)
🚀 This is the first time ever that a generalist model has crossed the trainee bar on RadLE v1! Congratulations to @GoogleDeepMind and @Google team including @vivnat, @alan_karthi and all others for cooking this time!
Full breakdown here: 🔗 Link in comments / bio
🔥 Huge shoutout to Lakshmi, Divya, Upasana, Hakikat, Kautik & the entire #CRASHLab team at @KCDH_A for turning around in under a day. 🙌
If you are a medical AI lab and want to improve your performances and want our expert insights, reach out!
Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳
@DrDatta_AIIMS · Oct 1
🚨 Just published! All frontier AI models have failed “Radiology’s Last Exam” - the toughest benchmark in radiology launched today!
✅ Board-certified radiologists scored 83%, trainees 45%, but the best performing AI from frontier labs, GPT-5, managed only 30%.
❌ These results shatter repeated claims of “doctor-level” AI in medicine and give you a reality check!
🇮🇳 The Centre for Responsible Autonomous Systems in Healthcare (#CRASHLab), @KCDH_A @AshokaUniv, India has launched v1 of one of the hardest benchmarks in medicine and we share our results with the world!
1/n
Simon Smith
@_simonsmith · Nov 20
Here's a very practical real-world benchmark where Gemini 3 Pro shows dramatic progress: Radiology's Last Exam. A general AI model now beats trainee radiologists, with a 70% improvement over the previous best model (which was released in August!).
Healthcare AI Guy
@HealthcareAIGuy · Nov 21
NEW: Gemini 3.0 Pro just passed radiology trainees on Radiology’s Last Exam (51% vs 45%)
A general-purpose frontier model is now performing at the level of early-stage human training on a real medical imaging task.
Rohan Paul
@rohanpaul_ai · Oct 2
Paper: https://arxiv.org/abs/2509.25559
Paper Title: "Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology"
Haider.
@haider1 · Oct 4
"Radiology's Last Exam" — the toughest benchmark in radiology
According to the paper: GPT-5 scored 30% with "substantial" consistency on 50 expert-level radiology cases across CT, MRI, and X-ray, performing best on MRI but still below humans
surely it will be saturated by 2027
Dominik Filkus
@DominikFilkus · Nov 24
Fortunately, AI is not just about image, video creation or coding, it is here to help humanity against diseases or at least help recognize them with high precision. In Radiology's Last Exam (RadLE v1), Gemini 3 Pro was the first SOTA model which outperformed the trainees. Its score, after multiple runs was still far below the score of the certified radiologists but it's still a milestone. No GPT-5, Gemini 2.5 Pro, Grok or Claude models were capable of finishing with a higher score than the trainees before. At some point, AI will be better than humans in most areas and focusing on this specific case, it will make fewer mistakes or no mistakes at all in the future, hopefully.

International coverage

Global audiences in Japanese, Hindi, German, and more.