RadLE in the wild
Community response
Posts and threads fetched live from X and Reddit — same sources as the homepage section, shown in full so you can browse every highlight.
On X
English-language posts.
- Demis Hassabis @demishassabis · Nov 20
@rohanpaul_ai awesome to see!
- Rohan Paul @rohanpaul_ai · Nov 20
Wow. Gemini 3.0 on Radiology's Last Exam The first time a general-purpose model has beaten radiology residents with 51% accuracy. Radiology trainees are at 45%. The main significance is that a general model has finally reached a level where it can compete with early-stage human training on a specialized medical exam. Congratulations to @GoogleDeepMind team. @GeminiApp
- Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳 @DrDatta_AIIMS · Nov 20
🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️ 🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning benchmark in healthcare?” So we ran it! And here you go. 👇 ➡️ Gemini 3.0 Pro on RadLE v1: ✅ 51% accuracy; first time a general-purpose model has beaten radiology residents ✅ Radiology residents: 45% ✅ Board-certified radiologists: ~83% ✅ Shows clean step-by-step reasoning in some tough cases (appendix localization, mimics ruled out, etc.) 🚀 This is the first time ever that a generalist model has crossed the trainee bar on RadLE v1! Congratulations to @GoogleDeepMind and @Google team including @vivnat, @alan_karthi and all others for cooking this time! Full breakdown here: 🔗 Link in comments / bio 🔥 Huge shoutout to Lakshmi, Divya, Upasana, Hakikat, Kautik & the entire #CRASHLab team at @KCDH_A for turning around in under a day. 🙌 If you are a medical AI lab and want to improve your performances and want our expert insights, reach out!
- Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳 @DrDatta_AIIMS · Oct 1
🚨 Just published! All frontier AI models have failed “Radiology’s Last Exam” - the toughest benchmark in radiology launched today! ✅ Board-certified radiologists scored 83%, trainees 45%, but the best performing AI from frontier labs, GPT-5, managed only 30%. ❌ These results shatter repeated claims of “doctor-level” AI in medicine and give you a reality check! 🇮🇳 The Centre for Responsible Autonomous Systems in Healthcare (#CRASHLab), @KCDH_A @AshokaUniv, India has launched v1 of one of the hardest benchmarks in medicine and we share our results with the world! 1/n
- Simon Smith @_simonsmith · Nov 20
Here's a very practical real-world benchmark where Gemini 3 Pro shows dramatic progress: Radiology's Last Exam. A general AI model now beats trainee radiologists, with a 70% improvement over the previous best model (which was released in August!).
- Healthcare AI Guy @HealthcareAIGuy · Nov 21
NEW: Gemini 3.0 Pro just passed radiology trainees on Radiology’s Last Exam (51% vs 45%) A general-purpose frontier model is now performing at the level of early-stage human training on a real medical imaging task.
- Rohan Paul @rohanpaul_ai · Oct 2
Paper – https://arxiv.org/abs/2509.25559 Paper Title: "Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology"
- Haider. @haider1 · Oct 4
"Radiology's Last Exam" — the toughest benchmark in radiology According to the paper: GPT-5 scored 30% with "substantial" consistency on 50 expert-level radiology cases across CT, MRI, and X-ray, performing best on MRI but still below humans surely it will be saturated by 2027
- Dominik Filkus @DominikFilkus · Nov 24
Fortunately, AI is not just about image, video creation or coding, it is here to help humanity against diseases or at least help recognize them with high precision. In Radiology's Last Exam (RadLE v1), Gemini 3 Pro was the first SOTA model which outperformed the trainees. Its score, after multiple runs was still far below the score of the certified radiologists but it's still a milestone. No GPT-5, Gemini 2.5 Pro, Grok or Claude models were capable of finishing with a higher score than the trainees before. At some point, AI will be better than humans in most areas and focusing on this specific case, it will make fewer mistakes or no mistakes at all in the future, hopefully.
International coverage
Global audiences in Japanese, Hindi, German, and more.
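- チェリ@AIエンジニア•メタAIインフルエンサー @rN1oO71GTPiEMks · Oct 5
Dr. Datta of AIIMS has released the radiology benchmark "Radiology's Last Exam" and reported that every frontier AI model failed it. Board-certified radiologists scored 83% and residents 45%, while GPT-5 scored 30%, Gemini 2.5 Pro 29%, and Claude Opus 4.1 1%. The results come from an evaluation on 50 difficult CT, MRI, and X-ray cases. https://x.com/DrDatta_AIIMS/status/1973373655251038701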
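- チェリ@AIエンジニア•メタAIインフルエンサー @rN1oO71GTPiEMks · Nov 20
This thread presents results in which Gemini 3.0 Pro exceeded the average score of radiology residents on the demanding radiology benchmark "Radiology's Last Exam (RadLE)". It compares a general-purpose model with human experts on the same chest imaging questions and shows how close AI has come. https://x.com/DrDatta_AIIMS/status/1991378471604334604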
- حمید (شیرازی سودوفیکیک سابق) @pseudophakic_sh · Nov 20
The Radiology's Last Exam (RadLE) benchmark: a benchmark at the level of the radiology board exam, which showed that leading AI models perform worse than even a first-year resident, plus its update for Gemini 3.0. The original paper for this research was published 2 months ago, and its update for Gemini 3.0 came out today. #هوش_مصنوعی_و_پزشکی 🧵1/4
- Chubby♨️ @kimmonismus · Oct 4
Radiology’s last exam: human radiologists achieve about 83% accuracy, where as GPT-5 achieves ~30%. - for now. Let’s see if we get a updated GPT-5 version on Monday. Anyways, can’t imagine this benchmark will last longer than 6 months until saturated by AI.