nopinsight 7 months ago | on: Radiology-specific foundation model

This is impressive. The next step is to see how well it generalizes outside of such tests.

"The Fellowship of the Royal College of Radiologists (FRCR) 2B Rapids exam is considered one of the leading and toughest certifications for radiologists. Only 40-59% of human radiologists pass on their first attempt. Radiologists who re-attempt the exam within a year of passing score an average of 50.88 out of 60 (84.8%).

Harrison.rad.1 scored 51.4 out of 60 (85.67%). Other competing models, including OpenAI’s GPT-4o, Microsoft’s LLaVA-Med, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro, mostly scored below 30*, which is statistically no better than random guessing."

rafram 7 months ago

Impressive, but was it trained on questions from the exam? Were any of those other models?

aengustran 7 months ago

harrison.rad.1 was not trained on any of the exam questions. However, it can't be guaranteed that the other models were not trained on them.
