This is impressive. The next step is to see how well it generalizes outside of such tests.

"The Fellowship of the Royal College of Radiologists (FRCR) 2B Rapids exam is considered one of the leading and toughest certifications for radiologists. Only 40-59% of human radiologists pass on their first attempt. Radiologists who re-attempt the exam within a year of passing score an average of 50.88 out of 60 (84.8%).

Harrison.rad.1 scored 51.4 out of 60 (85.67%). Other competing models, including OpenAI’s GPT-4o, Microsoft’s LLaVA-Med, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro, mostly scored below 30*, which is statistically no better than random guessing."




Impressive, but was it trained on questions from the exam? Were any of those other models?


Harrison.rad.1 was not trained on any of the exam questions. However, there is no guarantee that the other models were not trained on them.



