
The technical report does go into a lot of depth about how they use RL, including the modified GRPO objective. As for the README, I imagine most people active in the field understand what "RL" implies for a reasoning model.




