
Anthropic has said they had benchmarks where Claude would take GitHub issues and try to generate Git commits that passed the unit/integration tests others had written for the real final feature. You also have things like multimodal image recognition for UIs: you can say "generate code for a UI that looks like such and such" and then verify the result with the multimodal capabilities.

Tool use means you can click a button and make sure it transitions to the next described UI screen, again verified with the multimodal capabilities.
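
To make the test-based part concrete, here is a minimal sketch of what a pass/fail reward could look like. This is not Anthropic's actual setup; the function name `unit_test_reward`, the binary 1.0/0.0 reward, and the idea of concatenating candidate code with plain `assert`-style tests are all assumptions made for illustration. A real pipeline would apply a patch to a repository and run its test suite in a sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unit_test_reward(candidate_source: str, test_source: str,
                     timeout_s: float = 10.0) -> float:
    """Hypothetical reward: 1.0 if the held-out tests pass, else 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "check.py"
        # Candidate code first, then the tests; a failing assert
        # makes the interpreter exit with a non-zero status.
        script.write_text(candidate_source + "\n\n" + test_source + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Treat hangs and infinite loops as failures.
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    print(unit_test_reward(good, tests))  # 1.0
    print(unit_test_reward(bad, tests))   # 0.0
```

The point is only that the signal comes from running tests someone else wrote, so no formal proof of the code is involved.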




Are you sure you responded to the right comment? We were talking about code verification.


I missed that it was about formal verification, but I don't think formal verification is necessary for effective RL in the coding domain.


There's still an important point to what I said, even in ML. In fact, consider what it would mean for how the current status quo goes about showing things if what I said is true. Then think about AI safety lol



