But Llama 4 Scout does badly on long-context benchmarks despite claiming a 10M-token context. It scores one slot above Llama 3.1 8B in this one[1].

[1] https://github.com/adobe-research/NoLiMa

Indeed, but that doesn't take away from the fact that long context isn't trained on long documents; it's obtained by scaling up short-context training instead.
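
A minimal sketch of one such scaling trick, RoPE position interpolation (one common way the extension is done; the window sizes and scale factor below are hypothetical, chosen just for illustration):

    import numpy as np

    def rope_angles(seq_len, head_dim, scale=1.0, base=10000.0):
        # Standard RoPE inverse frequencies, one per pair of dimensions.
        inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
        # Position interpolation: dividing positions by `scale` keeps them
        # inside the range the model saw during short-context pretraining,
        # so extending the window doesn't require training on long documents.
        positions = np.arange(seq_len) / scale
        return np.outer(positions, inv_freq)

    # Hypothetical numbers: a model pretrained at 4k, run at 32k with scale 8.
    short = rope_angles(4096, 128)
    extended = rope_angles(32768, 128, scale=8.0)
    print(short.max(), extended.max())  # ~4095.0 vs ~4095.9: same angle range

Published recipes along these lines typically pair the rescaling with a brief fine-tune at the new length, but the heavy lifting is done by the positional scheme rather than by long training data.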