But Llama 4 Scout does badly on long-context benchmarks despite claiming a 10M-token context. It scores one slot above Llama 3.1 8B in this one[1].

[1] https://github.com/adobe-research/NoLiMa

Indeed, but that doesn't take away from the fact that long context isn't trained on long documents; it's obtained by scaling up short-context training instead.
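
A minimal sketch of one such scaling trick, RoPE position interpolation (one common way the extension is done; the window sizes and scale factor below are hypothetical, chosen just for illustration):

    import numpy as np

    def rope_angles(seq_len, head_dim, scale=1.0, base=10000.0):
        # Standard RoPE inverse frequencies, one per pair of dimensions.
        inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
        # Position interpolation: dividing positions by `scale` keeps them
        # inside the range the model saw during short-context pretraining,
        # so extending the window doesn't require training on long documents.
        positions = np.arange(seq_len) / scale
        return np.outer(positions, inv_freq)

    # Hypothetical numbers: a model pretrained at 4k, run at 32k with scale 8.
    short = rope_angles(4096, 128)
    extended = rope_angles(32768, 128, scale=8.0)
    print(short.max(), extended.max())  # ~4095.0 vs ~4095.9: same angle range

Published recipes along these lines typically pair the rescaling with a brief fine-tune at the new length, but the heavy lifting is done by the positional scheme rather than by long training data.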