
‘1984 is 1 token. 1884 is 2 tokens.’

I would be surprised if they still use this tokenization, as it’s not math friendly.




They do use this tokenization, and that's why these models sometimes struggle with tasks like "how many twos does this long number contain" or "is 50100 greater than 50200". In the second case the model ends up comparing "501"/"00" with "50"/"200", while knowing that "501" is greater than "50".
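To make the splits concrete, here is a rough sketch in Python. It is not a real model's tokenizer: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for illustration, and greedy longest-match is only a stand-in for the BPE merges that production tokenizers actually use. It just reproduces the splits mentioned in this thread.

```python
# Toy vocabulary (hypothetical, NOT taken from any real model).
# It contains a few multi-digit chunks plus every single digit as a fallback.
TOY_VOCAB = {"1984", "18", "84", "501", "50", "200", "00"} | set("0123456789")


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedy longest-match against vocab.

    This is a simplification: real tokenizers apply learned BPE merges,
    but the effect on digit strings is similar in spirit.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


for number in ("1984", "1884", "50100", "50200"):
    print(number, toy_tokenize(number))
```

With this toy vocabulary, "50100" and "50200" are cut at different positions, so the pieces the model sees do not line up digit by digit, which is the mismatch described above.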

The models aren't optimized to be math friendly. They could be, but the major general-purpose ones weren't.


It's a language model, not a mathematical model.



