They do use this tokenization, and that's why these models sometimes struggle with tasks like "how many 2s does this long number contain" or "is 50100 greater than 50200": the model ends up comparing "501"/"00" against "50"/"200", and since it knows "501" is greater than "50", it can wrongly conclude that 50100 is the larger number.
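For what it's worth, you can inspect the splits yourself with OpenAI's tiktoken library. A minimal sketch, assuming the cl100k_base encoding; the exact pieces you get depend on which encoding the model actually uses, and newer tokenizers chunk digits differently:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the GPT-4-era encoding; other models split differently.
enc = tiktoken.get_encoding("cl100k_base")

for num in ["50100", "50200"]:
    ids = enc.encode(num)
    pieces = [enc.decode([i]) for i in ids]
    print(num, "->", pieces)  # shows how the digits get chunked into tokens
```

The point is just that the token boundaries don't line up with place value, so two numbers of the same magnitude can get split in incompatible ways.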
The models aren't optimized to be math friendly. They could be, but the big general-purpose ones weren't.
I would be surprised if they still used this tokenization, since it's not math friendly.