
If you give LLMs the letters one at a time they often count them just fine, though Claude at least seems to need to keep a running count to get it right:

"How many R letters are in the following? Keep a running count. s t r a w b e r r y"

They are terrible at counting letters in words because they rarely see them spelled out. An LLM trained one byte at a time would see every character of every word and would have a much easier time of it. An LLM is essentially learning a new language without a dictionary; of course it's pretty bad at spelling. The tokenization obscures the spelling, not entirely unlike how spoken language doesn't always reveal spelling.
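A toy sketch of the tokenization point (the vocabulary and token IDs below are invented for illustration; real BPE vocabularies have tens of thousands of entries): once "strawberry" becomes two opaque IDs, the individual letters are no longer present anywhere in the model's input.

```python
# Invented toy vocabulary -- the IDs are made up for the example.
vocab = {"straw": 10295, "berry": 947}
inv_vocab = {v: k for k, v in vocab.items()}

def tokenize(word):
    # Greedy longest-match split over the toy vocabulary.
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return tokens

tokens = tokenize("strawberry")
print(tokens)  # [10295, 947]

# The model only ever sees these IDs. Recovering the letter count
# requires the (learned) mapping back to characters:
print("".join(inv_vocab[t] for t in tokens).count("r"))  # 3
```

Nothing about the integers 10295 and 947 encodes the three r's; a model has to memorize that association from training data.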




Could the effect you see when you spell it out be not a result of "seeing" tokens, but of the model having learned, at a higher level, how lists in text can be summarized, filtered, and counted?

In other words, what makes you think it's the letter tokens specifically that help, and not the high-level concept of spelling things out itself?


It's more that it's liable to struggle to guess how to spell tokens [10295, 947] (or whatever they are), since there's no a priori reason it will learn to associate them with exactly the right letter tokens in the right order. If it's trained on bytes, though, it doesn't need to infer that. It's like asking a smart, semi-literate person a spelling question: they might have a rough sense of it, but they won't be very good at it.
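By contrast, in a byte-level encoding every character is directly visible in the input sequence, so no spelling inference is needed. A quick sketch:

```python
# Byte-level "tokenization": one token per byte, so every letter is
# present in the model's input sequence.
word = "strawberry"
byte_tokens = list(word.encode("utf-8"))

print(len(byte_tokens))               # 10 -- one token per character here
print(byte_tokens.count(ord("r")))    # 3 -- the count is readable off the tokens
```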

Once it is just counting lists then it's probably drawing on a higher level capability, yeah.





