Could the effect you see, when you spell it out, be not a result of "seeing" tokens, but of the model having learned, at a higher level, how lists in text can be summarized, summed up, filtered, and counted?
In other words, what makes you think it's specifically letter-tokens that help it, and not the high-level concept of spelling things out itself?
It's more that it's liable to struggle to guess how to spell tokens [10295, 947] (or whatever it is), since there's no a priori reason for it to learn to associate them with exactly the right tokens for the individual letters, in the right order. If it's trained on bytes, though, it doesn't need to infer that. It's like asking a smart, semi-literate person a spelling question: they might have a rough sense of it, but they won't be very good at it.
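To make the distinction concrete, here's a minimal sketch using the tiktoken library (my choice of library and encoding, not something from the thread; the exact IDs and splits differ by tokenizer). A subword-tokenized model only ever sees the integer IDs, while a byte-level model sees each letter as its own symbol:

```python
# Minimal sketch (assumes tiktoken is installed; cl100k_base is just an
# example encoding, and the exact IDs and splits differ by tokenizer).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberries"

# Subword view: the model receives a short list of opaque integer IDs.
# Nothing about these numbers exposes the letters they stand for.
token_ids = enc.encode(word)
print(token_ids)

# Which letters each ID corresponds to is something the model has to
# memorize per token from training data.
print([enc.decode([t]) for t in token_ids])

# Byte-level view: every letter is its own symbol (e.g. 's' = 115,
# 't' = 116), so spelling or counting letters needs no memorized
# token-to-letters mapping.
print(list(word.encode("utf-8")))
```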
Once it's just counting lists, then it's probably drawing on a higher-level capability, yeah.