They do use this tokenization, and that's why these models sometimes struggle with tasks like "how many 2s does this long number contain" or "is 50100 greater than 50200": the model ends up comparing "501"/"00" against "50"/"200", and since it knows "501" is greater than "50", it can wrongly conclude that 50100 is the larger number.
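For what it's worth, you can inspect the splits yourself with OpenAI's tiktoken library. A minimal sketch, assuming the cl100k_base encoding; the exact pieces you get depend on which encoding the model actually uses, and newer tokenizers chunk digits differently:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the GPT-4-era encoding; other models split differently.
enc = tiktoken.get_encoding("cl100k_base")

for num in ["50100", "50200"]:
    ids = enc.encode(num)
    pieces = [enc.decode([i]) for i in ids]
    print(num, "->", pieces)  # shows how the digits get chunked into tokens
```

The point is just that the token boundaries don't line up with place value, so two numbers of the same magnitude can get split in incompatible ways.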
The models aren't optimized to be math friendly. They could be, but the big general-purpose ones weren't.
I would be surprised if they still used this tokenization, since it's not math friendly.