Sorry, I've edited my original comment to be clearer. What I really meant is that there is wide tolerance of noise in those domains. "How long does stars last" has a completely different meaning than "How long do stars last" - not tolerant of noise.
If a 6th grader asks their science teacher "How long does stars last?" / "How long stars last?" / "How long do stars last?" / "How old do stars get?" / "Stars, how old can they get?" / ...
In a similar context they probably all end up parsed as the same question, assuming correct inflection, posture, etc. Spoken conversations are messy, but they also have redundancy and pseudo-checksums. Written language tends to be more formal because it's a much narrower channel and you don't get as much feedback.
PS: It's also really common for someone to ask a question when they don't have enough context to understand what question they should be asking.
I'd suggest "How long do these stars last?" and "How long do these stairs last?" might be a better example. Human language has more redundancy than computer languages, and in a real context it would probably still be clear what was meant even if the wrong word was used. But it's still a much spikier landscape than images when it comes to small changes.
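To put a rough number on "small changes", here is a minimal sketch that uses character-level edit distance (Levenshtein) as a stand-in for noise. That metric is my own assumption for illustration, not something anyone above proposed, and `edit_distance` is just a hypothetical helper name. It shows that the stars/stairs pair is a single-character change that swaps the meaning, while the does/do pair is a larger change that leaves the meaning intact.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


# One inserted letter, different question entirely.
print(edit_distance("How long do these stars last?",
                    "How long do these stairs last?"))  # 1

# Two deleted letters, same question to a human listener.
print(edit_distance("How long does stars last?",
                    "How long do stars last?"))  # 2
```

So the size of the edit tells you very little about the size of the change in meaning, which is roughly what "spiky" means here.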