Funnyish story: the other night I asked my Pixel 9 to generate an image via Gemini, then asked it to make a change. It didn't consider the previous context, so I asked it, "Are you capable of keeping context?" No matter how clearly I enunciated "context," it always heard "contacts." After the fourth try, I said "context, spelled c-o-n-t-e-x-t," and it replied with "Ah, you meant context! Yes..."
I think Google is digging a hole for themselves by making their lightweight models the most-used ones. Regardless of what their heavyweight models can do, people will naturally associate the brand with whatever powers search or the assistant.
I've noticed Gemini 2.0 Flash making a lot of phonetic typos like that, yeah. Like writing "basil ganglia" instead of "basal ganglia."
I've also had it switch languages in the middle of output... like one word in the middle of a sentence randomly came out in some strange hieroglyphs, but when I translated it, it was the right word and the sentence made sense.
I was using the conversational feature of Gemini on my phone the other night, trying to get it to read a blog post to me. The AI proceeded to tell me (out loud, via voice mode/speech synthesis) that it was a text-based model and couldn't read text out loud.
For as amazing as these things are, AGI they are not.
This stuff has a long way to go.