AFAIK they get ~15% never-before-seen queries every day, so it might not be simple to design an effective cache layer on top of that. Semantic-aware clustering of natural language queries and projecting them into a cacheable low-rank space is a non-trivial problem. Of course, an LLM can solve that effectively, but then what's the point of a cache when you need an LLM just to cluster the queries...
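To make it concrete, here's a toy sketch of what I mean by a semantic cache (everything here is made up for illustration: the `embed` stub stands in for a real sentence-embedding model, and the 0.9 threshold is arbitrary). You store one embedding per answered query and serve the cached answer when a new query lands close enough in embedding space:

```python
import numpy as np

def embed(query: str) -> np.ndarray:
    """Placeholder embedding: hashes character trigrams into a unit vector.
    A real system would use an actual sentence-embedding model here."""
    vec = np.zeros(256)
    for i in range(len(query) - 2):
        vec[hash(query[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold        # min cosine similarity to count as a hit
        self.keys: list[np.ndarray] = []  # query embeddings
        self.values: list[str] = []       # cached LLM answers

    def get(self, query: str) -> str | None:
        if not self.keys:
            return None
        q = embed(query)
        sims = np.stack(self.keys) @ q    # cosine similarity (all vectors unit-norm)
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.keys.append(embed(query))
        self.values.append(answer)

cache = SemanticCache()
cache.put("best pizza places in brooklyn", "cached LLM overview ...")
print(cache.get("good pizza spots in brooklyn"))  # hit only if similarity clears the threshold
print(cache.get("how do jet engines work"))       # miss -> None, fall through to the LLM
```

The hard part isn't this lookup; it's making the embedding and threshold robust enough that "effectively the same" queries collide while genuinely different ones don't, across that 15% of daily unseen traffic.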
Not a search engineer, but wouldn't a cache lookup of a previous LLM result be faster than a conventional free-text search over the indexed websites? Seems like this could save money whilst delivering better results?
Yes, that's what Google's doing for AI Overviews, IIUC. From what I've seen in my own use, it works okay and is improving over time, but it's not close to perfect. Results are stale for developing stories, some bad results stick around for a long time, effectively identical queries return different cached answers, etc.