OpenAI is not running a solid five minutes of LLM compute per request. I know they are not profitable and burn money even on a normal request, but this would be too much even for them.
Likely they throttle and spend much of those five minutes waiting for nothing. That can help with stability and traffic smoothing (using "free" inference capacity when API and website usage dips a bit), but I think it mostly gives the product some faux credibility: "the research must be great quality if it took this long!"
In a few months they will cut the wait down, to great fanfare, just by removing some artificial delays.
Well, you may be right. But you can turn on the details and see that it seems to pull data, evaluate it, and follow up on it. My thought was: why do I see this in slow motion? My homemade Python stuff runs this kind of loop in a few seconds, and my bottleneck is the APIs of the sites I query. How about them?
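For concreteness, here is a minimal sketch of the kind of loop I mean (the URLs are placeholders, and this is obviously not what OpenAI actually runs): the fetches go out concurrently, so wall time is roughly one round trip to the slowest API.

```python
import asyncio
import aiohttp

# Placeholder endpoints, standing in for whatever sources a research
# query hits; the point is only that the fetches run concurrently.
URLS = [
    "https://api.example.com/search?q=topic",
    "https://api.example.org/papers?q=topic",
    "https://api.example.net/news?q=topic",
]

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # One source: the network round trip is essentially the whole cost.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        return await resp.text()

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # All sources are queried at once, so the total time is about
        # one round trip to the slowest API: seconds, not minutes.
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
        print(f"fetched {len(results)} sources")

asyncio.run(main())
```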
When you query APIs or scrape sites for personal use, you are unlikely to get throttled. OpenAI doing it at scale for many users might have to go slower (they surely have tons of proxies, but they don't want to burn those IPs on user-controlled traffic).
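To make the scale point concrete, a toy sketch (the one-request-per-second budget is my assumption, not a known number): with a shared per-domain limiter, one user's query finishes in seconds, but many users sharing the same budget queue up and the identical work stretches to minutes.

```python
import asyncio
import time

class DomainLimiter:
    """Toy per-domain rate limiter that spaces requests 1/rate seconds apart.

    The 1 req/s budget is an illustrative assumption; real per-site
    crawl budgets are unknown.
    """
    def __init__(self, rate: float = 1.0):
        self.interval = 1.0 / rate
        self.next_slot = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Callers queue on the lock; each sleeps until its slot comes up.
        async with self.lock:
            now = time.monotonic()
            wait = self.next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
            self.next_slot = max(now, self.next_slot) + self.interval

async def main() -> None:
    limiter = DomainLimiter(rate=1.0)  # assumed shared crawl budget
    t0 = time.monotonic()

    async def query(user: int) -> None:
        # Each research run makes a couple of fetches to the same site.
        for i in range(2):
            await limiter.acquire()
            print(f"user {user} fetch {i} at t={time.monotonic() - t0:.1f}s")

    # Alone, one user finishes in ~2 s; ten users sharing the budget
    # stretch the same work to ~20 s. Scale forces the slowdown.
    await asyncio.gather(*(query(u) for u in range(10)))

asyncio.run(main())
```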
Similarly, their inference GPUs have finite capacity. Spreading out the traffic helps keep utilization high.
Lastly, I think there is also a marketing and psychological aspect. Even if they could have the results in one minute, delaying them to two to five minutes won't hurt user retention much, but it will make people think they are getting great value.