OpenAI is not running a solid five minutes of LLM compute per request. I know they are not profitable and burn money even on a normal request, but this would be too much even for them.
Likely they throttle and spend much of those five minutes waiting for nothing. That can help with stability and traffic smoothing (using "free" inference capacity when API and website usage dips a bit), but I think it mostly gives the product some faux credibility: "the research must be great quality if it took this long!"
In a few months they will cut the wait down, to great fanfare, just by removing some artificial delays.
Well, you may be right. But you can turn on the details and see that it seems to pull data, evaluate it, and follow up on it. My thought was: why do I see this in slow motion? My homemade Python stuff runs this kind of loop in a few seconds, and my bottleneck is the APIs of the sites I query. How about them?
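For concreteness, here is a minimal sketch of the kind of loop I mean (the URLs are placeholders, and this is obviously not what OpenAI actually runs): the fetches go out concurrently, so wall time is roughly one round trip to the slowest API.

```python
import asyncio
import aiohttp

# Placeholder endpoints, standing in for whatever sources a research
# query hits; the point is only that the fetches run concurrently.
URLS = [
    "https://api.example.com/search?q=topic",
    "https://api.example.org/papers?q=topic",
    "https://api.example.net/news?q=topic",
]

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # One source: the network round trip is essentially the whole cost.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        return await resp.text()

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # All sources are queried at once, so the total time is about
        # one round trip to the slowest API: seconds, not minutes.
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
        print(f"fetched {len(results)} sources")

asyncio.run(main())
```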
When you query APIs or scrape sites for personal use, you are unlikely to get throttled. OpenAI doing it at scale for many users might have to go slower (they surely have tons of proxies, but they don't want to burn those IPs on user-controlled traffic).
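To make the scale point concrete, a toy sketch (the one-request-per-second budget is my assumption, not a known number): with a shared per-domain limiter, one user's query finishes in seconds, but many users sharing the same budget queue up and the identical work stretches to minutes.

```python
import asyncio
import time

class DomainLimiter:
    """Toy per-domain rate limiter that spaces requests 1/rate seconds apart.

    The 1 req/s budget is an illustrative assumption; real per-site
    crawl budgets are unknown.
    """
    def __init__(self, rate: float = 1.0):
        self.interval = 1.0 / rate
        self.next_slot = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Callers queue on the lock; each sleeps until its slot comes up.
        async with self.lock:
            now = time.monotonic()
            wait = self.next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
            self.next_slot = max(now, self.next_slot) + self.interval

async def main() -> None:
    limiter = DomainLimiter(rate=1.0)  # assumed shared crawl budget
    t0 = time.monotonic()

    async def query(user: int) -> None:
        # Each research run makes a couple of fetches to the same site.
        for i in range(2):
            await limiter.acquire()
            print(f"user {user} fetch {i} at t={time.monotonic() - t0:.1f}s")

    # Alone, one user finishes in ~2 s; ten users sharing the budget
    # stretch the same work to ~20 s. Scale forces the slowdown.
    await asyncio.gather(*(query(u) for u in range(10)))

asyncio.run(main())
```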
Similarly, their inference GPUs have finite capacity. Spreading out the traffic helps keep utilization high.
Lastly, I think there is also a marketing and psychological aspect. Even if they could have the results in one minute, delaying them to two to five minutes won't hurt user retention much, but it will make people think they are getting great value.