That's what I did. It came up with smart-sounding but infeasible recommendations because it took every source it found online at face value, without considering who authored it and why. And it lacked the massive amount of background knowledge needed to evaluate the claims made in those sources. It took outlandish, utopian demands by some activists in my field and sold them to me as things that might plausibly be implemented in the near future.
Real research requires several more levels of contextual knowledge than the model currently applies to any prompt. There is so much background information that people working in my field simply know. The model would first have to spend a ton of time taking in everything there is to know about the field and several related fields, and then correlate the sources it found for the specific prompt with all of that.
At the current stage, this is not deep research but research that is remarkably shallow.
> It took outlandish, utopian demands by some activists in my field and sold them to me as things that might plausibly be implemented in the near future.
I’ve seen at least one deep-research replicator claiming to be the “best open deep research” tool on the GAIA benchmark: https://huggingface.co/papers/2311.12983
It’s not a perfect benchmark, but it’s the closest one I’ve seen.
Also, I'd compare it with the output of phind (with thinking and multiple searches selected).