One of the more exciting AI use-cases is that it should be just about competent enough to handle the conversational parts of diagnosis; it will have read all the studies, so it should be possible to spend an hour at home talking to an AI and then turn up at the doctor with a checklist of diagnostic work you want them to try.
A shorter amount of expensive time with a consultant is more powerful if there is a solid reference to work through at length beforehand.
AI has a long way to go before it can serve as a trustworthy middleman between research papers and patients.
For instance, even WebMD might waste more time in doctors' offices than it saves, and that's an accurate, hallucination-free source written specifically to give laypeople understandable information.
This study found that an LLM outperformed doctors "on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus."
If you look in the discussion section, you'll find that isn't quite what the study ended up showing. I'm looking at the paragraph that starts:
> An unexpected secondary result was that the LLM alone performed significantly better than both groups of humans, similar to a recent study with different LLM technology.
They suspected that the clinicians were not prompting it well, since the LLM on its own outperformed the LLM in the hands of skilled operators.
Exactly - if even the doctors/clinicians are not "prompting it right," then what are the odds that the layperson is going to get it to behave and give accurate diagnoses, rather than just confirm their pre-existing biases?