
I just tried it on a problem I had already solved in Azure Data Explorer, and it "solved" it by making up APIs that don't exist. It got close to my actual solution, but it cheated even with Expert mode enabled.



Seems like accuracy is the next killer feature for LLM search and teaching. I'll try again in 6 months.


What a time to be alive, where we likely need to wait only a few months for the next big hurdle to be cleared.

Exhilarating and terrifying at the same time.


I dunno about that in this case. The "confidently incorrect" problem seems inherent to the underlying algorithm to me. If it were solved, I suspect that would be a paradigm shift of the sort that happens on a timescale of years at best.


Yes, the "confidently incorrect" issue will be a tough nut to crack for the current spate of generative text models. LLMs have no ability to analyze a body of text and determine anything about it (e.g. how likely it is to be true); they are clever but at bottom can only extrapolate from patterns found in the training data. If no one has said anything like "X, and I'm 78% certain about it", then it's tough to imagine how an LLM could generate reasonably correct probability estimates.
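To make that concrete: the standard way to score this is to compare a model's stated confidence against how often it's actually right, e.g. with a Brier score. A minimal sketch in Python (the confidences and outcomes below are made up purely for illustration):

    # Brier score: mean squared gap between stated confidence and
    # what actually happened (1 = correct answer, 0 = incorrect).
    # Lower is better; 0.0 is perfect, 0.25 is what you'd get by
    # always saying "50% certain".
    def brier_score(confidences, outcomes):
        assert len(confidences) == len(outcomes)
        return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)

    # Hypothetical data: the model said "78% certain" and was right,
    # said "90%" and was wrong, and so on.
    stated = [0.78, 0.90, 0.60, 0.95]
    correct = [1, 0, 1, 1]
    print(brier_score(stated, correct))  # ~0.255 for this toy data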


What you're alluding to is calibration, and base GPT-4 had excellent calibration before RLHF.
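The GPT-4 technical report showed this with calibration plots (on an MMLU subset, IIRC): the pre-RLHF model's stated confidence tracked its actual accuracy closely, and RLHF degraded that. The usual summary metric is expected calibration error (ECE). A rough sketch of how it's computed, again with hypothetical model predictions as input:

    # Expected calibration error: bucket predictions by stated
    # confidence, then take the weighted average gap between each
    # bucket's mean confidence and its actual accuracy.
    def ece(confidences, outcomes, n_bins=10):
        bins = [[] for _ in range(n_bins)]
        for c, o in zip(confidences, outcomes):
            idx = min(int(c * n_bins), n_bins - 1)
            bins[idx].append((c, o))
        total = len(confidences)
        err = 0.0
        for b in bins:
            if not b:
                continue
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(o for _, o in b) / len(b)
            err += (len(b) / total) * abs(avg_conf - accuracy)
        return err

A well-calibrated model scores near 0: when it says 80%, it's right about 80% of the time.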



