
There is, however, a subfield of statistical ML devoted to model uncertainty quantification. I've developed a product by applying it to LLMs that can score the trustworthiness of any LLM response. Like any ML-based product, my tool is not perfect, but it can detect incorrect LLM responses with pretty high precision/recall across applications spanning RAG / Q&A, data extraction, classification, summarization, ...

I've published extensive benchmarks: https://cleanlab.ai/blog/trustworthy-language-model/

You can instantly play with an interactive demo: https://tlm.cleanlab.ai/
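To make the general idea concrete (this is not necessarily how my product works internally): one simple uncertainty-quantification heuristic is self-consistency, where you resample the LLM several times and score how well the original answer agrees with the resampled ones. Here's a minimal Python sketch; `query_llm` is a hypothetical stand-in for whatever LLM call you use.

  import random
  from difflib import SequenceMatcher

  def query_llm(prompt: str, temperature: float = 0.7) -> str:
      """Hypothetical stand-in for any LLM call (swap in a real API here)."""
      # Stub so the example runs on its own: returns canned answers.
      return random.choice(["Paris", "Paris", "Paris", "Lyon"])

  def similarity(a: str, b: str) -> float:
      """Crude string similarity in [0, 1]."""
      return SequenceMatcher(None, a.lower(), b.lower()).ratio()

  def trustworthiness_score(prompt: str, answer: str, n_samples: int = 5) -> float:
      """Self-consistency score: how much do resampled responses agree with `answer`?

      If independent samples keep producing the same answer, confidence is higher;
      if they scatter, the original answer is less trustworthy.
      """
      samples = [query_llm(prompt, temperature=0.7) for _ in range(n_samples)]
      return sum(similarity(answer, s) for s in samples) / n_samples

  if __name__ == "__main__":
      prompt = "What is the capital of France?"
      answer = query_llm(prompt, temperature=0.0)
      print(answer, trustworthiness_score(prompt, answer))

A score near 1.0 means the resampled responses consistently agree; a low score flags the response for review. Real uncertainty-quantification methods go well beyond this, but it illustrates the kind of signal being extracted.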
