Hacker News new | past | comments | ask | show | jobs | submit login

This is true, but how would pollution work for a benchmark designed to test hallucinations?



A dataset of labelled answers that are hallucinations and not hallucinations are published based on the benchmark as part of a paper.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: