brookst 72 days ago | on: GPT-4.5

This is true, but how would pollution work for a benchmark designed to test hallucinations?

llm_trw 72 days ago

A dataset of answers labelled as hallucinations or not hallucinations, based on the benchmark, is published as part of a paper. Once that dataset is online, it can be scraped into training data like anything else.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.
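
To make the mechanism concrete, here is a minimal sketch of one common way to look for this kind of leakage: checking whether benchmark items share long word n-grams with documents in a training corpus. The function names, the whitespace tokenization, and the default window of 8 words are all assumptions for illustration, not something taken from any particular paper or lab's pipeline.

```python
def ngrams(text, n=8):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # Texts shorter than n words yield an empty set and are never flagged.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, corpus_docs, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the corpus.

    Hypothetical helper: a real check would use the model's tokenizer and a
    far more memory-efficient index than an in-memory Python set.
    """
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)

    flagged = [item for item in benchmark_items if ngrams(item, n) & corpus_grams]
    rate = len(flagged) / len(benchmark_items) if benchmark_items else 0.0
    return rate, flagged


if __name__ == "__main__":
    # Made-up data: the first benchmark item appears verbatim in a scraped page.
    benchmark = [
        "the capital of australia is sydney label hallucination",
        "water boils at one hundred degrees celsius label correct",
    ]
    corpus = [
        "scraped appendix: the capital of australia is sydney label hallucination",
        "unrelated text about cooking pasta at home tonight",
    ]
    rate, flagged = contamination_rate(benchmark, corpus, n=5)
    print(rate, flagged)
```

A check like this only catches near-verbatim copies. Paraphrased or translated versions of the labelled answers would slip through, which is part of why contamination is hard to rule out.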
