
It's hard to envision a greater success for the "great academic project" than what happened. I mean, what else were they trying to accomplish?



It was meant to be an open-source compilation of the crawled internet so that research could be done on web search, given how opaque Google's process is. It was NOT meant to be a cheap source of data for for-profit LLMs to train on.

*edit: added "for-profit"


(Shrug) Multiple not-for-profit LLMs have trained on it as well.

If something I worked on ended up playing a significant part in something that turned out to be that big a deal, I'd be OK with it. And nobody's stopping people from doing web-search studies with it, to this day.



