> I'm pretty sure Llama itself trained on a bunch of copyrighted data. Every goo... | Hacker News

NitpickLawyer 18 days ago | on: Breaking the Llama Community License

> I'm pretty sure Llama itself trained on a bunch of copyrighted data.

Every good, "SotA" model is trained on copyrighted data. This becomes apparent when models are released with everything public (i.e. training data) and they score significantly behind on every benchmark.

tough 16 days ago

A research team from O'Reilly found out OpenAI trained on copyrighted books

prob got a sub...

https://ssrc-static.s3.us-east-1.amazonaws.com/OpenAI-Traini...
