> I'm pretty sure Llama itself trained on a bunch of copyrighted data.
Every good, "SotA" model is trained on copyrighted data. This becomes apparent when models are released with everything public (i.e. including the training data) and they score significantly behind on every benchmark.