> If you require licensing fees for training data, you kill open source ML.
This is another one of those “well, if you treat people fairly it causes problems” sorts of arguments. And: sorry. If you want to do this, you have to figure out how to do it ethically.
There are all sorts of situations where research would go much faster if we behaved unethically or illegally. Medicine, for example. Or shooting people in rockets to Mars. But we can’t live in a society where we harm people in the name of progress.
Everyone in AI is super smart — I’m sure they can chin-scratch and figure out a way to make progress while respecting the people whose work they need to power these tools. Those incapable of this are either lazy, predatory, or not that smart.
"Ethical" in this case is a matter of opinion. The whole point of copyright was to promote useful sciences and arts. It’s in the US constitution. You don’t get to control your work out of some sense of fairness, but rather because it’s better for the society you live in.
As an ML researcher, no, there’s basically no way to make progress without the data. Not in comparison with billion dollar corporations that can throw money at the licensing problem. Synthetic data is still a pipe dream, and arguably still a copyright violation according to you, since traditional models generate such data.
To believe that this problem will just go away or that we can find some way around it is to close one’s eyes and shout "la la la, not listening." If you want to kill open source AI, that’s fine, but do it with eyes open.
Yes, it’s true that open source projects that cannot pay to license content owned by other people are at a disadvantage versus those who can. Open source projects cannot, for example, wholly copy code owned by other people.
Also, beware of originalist interpretations of the Constitution. I believe there’s been about 250 years of law clarifying how copyright works, and, not to beat a dead horse, I don’t think it carves out a special exception for open source projects.