The FairTrained models claim to train only on public domain and legally licensed works. Companies are also licensing works. This company has a lawful foundation model:
So, it's really the majority of companies, the ones breaking the law, who will be affected. Companies using permissible and licensed works will be fine. The other companies would finally have to buy large collections of content, too. Their billions will have to go to something other than GPUs.
Not really sure a claim is good enough. I don't know that you can just go into court and say, "Trust me, I don't use copyrighted material."
I also can't see any way for a company to conclusively show that the weights of an allegedly copyright-free model came from the copyright-free training data it provides, other than handing over that data and training an identically structured model on it.
While the others are correct, I'm with you in the sense that I don't know if what they claim is true. I've also found others, like one in Singapore, whose training data wasn't as legal as news reports claimed. It might turn out to have problems.
There are benefits to using them, though. For one, they've tried really hard to be legal. That sets a positive example, shows good faith if they're sued, and reduces risk for those using them (good faith on our part). Also, make sure you can ditch or replace any outputs in the long term if they're ruled illegal. That's why we try not to use the A.I.s in a way where losing access to them would seriously damage our business.
That's the best I can offer until legal reforms happen.
If you're training, you can train in Singapore on material you have legal access to. Their law pretty much lets you use anything for AI purposes as long as you can legally access it yourself. To further reduce the risk, you should crawl it yourself, too, taking care to avoid risky sources.
Civil courts work by you proving damages (at least in the USA), not by you going on fishing expeditions because they "might" have done something.
So good luck finding something that looks exactly like your copyrighted work when that work isn't in the corpus. If you can, yeah, you might be able to prove it.
At the end of the day, it's like a lot of business: a liability shell game gets played out, and if the chain of evidence can't be drawn clearly, lawsuits would be frivolous at best.
https://273ventures.com/kl3m-the-first-legal-large-language-...