I don't think there is actually a law anywhere that says you need to obtain the rights to copyrighted material to read or view it. The person or organisation showing it to you, which might be yourself, needs to have a license. Otherwise, things like libraries couldn't exist, and you wouldn't be allowed to lend books or even keep books in your house that other family members can read.
Not saying that particularly impacts your argument about OpenAI, because an LLM in training is not a person. It is transforming data from one format to another for later consumption by people. Therefore OpenAI probably would need a license.
I mean, look at it this way. Let's say you purchase a Woody Allen film on DVD. Will anyone seriously prosecute you for watching it at home with your friends? No, that falls within normal usage. But let's say you now organise a local screening with the same DVD for 200 people in a hall somewhere, and charge everyone, say, $6, just to cover the hall expenses. Will you be prosecuted? Very likely. Libraries probably fall under some sort of "fair use" provision because of the public interest and such. They don't really make a profit from their work, nor should they!
Right, but those 200 people won't be prosecuted for watching it, which was my point. The example I had in mind when posting was putting up a copy of copyrighted art in a public place. The people in that public place are not breaking the law by looking at it; only the person who placed it is. Well, even then, would the workers who put it up be liable? Probably not. It's not reasonable to expect someone who puts up billboards to check the copyright license.
I do agree with this example in general. But from my point of view, OpenAI comes across more like the person enabling the use of copyrighted art, and would thus be subject to copyright regulations. I'd see their users more as the people viewing the art in public, perhaps unaware of the copyright restrictions.

But it also seems like these discussions are themselves a bit of a distraction. If LLMs worked exactly as they have been hyped up to for three years now, I think we would all get behind the effort. Who would care about copyright if a magic machine could lead us into the so-called post-scarcity world, right? But sadly, it does not appear to be anywhere near that goal, nor will it be, based on what we know about how the technology works. So here we are, discussing whether mechanical parrots should read our books :)