If this legal theory is correct and upheld, possession of data will be pretty important. So all that data you licensed for one reason or another might be usable for training.
I think the logic makes sense because imagine if humans were prevented from getting ideas from watching movies. It seems similar to not letting AI watch every movie ever and learn.
It sounds like to get the data into their AI, Google had to make copies. They digitized the physical books, and then trained using those copies they made. Also, their system included excerpts from the books which it could retrieve and show users.
Hence, they had to use fair use to justify it.
I think if you could train the AI without having to make a copy first, such as having the AI read the physical books directly, or in the case of your movie example having the AI watch the movie on a TV hooked up to a DVD player playing a copy of the movie on DVD that you bought from a retailer authorized by the copyright owner to sell such DVDs, you might not even need to make a fair use argument.
The definitions section of the US copyright statutes, 17 USC 101 [1], defines "copies" like this:
> “Copies” are material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. The term “copies” includes the material object, other than a phonorecord, in which the work is first fixed.
and "fixed" is defined like this:
> A work is “fixed” in a tangible medium of expression when its embodiment in a copy or phonorecord, by or under the authority of the author, is sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration.
An AI reading or watching the work as one of many, many works in order to learn weights for a neural net does not result in a material object from which the work can be perceived, reproduced, or otherwise communicated. Thus, there is no copy, and hence no copyright issue.
> It sounds like to get the data into their AI, Google had to make copies
It seems to me that "copy" is a legal term of art. Like, if I view a website, it might say I'm not allowed to copy the information. But depending on the level of abstraction, the data has been copied many times by many entities just to get to me (all the layers of machines, caches, retransmissions, etc.), and what I do with the normal functions of my browser causes still more copies to be made.
Either this is not considered copying or it is considered fair use, but it seems pretty arbitrary to me, except that obviously considering it infringement would not advance the constitutional purpose of IP protections.
> "does not result in a material object from which the work can be perceived"
This does not hold true in all cases. Note that the ruling lists the end goals as 'fair use' goals and that that seems to have been an important part in the conclusion reached.
The key thing to strive for in creating derivative works that deserve copyright protection in their own right is that they contain 'substantial originality'; mere machine transformation does not qualify.
> I think if you could train the AI without having to make a copy first, such as having the AI read the physical books directly
Presumably this would require a digital camera capturing a copy of the book and storing it for some amount of time in computer memory, which seems equivalent to copying a digital version of a book or movie into the AI's computer memory.
The "period of more than transitory duration" makes all the difference. If you digitize the page for just long enough to run the model on it once and then delete it, that's different from storing it indefinitely to train all your future models with.