Buying a book to read and incorporating its text in a product are two different things. Even if they bought the book, imo it would be illegal.



There are situations where you are allowed to incorporate the text in your product (fair use).

The million dollar question is if this counts.


Perhaps, but by not even buying the book they’ve conceded the point.

IMO copyright law does not control what you can do with a book once you’ve bought a license, except for reproduction. It’s arguable that LLMs engage in illegal distribution, but that’s a totally different question from whether simple training is illegal even if the model is never made available to anyone.


Maybe it is, maybe it isn't. The courts will decide.


> Maybe it is, maybe it isn't. The courts will decide.

This offhandedly seems to dismiss the cost of achieving legal clarity for using a book - a cost that will far eclipse the cost of the book itself.

In that light, it seems like an underweighted statement.


What they will decide is that it is simultaneously not piracy because it is not read by a human and not copyright infringement because it's just like a human learning by reading a book.


Those are both copyright infringement, since we already have MAI Systems Corp. v. Peak Computer, Inc.

I'd like to see them try to argue Cartoon Network, LP v. CSC Holdings, Inc. applies to their corpus.


I really hope you're right.

But the first one is a human using things. It's big guy vs. little guy.

The precedent is there: Google already "reads" every page on the internet and ingests it into its systems, has done so for decades, and has survived lawsuits over it.


How does MAI Systems Corp. v. Peak Computer, Inc. apply here, at all?

Peak was using MAI's operating system directly by live booting it without their permission.

Antivirus and security companies don't need licenses to scan copyrighted materials to look for threats or vulnerabilities.

AI similarly is not executing, deploying, reselling or redistributing the copyrighted material. It's using the data to build a model. Security software distills the data down more, but it's still the same principle.



