…why? Will people buy fewer books because we have intuitive algorithms trained on old books?
Personally, I strongly believe that the aesthetic skills of humanity are one of our most advanced faculties — we are nowhere close to replacing them with fully-automated output, AGI or no.
You got less than 1% of a book... from an author who has passed away... who wrote on a research topic that was funded by an institution that takes in hundreds of millions of dollars in federal grants each year...
I'm not an author (although I do generate almost exclusively IP for a living) and I think this is about as weak a form of this argument as you could possibly make.
So right back at ya... who was hurt in your example?
I think the key is to think through the incentives for future authors.
As a thought experiment, say that the idea someday becomes mainstream that there is no reason to read any book or research publication because you can just ask an AI to describe and quote at length from the contents of anything you might want to read. In such a future, I think it's reasonable to predict that there would be less incentive to publish, and thus fewer people publishing things.
In that case, I would argue the "hurt" is primarily to society as a whole, and also to people who might have otherwise enjoyed a career in writing.
Having said that, I don't think we're particularly close to living in that future. For one thing, I'd say that the ability to receive compensation from holding a copyright doesn't seem to be the most important incentive for people to create things (written or otherwise), though it is for some people. But mostly, I just don't think this idea of chatting with an AI instead of reading things is very mainstream, perhaps in part because it isn't very easy to get them to quote at length. What I don't know is whether this is likely to change, or how quickly.
there is no reason to read any book or research publication because you can just ask an AI to describe and quote at length from the contents of anything you might want to read
I think this is the fundamental misunderstanding at the heart of a lot of the anger over this, beyond the basic "corporations in general are out of control and living authors should earn a fair wage" points that predate it.
You summarize well how we aren't there yet, but I'd say the answer to your final implied question is "not likely to change at all". Even when my fellow traitors-to-humanity are done with our cognitive AGI systems that employ intuitive algorithms in symphony with deliberative symbolic ones, at the end of the day, information theory holds for them just as much as it does for us. LLMs are not built to memorize knowledge; they're built to intuitively transform text. The only way to get verbatim copies of "anything you might want to read" is fundamentally to store a copy of it. Full stop, end of story, will never not be true.
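To put rough numbers on that (every figure below is an illustrative assumption, orders of magnitude only, not a measurement of any real model), compare the bytes of weights a large model has against the bytes of text it was trained on:

    # Back-of-envelope comparison; all numbers are illustrative assumptions.
    model_params = 70e9      # a hypothetical 70B-parameter model
    bytes_per_param = 2      # fp16 weights
    model_bytes = model_params * bytes_per_param    # ~140 GB of weights

    corpus_tokens = 10e12    # ~10 trillion training tokens
    bytes_per_token = 4      # rough average for English text
    corpus_bytes = corpus_tokens * bytes_per_token  # ~40 TB of text

    # Even if the weights did nothing but store text verbatim, they could
    # hold only a sliver of the training set.
    print(f"{model_bytes / corpus_bytes:.2%}")      # 0.35%

Even granting the model perfect, lossless use of its weights as storage, the capacity just isn't there for "anything you might want to read".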
In that light, such a future seems as easy to avoid today as it was 5 years ago: not trivial, but well within the bounds of our legal and social systems. If someone makes a bot with copies of recent literature, and the authors wrote that lit under a social contract that promised them royalties, then the obvious move is to stop them.
Until then, as you say: only extremists and laymen who don't know better are using LLMs to replace published literature altogether. Everyone else knows that the UX isn't there, and that the chance of confident error is way too high.
That was just a metaphor; you can ask your AI what that is, or use way less energy and just use Wikipedia's search engine... or do you think OpenAI first evaluates whether the author is an independent developer and/or has died and/or was funded by a public university before adding the content to the training database? /s
And publishing a paper full of jargon for academics is one thing; simplifying the results for the masses is another. There's a huge difference between finishing a paper and finishing a book.
It isn't that someone was hurt. We have one private entity gaining power by centralizing knowledge (which it never contributed to) and making people pay for regurgitated, distilled versions of that knowledge, for profit.
Few entities can do that (I can't).
Most people are forced to work for companies that sell their work to the highest bidder (which are the very entities mentioned above), or that ask them to use AI (under the condition that such work is accessible to the AI entities).
It's obviously a vicious circle if people can't refuse to have their work ingested and repackaged by a few AI giants.
Like supporting the Android Open Source Project… until Google decides to move the critical parts into Google Play Services? I run GrapheneOS (love it), but almost no banks will allow non-Google-sponsored Android ROMs and variants to do NFC transactions, because… AOSP is designed to lack exactly the parts Google actually needs.
The same goes for ML Kit, which is loaded by Play Services and whose absence makes many Android apps fail.
And I'm not even talking about the biases introduced by private entities that open-source their models but pursue their own goals (e.g. geopolitical ones).
As long as AI is designed and led by huge private entities, you'll have a hard time benefiting from it without advancing those entities' own goals.
The answer is to censor the model output, not the training input. A dumb filter using 20-year-old technology can easily stop LLMs from emitting copyrighted text verbatim.
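Concretely, something like this minimal sketch (the shingle size, threshold, and corpus are made-up placeholders): index overlapping word n-grams from everything you want to protect, then flag any output that contains too many of them.

    # Minimal sketch of a verbatim-output filter built on shingled n-grams.
    # SHINGLE_SIZE, THRESHOLD, and the protected corpus are placeholders.
    SHINGLE_SIZE = 8   # consecutive words per shingle
    THRESHOLD = 3      # flag output containing this many indexed shingles

    def shingles(text: str, n: int = SHINGLE_SIZE) -> set[str]:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(protected_corpus: list[str]) -> set[str]:
        # In practice you'd store hashes or a Bloom filter, not raw strings.
        index: set[str] = set()
        for doc in protected_corpus:
            index |= shingles(doc)
        return index

    def looks_verbatim(model_output: str, index: set[str]) -> bool:
        return len(shingles(model_output) & index) >= THRESHOLD

Hashing the shingles (or using a Bloom filter) keeps the index manageable even for a very large corpus.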
I know that this seems workable from a theoretical perspective (in other words, I would way underestimate it at the sprint planning meeting!), but
A) checking each output against a regex representing a hundred years of literature would be expensive AF no matter how streamlined you make it, and
B) latent space allows for small deviations that would still get you in trouble but are very hard to catch without a truly latent wrapper (i.e. another LLM call). A good visual example of this is the coverage early on in the Disney v. ChatGPT lawsuit.
What if the model simply substitutes synonyms here and there without changing the spirit of the material? (This might not work for poetry, obviously.) It is not such a simple matter.
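To make that concrete, here's a toy demonstration (the sentences and shingle size are made up): an exact n-gram filter like the one sketched above catches verbatim text cleanly, but a handful of synonym swaps break almost every matching shingle.

    # Toy demonstration: a few synonym swaps defeat exact n-gram matching.
    def shingles(text: str, n: int = 4) -> set[str]:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    original = "it was the best of times it was the worst of times"
    paraphrase = "it was the finest of times it was the bleakest of times"

    index = shingles(original)
    print(len(shingles(original) & index))    # 9 of 9 shingles match: caught
    print(len(shingles(paraphrase) & index))  # only 2 of 9 match: slips through

Catching that reliably means comparing meanings rather than strings, which is exactly the "another LLM call" the parent mentions.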