Hacker News
[flagged] OpenAI Says It's "Over" If It Can't Steal All Your Copyrighted Work (futurism.com)
66 points by raju 43 days ago | 79 comments



Yes, well, in a way they're right, and I suspect everyone here knows it, no matter how high and mighty they might want to act when commenting. When foreign (here 'Chinese') competition just ignores copyright laws while 'western' companies have to abide by them for every piece of data they use to train their models, the former will have a clear advantage over the latter. This also happens to be how the USA acted in the 1800s [1]:

the United States declined an invitation to a pivotal conference in Berne in 1883, and did not sign the 1886 agreement of the Berne Convention which accorded national treatment to copyright holders. Moreover, until 1891 American statutes explicitly denied copyrights to citizens of other countries and the United States was notorious in the international sphere as a significant contributor to the "piracy" of foreign literary products. It has been claimed that American companies for the most part "indiscriminately reprinted books by foreign authors without even the pretence of acknowledgement" (Feather, 1994, 154). The tendency to freely reprint foreign works was encouraged by the existence of tariffs on imported books that ranged as high as 25 percent (see Dozer, 1949).

[1] http://socialsciences.scielo.org/scielo.php?script=sci_artte...


Plus, in this case I don't even think it's a copyright violation to analyze protected works. And even creating derivatives of some form isn't, since that is the way any form of art works in the first place.

Sure, perhaps they would need a license to get the material, but I don't see how broken copyright laws will be of any help here.


The US has found ways to get China to adhere to IP laws, but they're never going to agree to restrictions the US doesn't impose on itself. Presuming that there is no way China will respect IP laws is BS.


What I don't understand is why this is always presented as a "race" that "we" have to win or else. It's just such a strange framing to me and every time I see it, it's presented as some sort of self-evident truth, but I don't think it's self-evident at all.


The "race" analogy is entirely driven by venture capital framing: they are interested in controlling the market, usually by reaching a dominant position within the overall space, thereby crowding out new entrants and being able to direct where the market goes.

China's leading efforts, on the other hand, take the long view: by releasing their products as open source they can improve on each other's work. No one controls the market, but there is constant competition and innovation.

All this is beside the point, however, for this article claims that OpenAI is using China as an excuse to get unfettered access to all copyrighted works through the fair use loophole.

So the crux is whether we believe in "innovation uber alles" or intellectual property rights.


Nature is a complex system. Many are in competition, not just humans. Most of these systems settle into a balance (see biodiversity). Due to resource scarcity, power tends to concentrate, which gives these power structures an advantage. Humans form these power structures around groups, and have for as long as humans have tribalised. Right now, humans can form these groups at nation-state complexity and, to some extent, more globally. This is humanity's current best effort. If you can do better, please do.


>>Nature is a complex system.

There is no winner-take-all in an ecosystem. If it happens, that ecosystem collapses!

It is strange that you use this as an example yet fail to understand it fully.

Nature is a complex system ... with adaptive feedback. Every process is a cycle and has feedback loops that amplify or regulate it. Yes, there are apex predators, but there is no winner-take-all in an ecosystem. Living beings coexist.


No need to even complicate it to that degree. A wrong begets another wrong forever unless someone stops doing the next wrong thing. That's literally what it takes.


That doesn't answer the question.

It's a race with winners and losers because ego and money. Ego because … well, ego.

Money because whoever develops the most powerful AI and gets enough people to buy into it, will probably retain the top spot for quite a while because inertia (sort of like how Google got to be where it is).

I'm sure some level of paranoia feeds into it at some level. Whoever gets locked in the public's mindset will rule the world and if it's not a Silicon Valley magnate, then they are losers.


I mean, I do think we should want to win the race, the point is they want to keep all the money. You can literally just offer equity as compensation to "content providers" and we won't have any problems with liquidity issues on the development side, and people can still be compensated or opt out.

OpenAI doesn't want to do that.


> I do think we should want to win the race

Why, though? I can understand the companies involved wanting to be first in order to maximize their profit, but why should that matter to anybody else?


Uhh... because nations should care about their economic output? And want to maximize that output all things being equal?


It’s yet to be proven that AI will increase economic output. Some might argue that reducing the economic benefits for the many in order to maximize the output of the whole may backfire horribly over the long-term.


I'm talking about the firms that develop AI. The entire point of the conversation is "why should countries care about businesses being in their country as opposed to elsewhere." The point is obvious that where businesses operate is important to the economic output of a nation.


Sorry, but your attempt to reframe my response was not successful. If AI is harmful to society, the “winning” of the AI race may not be beneficial to that society.


It seems like the term "race" comes from "arms race".

Perhaps the future of Silicon Valley is to be the home of defense contractors.

https://watson.brown.edu/costsofwar/papers/2024/SiliconValle...


It seems that most people on this site believe that this is a good thing, but all this restriction would mean is that, for the next while, the only companies able to afford mass licensing would be in the S&P 500, and that's assuming these companies wouldn't just flock to a nation outside of America's influence.

At some point, it becomes a national security issue. This technology is going to be leveraged in ways we can't even dream up today. Copyright law needs to be re-imagined in a way that won't restrict advancement in AI, and AI-adjacent technology. It's not because we want to - it's because we have to.


It's not that hard. If you want to ask questions about or work with a Stephen King book, you rent it during your LLM session. OpenAI would take a small fee, the author would get the majority, and the user gets value. You don't have to be a billion-dollar company to set up a monetization structure like that. Startups could do this if they negotiate with authors.

For general questions, you can use the free wiki that's ingested into the LLM or pay a fee for general content like current events.

You keep the LLM free in the third world out of necessity. OpenAI, in the first world, cannot ask to be treated as if it were a third-world company, because we are too rich to be that ridiculous.
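The rent-and-split structure described above is simple enough to sketch. A minimal, purely hypothetical sketch in Python (the 70/30 split, function name, and fee amounts are all invented for illustration, not anything OpenAI has proposed):

```python
def split_session_fee(fee_cents: int, author_share: float = 0.7) -> dict:
    """Split a per-session book-rental fee between author and platform.

    Hypothetical terms: the author keeps the majority (70% by default);
    the platform keeps the remainder as its service fee.
    """
    author_cut = round(fee_cents * author_share)
    return {"author": author_cut, "platform": fee_cents - author_cut}

# e.g. a 50-cent rental of a licensed book for one LLM session
print(split_session_fee(50))  # {'author': 35, 'platform': 15}
```

The point is only that the accounting is trivial; the hard part is the negotiation with rights holders, not the plumbing.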


When Roger Bacon discovered what gunpowder was capable of, he kept it to himself: he thought that once the poor knew how to make gunpowder, they would make weapons to destroy those in power.

We cannot let that happen with AI technology, and it is a very difficult conversation when we're talking about technology that has likely already displaced hundreds of thousands of jobs by extending how much output individuals can produce.

To you, this is a moral issue, and one I absolutely agree with at its core. But this technology, in my opinion, risks eventually triggering a form of social stratification. The focus should be on keeping the technology ubiquitous, accessible, and unrestricted.


> But this technology, in my opinion, risks eventually triggering a form of social stratification. The focus should be on keeping the technology ubiquitous, accessible, and unrestricted.

But this is exactly what proposals like you’re responding to are trying to do. Ignoring the morality this is an economic issue. Massive economic value is potentially going to be created by stealing from individuals. Why can’t they get small kickbacks? Why must their contribution be completely devoid of remuneration for us to stand a chance of “winning a war” or keeping this technology accessible?


You're right. If there are methods to get creators paid, while ensuring unfettered access to all - it absolutely should happen. The legal system in America doesn't have a good track record of nuance, especially when nuance is necessary. My views come from the idea that the American legal system will either smite them into bankruptcy, or it will give them the precedent they need to exempt past violations, and carry on as usual.

Nuance is needed, and I hope that they find it.


> Nuance is needed

So much so that your first reply gave me pause and made me think. This is not easy; it's absolutely in our nature to gatekeep knowledge.


These comments made me realize my viewpoints on this issue are heavily based on the American legal system being very binary, with the majority of tech companies going all or nothing: appeal your way up to the Supreme Court, and pray for the "all".

In this case, it feels like the two most likely outcomes both hurt us.


It isn't in our nature at all; on the contrary. We hoard knowledge only when it is useful for a strategic purpose like economic advantage, and that is the exception.


Why? Why do we have to? Why do companies get to take the creative output of humanity for free to make a profit?

Why is it a national security issue? Because people who could make billions of dollars say so?


At some point, we have to look at this pragmatically. To me, it's not about FAANGs getting one over on the everyman; it's about making sure we maintain the opportunity of playing on the same field, with the same resources.


But why must it be free? Immense amounts of money are being thrown around and at the first suggestion that maybe the thing that underpins their work should be paid for they say it’s infeasible. If you listen to Altman the future is going to be infinite. Why can’t we pay authors for their books in that case?


You're assuming this leads...somewhere. Currently, AI is not all that useful. And progress seems to be slowing, not accelerating.


You may not have found ways to make AI work for you in your workflows, but millions of others have. It's not perfect, but it's useful to everyone I know that has made a meaningful attempt to experiment, discarding the bad, and integrating the good.


Can you cite examples of these real world use cases that _millions_ of people have integrated in their workflows successfully on a day to day basis?


I call XY problem on this. The problem is inherent in LLMs, and the solution is something else altogether, not just allowing companies to ignore the law and lobby to change said law after the fact.


It sounds like government continuing to honor the property rights of everyone is getting in the way of a handful of rich people's desire to take all that value for themselves.


By this logic Google Search couldn't exist. Except that Google won those cases.


How is Google Search breaking copyright?


The text preview underneath the search result is one thing I remember being contentious (some news websites in France took Google to court and won, if I recall correctly).

For mostly the same reasons people are against AI. If you read that text, sometimes there’s no reason to visit the website, which ‘deprives’ website owners and the content creators of ad revenue they would have gotten if Google hadn’t copied the text from their website.

After all, the news is the same regardless of if it’s written on the Google result preview or on the news website itself.


Exact same way OpenAI does: by scraping data, ingesting it, processing it, incorporating it into its proprietary system, and using it to serve responses to queries.

This is not to say that I think any of this is wrong. I think that if what Google or OpenAI do is illegal, then the law is wrong, not Google or OpenAI.


Search engines are an index to a first approximation. I realize they sometimes hoist a thesis sentence or two up to their page, but it feels substantially different than automating derivative works for sale.


What is the substantive difference between googling "filter records in dataframe by regular expression in Julia" vs asking ChatGPT/Grok/Claude the same thing (to name a totally non-random example I've been working with lately)?
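For concreteness, the kind of answer either tool returns to that query is only a few lines. A plain-Python stand-in (a list of dicts playing the role of the dataframe; the records and pattern are invented, and the original question was about Julia, not Python):

```python
import re

# Toy stand-in for a dataframe: a list of records
records = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "email": "bob@test.org"},
    {"name": "carol", "email": "carol@example.com"},
]

# Keep only the records whose email matches the regular expression
pattern = re.compile(r"@example\.com$")
matches = [r for r in records if pattern.search(r["email"])]

print([r["name"] for r in matches])  # ['alice', 'carol']
```

Whether such an answer came out of a search snippet or an LLM, the underlying technique (uncopyrightable, arguably) is the same, which is the point of the question.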


Not much difference for uncopyrightable facts, but ask for a portrait, a report, or a whole program, and you get closer to a problem. There is precedent around derivative works, the "fair use" doctrine, that can maybe give some guidance.

Oracle lost the API copyright case, which might also be an important precedent.


ChatGPT is a substitute for the original work; Google redirects to the original work.

(And yes, the synthesis at the top of the page is a problem, I agree.)


Last I checked, Google is not buying or pirating books for Google Search; they just grab free data that has been provided.


What do you mean by "provided", exactly? Just because something can be accessed via an HTTP GET request doesn't mean it's legal to fetch it, and it does not give you an implicit license to do whatever you want with it. Google, in fact, will happily scrape, index, and serve queries from PDFs of illegally pirated copyrighted books.


You might want to check again: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

> For works still under copyright, Google scanned and entered the whole work into their searchable database, but only provided "snippet views" of the scanned pages in search results to users.


The same way we can't process what a trillion dollars looks like, we can't actually process what large scale theft looks like. For shits and giggles, these people also have a trillion dollars.


Is it the exact same though?


Sorry, do you have an actual point, or are you just trying to be pedantic? Strictly speaking, it's not exactly the same, because no two different things are exactly the same, by definition. However, my point is that the same principle should apply to both.


No, he just means to say they are very different and if you disagree, you should think about it again.


I don’t think it’s even that close to the same thing tbh.


Even before they added the AI preview, they would take sentences that looked like answers to your query and display them at the top of the page verbatim.


So basically, we know China is never going to pay the publishers/content creators (never). If we hold OpenAI to our principles (pay those you took from), they will go bankrupt. So of course they are speaking in end-game language. To suggest the race is lost even before it starts is an incredible thing.

How is it that we can theorize that the model would get better with more data, but we can't theorize that the business model would need to get bigger (pay the content creators) to train the model? Shoot first and ask questions later (or rather, BEG later).


You know, there's a creative third way which the US could take if it had the cojones.

Allow OpenAI and other AI companies to use all data for training, but require that they pay it forward via royalties on profits beyond some threshold X, where X is a number high enough to imply true AGI was reached.

The royalties could go into a fund that would be paid out like social security payments for every American starting when they were 18 years old. Companies could likewise request a one time deferred payment or something like that.

It's having your cake and eating it. Also helping ease some tensions around job loss.

Sadly, what we'll likely get is a bunch of tech leaders stumbling into wild riches, hoarding it, and then having it taken from them by force after they become complacent and drunk on power without the necessary understanding of human nature or history to see why they've brought it on themselves.


There are many possibilities. Perhaps they're allowed to use anything publicly accessible but have to release their model every x amount of time, which might be a month or a year. My biggest fear is that as happened with copyright's limited term, this limited term would get chipped away at over the years.

Another would be that they couldn't sell access to customers directly but rather must license it out to various entities at rates set by regulators. Those entities then would compete with each other for end customers. This of course might be prone to regulatory capture like happens with utilities.


Not to be funny on purpose, but we are currently having discussions in America on whether we should fund aid for poverty and the like. I love your idea, though.


> So basically, we know China is never going to pay the publishers/content creators (never)

Who is we? How do you know? Never is a strong word.

> If we hold our principles to OpenAI (pay who you took from), they will go bankrupt.

i.e. their business wasn't feasible to begin with? Sounds fine? What's wrong with them going bankrupt (if needed)?


OpenAI has become "too big to fail". They obviously can't face any meaningful repercussions, as that goes against the established form of capitalism in the US. Instead, they have to find creative ways to allow (or at least not sentence) OpenAI to any wrongdoing. Shareholders über alles.


So, does that mean that OpenAI's models will be open source then? I mean, if it's built on our collective intellectual property, it's only fair we have free access to it.


I think we just need to rethink copyright for language models. I'm okay with licensing one copy of a work to any LLM throughout its various generations. Just don't pirate it if no special license is available; buying the ebook should suffice. It should be no different from a human buying a copy. The only rule should be that it does not leak the entire work.


I'm not OK with that, though... and here we have the nut of the problem. There is no agreement as to what's acceptable and what's not.

I personally think that the odds of me being able to both publicly publish my words and code and keep them out of training data are pretty close to zero. Since that's unacceptable to me, my only option is not to publish that stuff at all.


It's always interesting to see how the title of a HN post radically changes the people who comment and vote. The AI friendly people are being carpet bombed by haters, but in a model release thread the haters would be flagged to oblivion.


“Haters” is nothing more than a thought-terminating cliche.



The product requires crime? I feel like most products do not require crime. This is not a good sales pitch.


The product doesn't require crime, but the massive profitability of their business model requires it.

And the red herring that "China will steal it if we don't do it first".


Either that, or copyright law is bad in its current form, and LLMs are yet another example of what exposes that.

Even if copyright owners can't point to how much damage, if any, they suffer from AI, it's seen as wrong and bad. I think it's getting boring to hear that story about copyright repeat itself. For most crimes, you need to be able to point to a damage that was done to you.

Also, while there are edge cases in some LLMs where you can make them spew verbatim training material, often through jailbreaks or whatnot, an LLM is a lossy process involving "fuzzy logic" where the content is generally not perfectly memorized, and seems no more of a threat to copyright than recording broadcasts onto cassette tapes or VHS was back in the day. You'd be insane to use that stuff as a source of truth on par with the original article.


More like, it's interesting that big tech companies can create extremely elaborate copyright assignment, metering, and payout mechanisms when it's in their interest, right down to figuring out who owns 30 seconds of incidental radio music playing in the background of someone's speedrun video.

But for other classes of user generated content, the problem is suddenly "impossible".


Existing law is ruinous to my murder-for-hire business. We need change.


Can someone please vouch for this thread and unflag it? It's kind of the main tech issue of our time ...


Some good news for a change!


something tells me that this pathetic messaging approach is not going to be the one that squares the circle between "piracy is illegal" and "information wants to be free"


Sorry, but it is actually a huge problem for the US if the DeepSeek models are able to train on sorta-illegal dumps of scientific papers and US models aren't: the papers paywalled by scientific journals.

Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).

I understand the reflexive reaction against AI companies "stealing content", but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.


Why is this flagged?


You have to remember a company is not a social being with balanced obligations. Its obligation is to its owners and not to society.

If OpenAI’s leadership weren’t saying precisely this, they wouldn’t be doing their jobs.


>> Its obligation is to its owners and not to society.

This isn't true at all. It has an obligation to follow the law of the society it operates in, even if that results in lower profits.


I agree with you. It has to follow the rules.

Unfortunately, the society this company operates in is highly Machiavellian and can't improve because people are too busy hating each other. The rules it does have aren't being enforced very well. And finally, this type of lobbying is part of the culture in the US; it's so expected that it'd be weird if they didn't do it.


Yeah, we have a wrong conception. It's fine, society often has wrong conceptions. We are just dead wrong about ruthless capitalism. A company is a custodian of a good society, it has responsibilities that far exceed profit.


Copyright infringement is not stealing [0]. The person still has what they made. Not sure why they propagate it as theft. Seems like a pro-copyright propaganda extremist article, which goes significantly against the progress and advancement of the arts and sciences.

[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...


> against progress of advancements for arts and sciences.

No it’s always against commercialization. That’s why we have exceptions like political commentary, satire and in particular arts and sciences. The issue is about making money from someone else’s work.

You can still disagree with it of course, but let’s have an honest discussion.


If that were the case, then cooking recipes would all be copyrighted, and you would have to pay a licensing fee every time you made spaghetti. The complexity of copyright is even referenced in the Supreme Court ruling on Wikipedia that I linked to. Yet you ask for an honest discussion? What was not forthright in my reply?


A mere listing of ingredients or contents, or a simple set of directions, is uncopyrightable.

https://www.copyright.gov/circs/circ33.pdf

> ... a recipe that creatively explains or depicts how or why to perform a particular activity may be copyrightable. A registration for a recipe may cover the written description or explanation of a process that appears in the work, as well as any photographs or illustrations that are owned by the applicant. However, the registration will not cover the list of ingredients that appear in each recipe, the underlying process for making the dish, or the resulting dish itself.



