Hacker News | hydrolox's comments

I think it's mainly a Millennial and Gen Z thing; older generations still answer all calls, at least those that aren't into tech. It's just easier to realize that anyone not in your contacts will either leave a voicemail or text you if it's that important.


I'm mid Gen X, and I can't imagine wasting my time answering all calls. I have my phone set to silence any unknown numbers. I'm not going to answer any call that isn't in my contact list. Leave a coherent voicemail and I'll call you back and add you to my contacts.


We ignore it too. But I can tell the ones who do answer. They get extremely irate if you do not pick up when they call. As if it is their personal line to you and you should drop everything for them. I dump them into voicemail too.


i-Ready was horrible.



It's still an IQ test, at least it was when I was in the program ~5-10 years ago. To be honest, though, there was still a clique of nerdy kids and "the rest" even within the program (which for me was 2 separate classes of kids, so for each grade in the whole school there were 2 classes' worth of gifted kids).


I understand that regulations exist and that there can be copyright violations, but shouldn't we be concerned that other, more lenient governments (mainly China's) that are opposed to the US will use this to get ahead if OpenAI is significantly set back?


No. OpenAI is reportedly valued at over $150B. They can absolutely afford to pay people for data.

Edit: People commenting need to understand that $150B is the discounted value of future revenues. So... yes they can pay out... yes they will be worth less... and yes that's fair to the people who created the information.

I can't believe there are so many apologists on HN for what amounts to vacuuming up people's data for financial gain.


The OpenAI that is assumed to keep being able to harvest every form of IP without compensation is valued at $150B, an OpenAI that has to pay for data would be worth significantly less. They're currently not even expecting to turn a profit until 2029, and that's without paying for data.

https://finance.yahoo.com/news/report-reveals-openais-44-bil...


OpenAI is not profitable, and to achieve what they have achieved they had to scrape basically the entire internet. I don't have a hard time believing that OpenAI could not exist if they had to respect copyright.

https://www.cnbc.com/2024/09/27/openai-sees-5-billion-loss-t...


That's a good thing! If a company cannot rise to fame unless they violate laws, it should not exist.

There is plenty of public ___domain text that could have taught a LLM English.


I'm not convinced that the economic harm to content creators is greater than the productivity gains and accessibility of knowledge for users (relative to how competent it would be if trained just on public ___domain text). Personally, I derive immense value from ChatGPT / Claude. It's borderline life changing for me.

As time goes on, I imagine that it'll increasingly be the case that these LLMs will displace people out of their jobs / careers. I don't know whether the harm done will be greater than the benefit to society. I'm sure the answer will depend on who it is that you ask.

> That's a good thing! If a company cannot rise to fame unless they violate laws, it should not exist.

Obviously given what I wrote above, I'd consider it a bad thing if LLM tech severely regressed due to copyright law. Laws are not inherently good or bad. I think you can make a good argument that this tech will be a net negative for society, but I don't think it's valid to do so just on the basis that it is breaking the law as it is today.


> I'm not convinced that the economic harm to content creators is greater than the productivity gains and accessibility of knowledge for users (relative to how competent it would be if trained just on public ___domain text).

Good thing that whether something is a copyright violation doesn't depend on whether you can make more money with someone else's work than they can.


I understand the anger about large tech companies using others' work without compensation, especially when both they and their users benefit financially. But this goes beyond economics. LLM tech could accelerate advances in medicine and technology. I strongly believe that we're going to see societal benefits in education, healthcare, and especially mental health support thanks to this tech.

I also think that someone making money off LLMs is a separate question from whether or not the original creator has been harmed. I think many creators are going to benefit from better tools, and we'll likely see new forms of creation become viable.

We already recognize that certain uses of intellectual property should be permitted for society's benefit. We have the fair use doctrine, compulsory patent licensing for public health, research exemptions, and public libraries. Transformative use is also permitted, and LLMs are inherently transformative. Look at the volume of data that they ingest compared to the final size of a trained model, and how fundamentally different the output format is from the input data.

Human progress has always built upon existing knowledge. Consider how both Darwin and Wallace independently developed evolution theory at roughly the same time -- not from isolation, but from building on the intellectual foundation of their era. Everything in human culture builds on what came before.

That all being said, I'm also sure that this tech is going to negatively impact people too. Like I said in the other reply, whether this tech is good or bad will depend on who you ask. I just think that we should weigh these costs against the potential benefits to society as a whole rather than simply preserving existing systems, or blindly following the law as if the law is inherently just or good. Copyright law was made before this tech was even imagined, and it seems fair to now evaluate whether the current copyright regime makes sense if it turns out that it'd keep us in some local maximum.


> unless they violate laws

*unless they violate a country's laws.

Which means OpenAI or its alternative could survive in China but not in the US. The question is whether we are fine with that.


technically OpenAI has respected copyright, except in the (few) instances where they produce non-fair-use amounts of copyrighted material.

the DMCA does not cover scraping.


That's not real money though. You need actual cash on hand to pay for stuff; OpenAI only has the money they've been given by investors. I suspect that many of the investors wouldn't have been so keen if they had known that OpenAI would need an additional couple of billion a year to pay for data.


Too bad your business isn't viable without the largest single violation of copyright of all time.


That doesn’t mean they have $150B to hand over. What you can cite is the $10 billion they got from Microsoft.

I’m sure they could use a chunk of that to buy competitive I.P. for both companies to use for training. They can also pay experts to create it. They could even sell that to others for use in smaller models to finance creating or buying even more I.P. for their models.


[flagged]


We can, and do, choose to treat normal people differently from billion dollar companies that are attempting to suck up all human output and turn it into their own personal profit.

If they were, say, a charity doing this for the good of mankind, I’d have more sympathy. Shame they never were.


The way to treat them differently is not by making them share profits with another corporation. The logical endgame of all this isn’t “stopping LLMs,” it’s Disney happening to own a critical mass of IP to be able to legally train and run LLMs that make movies, firing all their employees, and no smaller company ever having a chance in hell of competing with a literal century’s worth of IP powering a generative model.

The best part about all this is that Disney initially took off by… making use of public ___domain works. Copyright used to last 14 years. You’d be able to create derivative works of most of the art in your life at some point. Now you’re never allowed to. And more often than not, not to grant a monopoly to the “author”, but to the corporation that hired them. The correct analysis shouldn’t be OpenAI vs. The Intercept or Disney or whomever. You’re just choosing kings at that point.


> produced "a unique" song?

People do get sued for making songs that are too similar to previously made songs. One available defence is that they had never heard the earlier song themselves.

If you want to treat AI like humans then if AI output is similar enough to copyrighted material it should get sued. Then you try to prove that it didn't ingest the original version somehow.


The fact that these lawsuits aren't as simple as "is my copyrighted work in your training set, yes or no" is mind-boggling.


I feel like at some point the people in favor of this are going to realize that whether the data was ingested into a training set is completely immaterial to the fact that these companies downloaded data they don't have a license to use to a company server somewhere with the intention to use it for commercial use.


Ah yes, humans and LLMs are exactly the same, learning the same way, reasoning the same way, they're practically indistinguishable. So that's why it makes sense to equate humans reading books with computer programs ingesting and processing the equivalent of billions of books in literal days or months.


While I agree with your sentiment in general, this thread is about the legal situation and your argument is unfortunately not a legal one.


“A person is fundamentally different from an LLM” does not need a legal argument and is implied by the fact that LLMs do not have human rights, or even anything comparable to animal rights.

A legal argument would be needed to argue the other way. This argument would imply granting LLMs some degree of human rights, which the very industry profiting from these copyright violations will never let happen for obvious reasons.


The other problem with the legal argument that it's "just like a person learning" is that corporations whose human employees have learned what copyrighted characters look like and then start incorporating them into their art are considered guilty of copyright violation, and don't get to deploy the "it's not an intentional copyright violation from someone who should have known better, it's just a tool outputting what the user requested" defence...


Exactly.

Also, it is only a matter of time until one of those employees (thanks to free will and agency) will whistleblow, it doesn’t scale, etc.

Frankly, the fact that such a big segment of HN crowd unthinkingly buys big tech’s double standard (LLMs are human when copyright is concerned, but not human in every other sense) makes me ashamed of the industry.


The process of reading it into their training data is a way of copying it. It exists somewhere and they need to copy it in order to ingest it.


By that logic you're violating copyright by using a web browser.


>By that logic you're violating copyright by using a web browser.

You would be except for the fact that publishing stuff on the web gives people an implicit license to download it for the purposes of viewing it.


Not sure about US or other jurisdictions, but that's not how any of this works in Germany. In Germany downloading anything from anywhere (even a movie) is never illegal and does not require a license. What's illegal is publishing/disseminating copyrighted content without authorization. BitTorrenting a movie is illegal because you're distributing it to other torrenters. Streaming a movie on your website is illegal because it's public. You can be held liable for using a photo from the web to illustrate your eBay auction, not because you downloaded it but because you republished it.

OpenAI (and Google and everyone else) is creating a publicly-accessible system that produces output that could be derived from copyrighted material.


I think it works like that in Canada and some other places too, because they pay an extra tax on storage media when they buy it, which essentially authorizes a license for any copyrighted material that might be stored on that media.


> In Germany […]

That's confidently and completely wrong.


I'm only allowed to view it? I can't download it, convert each word into a color, and create a weird piece of art work out of it? I think I can.


>convert each word into a color, and create a weird piece of art work out of it? I think I can.

I agree, but the original author might get butthurt if you distribute it. Realistically copyright law in the US is a mess when it comes to weird pieces of art.


The nature of the copy does actually matter.


> You read books and now you have a job? Pay up.

It is disingenuous to imply the scale of someone buying books and reading them (for which the publisher and author are compensated) or borrowing them from the library and reading them (again, for which the publisher and author are compensated) is the same as the wholesale copying without permission or payment of anything not behind a pay wall on the Internet.


Isn't it a greater risk that creators lose their income and nobody is creating the content anymore?

Take for instance what has happened with news because of the internet. Not exactly the same, but similar forces at work. It turned into a race to the bottom, with everyone trying to generate content as cheaply as possible to get maximum engagement while tech companies siphoned revenue. Expensive, investigative pieces from educated journalists disappeared in favor of stuff that looks like spam. Pre-internet news was higher quality.

Imagine that same effect happening to all content: art, writing, academic pieces. It's a real risk that OpenAI has peaked in quality.


Lots of people create without getting paid to do it. A lot of music and art is unprofitable. In fact, you could argue that when the mainstream media companies got completely captured by suits with no interest in the things their companies invested in, that was when creativity died and we got consigned to genre-box superhero pop hell.


> Isn't it a greater risk that creators lose their income and nobody is creating the content anymore?

There are already multiple lifetimes of quality content out there. It's difficult to get worked up about the potential future losses.


I don’t know. When I look at news from before, there never was investigative journalism. It was all opinion-swaying editorials, until alternate voices voiced their counternarratives. It’s just not in newspapers, because they are too politically biased to produce the two sides of stories that we’ve always asked them for. It’s on other media.

But investigative journalism has not disappeared. If anything, it has grown.


It's changed. Investigative journalism is done by non-profits specializing in it, which have various financial motives.

The budgets at newspapers used to be much larger and fund more investigative journalism with a clearer motive.


This type of argument is ignorant, cowardly, shortsighted, and regressive. Both technology and society will progress when we find a formula that is sustainable and incentivizes everyone involved to maximize their contributions without it all blowing up in our faces someday. Copyright law is far from perfect, but it protects artists who want to try and make a living from their work, and it incentivizes creativity that places without such protections usually end up just imitating.

When we find that sustainable framework for AI, China or <insert-boogeyman-here> will just end up imitating it. Idk what harms you're imagining might come from that ("get ahead" is too vague to mean anything), but I just want to point out that that isn't how you become a leader in anything. Even worse, if they are the ones who find that formula first while we take shortcuts to "get ahead", then we will be the ones doing the imitation in the end.


Copyright is a dead man walking and that's a good thing. Let's applaud the end of a temporary unnatural state of affairs.


If OpenAI wants copyright to be dead, then they could just give out all their models copyright free.


Care to make it interesting?

What do you consider "dead" and what do you consider a reasonable timeframe for this to occur?

I have 60 or so years and $50.


I am in as well, I have 50 or so years and $60 (though would gladly put $600k on this… :) )


Should we also be concerned that other governments use slave labor (among other human rights violations) and will use that to get ahead?


It's hysterical to compare training an ML model with slave labour. It's perfectly fine and accepted for a human to read and learn from content online without paying anything to the author when that content has been made available online for free, it's absurd to assert that it somehow becomes a human rights violation when the learning is done by a non-biological brain instead.


> It's hysterical to compare training an ML model with slave labour.

Nobody did that.

> It's perfectly fine and accepted for a human to read and learn from content online without paying anything to the author when that content has been made available online for free, it's absurd to assert that it somehow becomes a human rights violation when the learning is done by a non-biological brain instead.

It makes sense. There is always scale to consider in these things.


worble literally did make that comparison. It is possible for comparisons to be made using other rhetorical devices than just saying "I am comparing a to b".


> worble literally did make that comparison

No, their mention of "slave labor" is not a comparison to how LLMs work, nor an assertion of moral equivalence.

Instead it is just one example to demonstrate that chasing economic/geopolitical competitiveness is not a carte blanche to adopt practices that might be immoral or unjust.


Absolutely: if copyright is slowing down innovation, we should abolish copyright.

Not just turn a blind eye when it's the right people doing it. They don't even have a legal exemption passed by Congress - they're just straight-up breaking the law and getting away with it. Which is how America works, I suppose.


Exactly. They rushed to violate copyright on a massive scale quickly, and now are making the argument that it shouldn't apply to them and they couldn't possibly operate in compliance with it. As long as humans don't get to ignore copyright, AI shouldn't either.


Humans do get to ignore copyright, when they do the same thing OpenAI has been doing.


Exactly.

Should I be paying a proportion of my salary to all the copyright holders of the books, songs, TV shows and movies I consumed during my life?

If a Hollywood writer says she "learnt a lot about writing by watching the Simpsons" will Fox have an additional claim on her earnings?


> Should I be paying a proportion of my salary to all the copyright holders of the books, song, TV shows and movies I consumed during my life?

you already are.

a proportion of what you pay for books, music, tv shows, movies goes to rights holders already.

any subscription to spotify/apple music/netflix/hbo; any book/LP/CD/DVD/VHS; any purchased digital download … a portion of those sales is paid back to rights holders.

so… i’m not entirely sure what your comment is trying to argue for.

are you arguing that you should get paid a rebate for your salary that’s already been spent on copyright payments to rights holders?

> If a Hollywood writer says she "learnt a lot about writing by watching the Simpsons" will Fox have an additional claim on her earnings?

no. that’s not how copyright functions.

the actual episodes of the simpsons are the copyrighted work.

broadcasting/allowing purchases of those episodes incurs the copyright, as it involves COPYING the material itself.

COPYright is about the rights of the rights holder when their work is COPIED, where a “work” is the material which the copyright applies to.

merely mentioning the existence of a tv show involves zero copying of a registered work.

being inspired by another TV show to go off and write your own tv show involves zero copying of the work.

a hollywood writer rebroadcasting a simpsons during a TV interview would be a different matter. same with the hollywood writer just taking scenes from a simpsons episode and putting it into their film. that’s COPYing the material.

—-

when it comes to open AI, obviously this is a legal gray area until courts start ruling.

but the accusations are that OpenAi COPIED the intercept’s works by downloading them.

openAi transferred the work to openAi servers. they made a copy. and now openAi are profiting from that copy of the work that they took, without any permission or remuneration for the rights holder of the copyrighted work.

essentially, openAI did what you’re claiming is the status quo for you… but it’s not the status quo for you.

so yeah, your comment confuses me. hopefully you’re being sarcastic and it’s just gone completely over my head.


The problem is the anti-AI people who complain about AI are going for several steps in the chain (and often they are vague about which ones they are talking about at any point).

As well as the "copying" of content, some are also claiming that the output of an LLM should result in paying royalties back to the owners of the material used in training.

So if an AI produces a sitcom script then the copyright holders of those tv shows it ingested should get paid royalties. In addition to the money paid to copy files around.

Which leads to the precedent that if a writer creates a sitcom then the copyright holders of sitcoms she watched should get paid for "training" her.


When humans learn and copy too closely we call that plagiarism. If an LLM does it how should we deal with that?


> If an LLM does it how should we deal with that?

why not deal with it the same way as humans have been dealt with in the past?

If you copied an art piece using photoshop, you would've violated copyright. Photoshop (and adobe) itself never committed copyright violations.

Somehow, if you swap photoshop with openAI and chatGPT, then people claim that the actual application itself is a copyright violation.


this isn’t the same.

> If you copied an art piece using photoshop, you would've violated copyright. Photoshop (and adobe) itself never committed copyright violations.

the COPYing is happening on your local machine with non-cloud versions of Photoshop.

you are making a copy, using a tool, and then distributing that copy.

in music royalty terms, the making a copy is the Mechanical right, while distributing the copy is the Performing right.

and you are liable in this case.

> Somehow, if you swap photoshop with openAI and chatGPT, then people claim that the actual application itself is a copyright violation

OpenAI make a copy of the original works to create training data.

when the original works are reproduced verbatim (memorisation in LLMs is a thing), then that is the copyrighted work being distributed.

mechanical and performing rights, again.

but the twist is that ChatGPT does the copying on their servers and delivers it to your device.

they are creating a new copy and distributing that copy.

which makes them liable.

you are right that “ChatGPT” is just a tool.

however, the interesting legal grey area with this is — are ChatGPT model weights an encoded copy of the copyrighted works?

that’s where the conversation about the tool itself being a copyright violation comes in.

photoshop provides no mechanism to recite The Art Of War out of the box. an LLM could be trained to do so (like, it’s a hypothetical example but hopefully you get the point).


> OpenAI make a copy of the original works to create training data.

if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them? What openAI chooses to do with the viewed information is up to them - such as distilling summary statistics, or whatever.

> are ChatGPT model weights an encoded copy of the copyrighted works? that is indeed the most interesting legal gray area. I personally believe that it is not. The information distilled from those works do not constitute any copyrightable information, as it is not literary, but informational.

It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!


heads up: you may want to edit your second quote

> if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them?

whether you can download a copy from your browser doesn’t matter. whether the work is registered as copyrighted does (and following on from that, who is distributing the work - aka allowing you to download the copy - and for what purposes).

from the article (on phone cba to grab a quote) it makes clear that the Intercept’s works were not registered as copyrighted works with whatever the name of the US copyright office was.

ergo, those works are not copyrighted and, yes, they essentially are public ___domain and no remuneration is required …

(they cannot remove DMCA attribution information when distributing copies of the works though, which is what the case is now about.)

but for all the other registered works that OpenAI has downloaded, creating their copy, used in training data, which the model then reproduces as a memorised copy — that is copyright infringement.

like, in case it’s not clear, i’ve been responding to what people are saying about copyright specifically. not this specific case.

> The information distilled from those works do not constitute any copyrightable information, as it is not literary, but informational.

that’s one argument.

my argument would be it is a form of compression/decompression when the model weights result in memorised (read: overfitted) training data being regurgitated verbatim.

put the specific prompt in, you get the decompressed copy out the other end.

it’s like a zip file you download with a new album of music. except, in this case, instead of double clicking on the file you have to type in a prompt to get the decompressed audio files (or text in LLM case)

> It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!

actually, that’s the whole point of courts ruling on this.

the boundaries of what is considered reproduction is at question. it is up to the courts to decide on the red lines (probably blurry gray areas for a while).

if i specifically ask a model to reproduce an exact song… is that different to the model doing it accidentally?

i don’t think so. but a court might see it differently.

as someone who worked in music copyright, is a musician, sees the effects of people stealing musicians efforts all the time, i hope the little guys come out of this on top.

sadly, they usually don’t.


i’ve been avoiding replying to your comment for a bit, and now i realised why.

edit: i am so sorry about the wall of text.

> some are also claiming that the output of an LLM should result in paying royalties back to the owners of the material used in training.

> So if an AI produces a sitcom script then the copyright holders of those tv shows it ingested should get paid royalties. In addition to the money paid to copy files around.

what you’re talking about here is the concept of “derivative works” made from other, source works.

this is subtly different to reproduction of a work.

see the last half of this comment for my thoughts on what the interesting thing courts need to work out regarding verbatim reproduction https://news.ycombinator.com/item?id=42282003

in the derivative works case, it’s slightly different.

sampling in music is the best example i’ve got for this.

if i take four popular songs, cut 10 seconds of each, and then join each of the bits together to create a new track — that is a new, derivative work.

but i have not sufficiently modified the source works. they are clearly recognisable. i am just using copyrighted material in a really obvious way. the core of my “new” work is actually just four reproductions of the work of other people.

in that case — that derivative work, under music copyright law, requires the original copyright rights holders to be paid for all usage and copying of their works.

basically, a royalty split gets agreed, or there’s a court case. and then there’s a royalty split anyway (probably some damages too).

in my case, when i make music with samples, i make sure i mangle and process those samples until the source work is no longer recognisable. i’ve legit made it part of my workflow.

it’s no longer the original copyrighted work. it’s something completely new and fully unrecognisable.

the issue with LLMs, not just ChatGpt, is that they will reproduce both verbatim and recognisably similar output to original source works.

the original source copyrighted work is clearly recognisable, even if not an exact verbatim copy.

and that’s what you’ve probably seen folks talking about, at least it sounds like it to me.

> Which leads to the precedent that if a writer creates a sitcom then the copyright holders of sitcoms she watched should get paid for "training" her.

robin thicke “blurred lines” —

* https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Bridgep...

* https://en.m.wikipedia.org/wiki/Blurred_Lines (scroll down)

yes, there is already some very limited precedent, at least for a narrow specific case involving sheet music in the US.

the TL;DR IANAL version of the question at hand in the case was “did the defendants write the song with the intention of replicating a hook from the plaintiff’s work”.

the jury decided, yes they did.

this is different to your example in that they specifically went out to replicate that specific musical component of a song.

in your example, you’re talking about someone having “watched” a thing one time and then having to pay royalties to those people as a result.

that’s more akin to “being inspired” by, and is protected under US law i think IANAL. it came up in blurred lines, but, well, yeah. https://en.m.wikipedia.org/wiki/Idea%E2%80%93expression_dist...

again, the red line of infringement / not infringement is ultimately up to the courts to rule on.

anyway, this is very different to what openAi/chatGpt is doing.

openAi takes the works. chatgpt edits them according to user requests (feed forward through the model). then the output is distributed to the user. and that output could be considered to be a derivative work (see massive amount of text i wrote above, i’m sorry).

LLMs aren’t sitting there going “i feel like recreating a marvin gaye song”. it takes data, encodes/decodes it, then produces an output. it is a mechanical process, not a creative one. there’s no ideas here. no inspiration or expression.

an LLM is not a human being. it is a tool, which creates outputs that are often strikingly similar to source copyrighted works.

their users might be specifically asking to replicate songs though. in which case, openAi could be facilitating copyright infringement (whether through derivative works or not).

and that’s an interesting legal question by itself. are they facilitating the production of derivative works through the copying of copyrighted source works?

i would say they are. and, in some cases, the derivative works are obviously derived.


>a proportion of what you pay for books, music, tv shows, movies goes to rights holders already.

When I borrow a book from a friend, how do the original authors get paid for that?


they don’t.

borrowing a book is not creating a COPY of the book. you are not taking the pages, reproducing all of the text on those pages, and then giving that reproduction to your friend.

that is what a COPY is. borrowing the book is not a COPY. you’re just lending them the thing you already bought. it is a temporary transfer of possession, not a copy.

if you were copying the files from a digitally downloaded album of music and giving those new copies to your friend (music royalties were my specialty) then technically you would be in breach of copyright. you have copied the works.

but because it’s such a small scale (an individual with another individual) it’s not going to be financially worth it to take the case to court.

so copyright holders just cut their losses with one friend sharing it with another friend, and focus on other infringements instead.

which is where the whole torrenting thing comes in. if i can track 7000 people who have all downloaded the same torrented album, now i can just send a letter / court date to those 7000 people.

the costs of enforcement are reduced because of scale. 7000 people, all found the same thing, in a way that can be tracked.

and the ultimate case: one person/company has downloaded the works and is making them available for others to download, without paying for the rights to make copies when distributing.

that’s the ultimate goldmine for copyright infringement lawsuits. and it sounds suspiciously like openAi’s business model.


>borrowing a book is not creating a COPY of the book. you are not taking the pages, reproducing all of the text on those pages, and then giving that reproduction to your friend.

That's not what's happening with training AI models either though.


check out my other comment in this thread about derivative works.

https://news.ycombinator.com/item?id=42282443

OpenAI are taking copies of people’s data. some of that is copyrighted data.

that’s copyright infringement.

an LLM is a tool to create derivative works from the data OpenAI has copied without permission (when considering only copyrighted works and nothing public ___domain).

derivative works can also be considered copyright infringement in some cases.

how the tool functions is mostly irrelevant. how copyright infringement occurs doesn’t matter, only that it does.


Copying copyrighted works?


learning from, and extracting useful information from, copyrighted works.

This extracted information cannot and should not be copyrightable.


Learning from copyrighted work requires a license to access that work. You can extract information from the world's best books by purchasing those books. But no author is being compensated here: they download books.torrent, then use that pirated material to profit.


If you’re arguing that OpenAI should be compelled to make all their technology and models free then I think we all agree, but it sounds like you’re trying to weasel your way into letting a corpo get away with breaking the law while running away with billions.


> If you’re arguing that OpenAI should be compelled to make all their technology and models free

no i'm not - i'm arguing that its weights are not copyrightable. Whether it has to be free or not is a separate (and uninteresting) argument.


That’s really expensive to do, so in practice only wealthy humans or corporations can do so. Still seems unfair.


Yeah it turns out humans have more rights than computer programs and tech startups.


So make OpenAI sleep 8 hours a day, pay income and payroll taxes with the same deductions as a natural human etc...


ChatGPT doesn't violate copyright, it's a software application. "Open"AI does, it's a company run by humans (for now).


> they're just straight-up breaking the law and getting away with it.

So far this has not been determined and there's plenty of reasonable arguments that they are not breaking copyright law.


> Absolutely: if copyright is slowing down innovation, we should abolish copyright.

Is this sarcasm?


No. If something slows down innovation and suffocates the economy, why would you (an economically minded politician) keep it?


Because the world shouldn't be run primarily by economically minded politicians??

I'm sure China gets competitive advantages from their use of indentured and slave-like labor forces, and mass reeducation programs in camps. Should the US allow these things to happen? What if a private business starts doing the same?

But remember, they're just trying to compete with China on a fair playing field, so everything is permitted right?


You might want to look at the constitutional amendment enshrining slave labor "as a punishment for a crime," and the world's largest prison population. Much of your food supply has links to prison labor.

https://apnews.com/article/prison-to-plate-inmate-labor-inve...

But don't worry, it's not considered "slave labor" because there's a nominal wage of a few pennies involved and it's not technically "forced." You just might be tortured with solitary confinement if you don't do it.

We need to point fewer fingers and clean up the problems here.


I'm more concerned that some people in the tech world are conflating Sam Altman's interests with the national interest.


Easy to turn one into the other, just get someone to leak the model weights.


That doesn't really do it right? The state would have to have its own training cluster.


Am I jazzed about Sam Altman making billions? No.

Am I even more concerned about the state having control over the future corpus of knowledge via this doomed-in-any-case vector of "intellectual property"? Yes.

I think it will be easier to overcome the influence of billionaires when we drop the pretext that the state is a more primal force than the internet.


100% disagree. "It'll be fine bro" is not a substitute for having a vote over policy decisions made by the government. What you're talking about has a name. It starts with F and was very popular in Italy in the early to mid 20th century.


Rapidity of Godwin's law notwithstanding, I'm not disputing the importance of equity in decision-making. But this matter is more complex than that: it's obvious that the internet doesn't tolerate censorship even when it is dressed up as intellectual property. I prefer an open and democratic internet to one policed by childish legacy states, whose presence serves only (and only sometimes) to drive content into open secrecy.

It seems particularly unfair to equate any questioning of the wisdom of copyright laws (even when applied in situations where we might not care for the defendant, as with this case) with fascism.


It's not Godwin's law when it's correct. Just because it's cool and on the Internet doesn't mean you get to throw out people's stake in how their lives are run.


> throw out people's stake in how their lives are run

FWIW, you're talking to a professional musician. Ostensibly, the IP complex is designed to protect me. I cannot fathom how you can regard it as the "people's stake in how their lives are run". Eliminating copyright will almost certainly give people more control over their digital lives, not less.

> It's not Godwin's law when it's correct.

Just to be clear, you are doubling down on the claim that sunsetting copyright laws is tantamount to nazism?


Not at all. Go re-read above.


Get ahead in terms of what? Do you believe that the material in public ___domain or legally available content that doesn't violate copyrights is not enough to research AI/LLMs or is the concern about purely commercial interests?

China also supposedly has abusive labor practices. So, should other countries start relaxing their labor laws to avoid falling behind?


Shall we install the emperor then?


I think the main issue with this line of reasoning is: before all of these internet-based forms of content, did many people actually read literature, as the article suggests? I'm not sure, but I would expect that the books people read were mostly not some high form of art but entertainment. Which is fine, of course, and it differs to a large extent from scrolling TikTok, but does it line up with the author's thesis about reading providing some deeper form of enlightenment? Maybe it still does, since it's the same act of reading, but maybe not as much, since it's not exactly "intellectual content".


The right-hand rule is just an arbitrary convention that defines counterclockwise as positive, but I guess it's true that it could be "less arbitrary" if certain things in nature are more counterclockwise than clockwise.
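To make that sign convention concrete, here's a tiny sketch (the `cross_z` helper is purely illustrative, not from the comment above):

```python
def cross_z(ax, ay, bx, by):
    """z-component of the cross product of 2D vectors a and b."""
    return ax * by - ay * bx

# Under the right-hand rule, rotating counterclockwise from the x-axis
# to the y-axis gives a positive z-component:
print(cross_z(1, 0, 0, 1))  # 1

# Going the other way (clockwise) flips the sign; a left-hand convention
# would simply negate both results, and nothing physical would change:
print(cross_z(0, 1, 1, 0))  # -1
```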



I found the part about the possibility of life on neutron stars to be fascinating.


Then you will love "Dragon's Egg" by Robert L. Forward. https://en.wikipedia.org/wiki/Dragon%27s_Egg


Democracy in the US isn't mutually exclusive with any of those?


not sure about reducing it to just "chat", but using the entire "chatgpt" as a verb is really common, like "let me chatgpt this assignment" (love it or hate it, it's very common in schools)

