Hacker News

What, if any, practical implications does this have? Why would a real person or company want to specify a non real person as an author?



The practical implication is you can't copyright something that your AI generated. As the article notes, copyright applications are also being rejected in cases where a human asserts authorship over an AI generated work.


> The practical implication is you can't copyright something that your AI generated.

No, it's not.

This is not a case of the human trying to claim copyright as the author of a work made using AI tools.

> As the article notes, copyright applications are also being rejected in cases where a human asserts authorship over an AI generated work.

That is true (although at least one has been accepted by the copyright office, IIRC), but it is not an outcome of this case (even in the sense that this ruling might support it) because this case does not concern human claims of authorship at all. It concerns undisputed solely-AI creation.


> you can't copyright something that your AI generated

Seems like a loophole, if I generate synthetic data with a model trained on copyrighted works, the synthetic data is copyright free? So I can later train models on it?


You can't "launder" copyright away like that. The court will see straight through it. See "What color are your bits?" at https://ansuz.sooke.bc.ca/entry/23


There are over 200K language modeling datasets on Hugging Face, I bet a large portion of them were generated with LLMs, and all LLMs to date have been trained on copyrighted data. So they are all tainted.

But philosophically, I wonder if it's all right to block that; it technically follows the definition of copyright. It does not carry the expression, but borrows abstractions and facts. That's exactly what is allowed.

If we move to block synthetic data, then anyone can be accused of infringement when they reuse abstractions learned somewhere else. Creativity would not be possible.

On the other hand models trained on synthetic data will never regurgitate the originals because they never saw them.


That's a legal implication. I'm asking what the practical implication is. Why would an AI want to copyright their work?


So that you can run an AI company, churn out enough material to flood a particular market, and leverage copyright protection to cash in. Like say you call it the Kittenator, and then do automated keyword search for anything involving kittens - kitten in a box, kitten wearing socks, kittens on the rocks, kitten versus fox - and generate 25 different images for any given keyword combination, and push them out to major image-sharing platforms. The stock imagery market is pretty large but if you have the copyright enforcement in your pocket you can go after it in chunks.


You don't need an AI-assigned copyright to do that. Companies have humans working at them too.


Well you do if someone rejects a copyright claim on the grounds that the image is AI-generated, and a court backs them up.


The court did not say AI generated images are not eligible for copyright. They said machines cannot be assigned copyrights. That’s because only humans are eligible.

If you are a human who creatively uses a tool to generate something, you’d get copyright protection.


Pretty sure Adobe is doing exactly this.


https://itsartlaw.org/2023/12/11/case-summary-and-review-tha... attempted to assign copyright to AI. I think it was mostly for the purpose of getting to officially work through the legal arguments around the issue.


Unlicensed Human Code is 100% copyrighted and closed source.

Unlicensed AI Code is 0% copyrighted and open source and can't be closed.


Not open source... public ___domain. There is a big difference.


When I have a LLM that spits out code identical to copyrighted code can I then use it legally?

Otherwise I would need to check the output of every LLM for copyright infringement
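The kind of check described above could be sketched as a verbatim n-gram scan of model output against a reference corpus. This is a rough heuristic for flagging copied spans for human review, not a legal determination, and every name here (`check_verbatim_overlap`, `corpus`) is hypothetical:

```python
# Hypothetical sketch: flag LLM output that reproduces long verbatim runs
# from a reference corpus of copyrighted code, using token n-gram matching.
# Function and variable names are illustrative, not any real API.

def ngrams(tokens, n):
    """Yield consecutive n-token windows from a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def check_verbatim_overlap(llm_output: str, corpus: list[str], n: int = 8) -> bool:
    """Return True if the output shares any n-token run with a corpus document.

    A shared run is only a signal worth reviewing, since short idiomatic
    snippets will collide even between independent authors.
    """
    seen = set()
    for doc in corpus:
        seen.update(ngrams(doc.split(), n))
    return any(g in seen for g in ngrams(llm_output.split(), n))

# Toy usage with a stand-in "copyrighted" corpus and a tiny window size:
corpus = ['#!/bin/bash\necho "hello world"']
print(check_verbatim_overlap('echo "hello world"', corpus, n=2))  # True
print(check_verbatim_overlap('print(42)', corpus, n=2))           # False
```

In practice the window size `n` trades false positives (common idioms) against false negatives (lightly edited copies), which is why real clone-detection tools normalize whitespace and identifiers first.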


Not if it was trained on that copyrighted code; the copyright "survives" the training process, legally-speaking, just as it does if you hear a song, and then output (even truly accidentally) the exact same song and claim it as your own.

If you can perfectly prove that no copyrighted code was used in training a model, and that the model was not algorithmically designed (based on the creator's knowledge of the copyrighted code) to output that code, yet it outputs code identical to a copyrighted program, it could very likely not be infringement... but obviously that's a high bar to clear for a complex program.

If your model always outputs

> #!/bin/bash
> echo "hello world"

another programmer will likely not be able to claim copyright infringement on it. If it always outputs Adobe Photoshop, you're gonna need a very good lawyer, and a Truman-show-esque mountain of evidence on your side.


The AI will do that - for a price.


Code that the LLM reproduced without modification from its ripped-off "training set." I literally have no idea what kind of deranged person does not notice this, let alone believes that they should profit from it.


> What, if any, practical implications does this have?

Very little.

> Why would a real person or company want to specify a non real person as an author?

Other than to needlessly complicate the claim that the work is subject to copyright? No reason at all.



