Just because the image is publicly available doesn't mean it should have been co...

wccrawford · on Aug 29, 2022

But the image wasn't "copied". An AI was trained from it, and the common analogy is to how humans learn from what they see.

The question at hand is: Is it ethical for an AI to produce images from images of questionable source? All the current AIs are trained on tons of images from the web, and there's no way to guarantee they weren't polluted with images they had no permission to use.

We've decided that it is ethical for humans to do that. But that's at least partly because it's impossible to create art otherwise. You're always exposed to other people's art. An AI doesn't have that problem, and additionally has a much, much better memory... And the actual creation process is different.

I think it's a really interesting ethical dilemma and haven't decided what side I'm on yet.

But like the author, a "vegan" image AI would be very welcome. I think it wouldn't be nearly as useful because a lot of modern concepts would be missing. But it'd still be welcome.

visarga · on Aug 29, 2022

> But like the author, a "vegan" image AI would be very welcome.

Say artist A is afraid of AI and doesn't want his paintings to be used in training. The model will never generate anything like A unless A is copying someone else.

In time the amount of AI generated images is going to grow, it will flood the web and social networks, maybe evolve into new forms, and A will have no impact, no influence, won't be quoted. Deleted from the AI mind means deleted from the social awareness.

SideQuark · on Aug 29, 2022

>Just because the image is publicly available doesn't mean it should have been copied.

It wasn't copied - it was viewed, which is perfectly legal.

The AI does not have a copy of any of the images it learned from. It merely learned styles and concepts of many images with associated text for those styles and concepts.

visarga · on Aug 29, 2022

Technically from each image it gets a full set of gradients that are added up on top of the previous ones.

I'm wondering if you could erase an artist from a trained model by adding gradients in the opposite direction, removing their contribution from the sum of gradients as if their work had never been part of training.
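
One way to probe this idea is a toy experiment. The sketch below is not Stable Diffusion or any real training setup: it assumes a one-parameter linear model trained with plain SGD, and the names (`grad`, `train`, `examples`) are made up for illustration. It records the gradient used for the first example, adds that update back after training, and compares the result with a run that never saw that example.

```python
# Toy check: does reversing one example's recorded update reproduce
# the weights of a run that never saw that example?
# Assumptions: 1-parameter model y = w * x, squared-error loss, plain SGD.

LEARNING_RATE = 0.1

def grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def train(examples, w=0.0):
    """Run one SGD pass; return the final weight and the gradient used at each step."""
    used = []
    for x, y in examples:
        g = grad(w, x, y)
        used.append(g)
        w -= LEARNING_RATE * g
    return w, used

examples = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]

w_all, grads_all = train(examples)
# "Erase" the first example by reversing its recorded update.
w_erased = w_all + LEARNING_RATE * grads_all[0]
# Reference: train without the first example at all.
w_without, _ = train(examples[1:])

print(f"erased after the fact: {w_erased:.4f}")
print(f"never trained on it:   {w_without:.4f}")
```

In this toy the two weights differ, because the later gradients were computed at weights that already included the first update. That says nothing about whether approximate removal could work in practice; it only shows that exact subtraction is not the same as never having trained on the example.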

SideQuark · on Aug 29, 2022

Technically, it is nowhere near that. I'm not even sure what a "full set of gradients" for an image would even mean - it's not a term.

Weights are usually randomly initialized, defining a function. Many images (which are randomly batched on each epoch) are passed through, and a loss function tallied. Backprop gets a gradient, and a training schedule tweaks the weights a little bit. No image has its "gradients" somehow added. The overall function has gradients, but they are not smooth, due to many things such as ReLU being non-smooth. These are not invertible in any sense.

The networks have many, many irreversible steps in them that truncate, that max pool, that perform dropout during steps, many of these choices done randomly. There are gradient normalizations to clip or stretch to deal with vanishing or exploding gradients, and SOOO many lossy things done that are completely irreversible.
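
A small illustration of why steps like ReLU and max pooling count as irreversible: different inputs can produce the same output, so the output alone cannot tell you which input you started from. The helper names here (`relu`, `max_pool_1d`) are hypothetical, written in plain Python rather than taken from any framework.

```python
def relu(x):
    """Rectified linear unit: negative inputs are clamped to zero."""
    return max(0.0, x)

def max_pool_1d(values, size):
    """Non-overlapping max pooling over a flat list."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

# Two different negative inputs collapse to the same output.
print(relu(-1.0), relu(-3.0))

# Two different inputs pool to the same result.
print(max_pool_1d([1, 5, 2, 0], 2))
print(max_pool_1d([5, 3, 0, 2], 2))
```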

So none of this can possibly work. A network is not simply a sum of images in some format.

visarga · on Aug 29, 2022

For each training example gradients are computed, then averaged per batch, then subtracted from model weights in proportion to the learning rate. So each example generates a "full set of gradients" (one incremental change) for all the weights of the model.
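
The update described here can be written out for a toy case. This is a sketch under assumptions, not how any framework implements it: a one-parameter model `y = w * x` with squared-error loss, and illustrative names (`example_grad`, `batch_loss`). It computes one gradient per example, averages them, applies the SGD step, and checks that the average matches the slope of the batch's mean loss.

```python
LEARNING_RATE = 0.05

def example_grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w for one example."""
    return (w * x - y) * x

def batch_loss(w, batch):
    """Mean squared-error loss over the batch."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in batch) / len(batch)

batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]
w = 0.5

# One gradient per example, then the batch average.
per_example = [example_grad(w, x, y) for x, y in batch]
avg_grad = sum(per_example) / len(per_example)

# Slope of the mean loss, estimated by a central difference.
h = 1e-5
numeric_grad = (batch_loss(w + h, batch) - batch_loss(w - h, batch)) / (2 * h)

# SGD step: move against the averaged gradient, scaled by the learning rate.
w_new = w - LEARNING_RATE * avg_grad

print("per-example gradients:", per_example)
print("averaged gradient:    ", avg_grad)
print("slope of mean loss:   ", numeric_grad)
print("updated weight:       ", w_new)
```

Because differentiation is linear, averaging per-example gradients and differentiating the averaged loss give the same number. The two views differ in how the computation is organised, not in the resulting update.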

The model doesn't save the image itself, but eats the gradients. The question is how we see this process, does it entail a copyright penalty?

SideQuark · on Aug 29, 2022

>For each training example gradients are computed, then averaged per batch

No, that would be incredibly slow. The loss is summed over items in a batch, then the gradient is computed once based on the loss, then backpropagation is done. It's done this way by default both in TensorFlow and in PyTorch.

It is possible to sum per entry (see here [1] in PyTorch docs) but it is extremely rare. The majority of training loops (this one included) is the standard "do a batch" then call "loss.backward()" which computes only once per batch.

Look at the "backward()" call in PyTorch docs [2]: This "Computes the gradient of current tensor w.r.t. graph leaves." If this were done per item and being summed, there would be no need for this in the training loop, and nearly every training loop I've seen has this.

Here's PyTorch explaining all this in detail [3].

And in any case, since the functions in the pipeline contain many non-smooth terms, the information required to restore an image, even if only one were done, is not in the gradient, any more than the slope of a line tells you the x-intercept.

Another way to think of it: even for perfectly smooth functions, derivatives lose information that cannot be recovered.
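
The line analogy can be checked numerically. In this minimal sketch (the function names and the two example lines are made up for illustration), two lines share the same slope but cross the x-axis at different points, so knowing the slope alone does not identify the line.

```python
def central_slope(f, x, h=1e-6):
    """Estimate the derivative of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def line_a(x):
    return 2.0 * x + 1.0

def line_b(x):
    return 2.0 * x - 4.0

slope_a = central_slope(line_a, 3.0)
slope_b = central_slope(line_b, 3.0)

# x-intercept of y = m*x + c is -c / m.
x_intercept_a = -1.0 / 2.0
x_intercept_b = 4.0 / 2.0

print("slopes:      ", slope_a, slope_b)
print("x-intercepts:", x_intercept_a, x_intercept_b)
```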

[1] https://pytorch.org/functorch/stable/notebooks/per_sample_gr...

[2] https://pytorch.org/docs/stable/generated/torch.Tensor.backw...

[3] https://pytorch.org/tutorials/beginner/blitz/autograd_tutori...

visarga · on Aug 30, 2022

Back-propagating a batch you go from one loss to the sum of individual losses and then down to each example separately. Each example has different activations that multiply with the upstream gradients, so first we need to compute per example weight gradients. The computation is a perfect mirror of the forward pass.
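
For a single linear layer this can be made concrete. Assume, for illustration only, a batch matrix $X$ with rows $x_i^\top$, weights $W$, outputs $Y = XW$, and an upstream gradient $G = \partial L / \partial Y$ with rows $g_i^\top$, where $B$ is the batch size. Then:

```latex
\frac{\partial L}{\partial W} = X^\top G = \sum_{i=1}^{B} x_i \, g_i^\top
```

Each term $x_i g_i^\top$ is the contribution of one example, but the matrix product $X^\top G$ yields their sum directly, so the individual terms do not have to be stored separately.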

simonw · on Aug 29, 2022

These are all very good questions, and I don't think any of them have obvious answers yet.

spywaregorilla · on Aug 29, 2022

> Just because the image is publicly available doesn't mean it should have been copied.

There is no "Should" that makes sense here either way.

> Are the AI generated images copyrighted?

No, this is already established in the US.

> Are they protected behind any form of paywall?

You can if you want.

> Are those using the images generated by the AI responsible for derived works?

Derived works of the images? Like I generate a picture of a dog, and you draw a dick on it? Obviously not.

SideQuark · on Aug 29, 2022

>> Are the AI generated images copyrighted?

> No, this is already established in the US.

The case you're likely thinking of [1] is not at all this case. Thaler tried to have the AI assigned the copyright, then transfer it to himself. The U.S. Copyright Office denied that the AI could hold the copyright, so it could not be transferred.

This is completely the opposite of what this case is. He wanted AI to be the sole copyright holder, which the Review Board ruled it cannot be. Any creative human input to make the AI create the image (text prompts, post selection from among any images to find one that is good) is fully copyrightable by the person.

The text prompt an artist writes and the selection of final images are both considered creative enough to make the final work copyrightable, just as if they entered numbers into Photoshop tools to make an image the way they wanted it. Plenty of these are creative enough, and take enough tweaking of the text by the authors, to be copyrightable, the same as the text they created to generate the work.

[1] https://www.copyright.gov/rulings-filings/review-board/docs/...

spywaregorilla · on Aug 29, 2022

You've answered a different question. The work is copyrightable. It is not copyrighted by default any more than any other trivial non de minimis effort.

SideQuark · on Aug 29, 2022

To

>> Are the AI generated images copyrighted?

you wrote

> No, this is already established in the US.

What cases are you then using to conclude this?

>It is not copyrighted by default any more than any other trivial non de minimis effort

I doubt de minimis applies, since the images coming out of this AI are nowhere near any originals that I've seen - they may share a similar style, but they're so completely different that they are almost all certainly copyrighted by the author.

For example, the simplest, littlest effort to snap a picture gives it full copyright. This AI takes more work to use than snapping a picture.

And de minimis would be on the infringer to prove - the owner still has a copyright until proven otherwise. So the images are most likely copyrighted.

Anything copyrightable in the US is automatically copyrighted by default upon creation. It's why any photo taken is immediately copyrighted by the photographer. Any painting is automatically copyrighted by the painter. Any computer art is automatically copyrighted by the computer operator.

I'd expect most of these works are copyrighted just as much as if someone painted them, or used blender to make them, or used a collage tool to merge self owned photos.

spywaregorilla · on Aug 29, 2022

If you take a picture with a camera, the picture is copyrightable. And yes, it applies automatically.

But not all pictures that a camera takes are copyrighted. The same applies to AI generated content.

To me asking if AI generated images are copyrighted implies the latter issue. If you're trying to answer a different question it's just an uninteresting semantic debate.

SideQuark · on Aug 29, 2022

This AI, in this thread, is human operated. Those images are likely copyrighted.

What case are you using to claim AI images are not copyrighted? The one I cited? Or another? The one I cited, the only one I can find, is not relevant here.

Why are you avoiding citing a case to back up your claim?

spywaregorilla · on Aug 29, 2022

> This AI, in this thread, is human operated. Those images are likely copyrighted.

No, again. The AI can be human operated, and thus it can produce copyrightable works. But it can also be operated by other things, so it is not by default copyrighted. Your own case as well as the monkey selfie are just fine for this.

> Why are you avoiding citing a case to back up your claim?

Our debate here is not defending premises with evidence, it's you not understanding what the premise is.

SideQuark · on Aug 29, 2022

>No, again. The AI can be human operated, and thus it can produce copyrightable works.

And which work in the entire thread up to you moving those goalposts was not human operated?

Blog post: "Stable Diffusion is a new “text-to-image diffusion model”". "you can try..." "type in a text prompt..." "and added a prompt..." and on and on.

In every example in the post, the images were created by human inputs.

Top post in this chain: "Are the AI generated images copyrighted?" Notice the "the" instead of "Are AI generated images copyrighted?" Did you miss that word? What "the AI" do you think this refers to? Some other AI that you made up to split hairs over, or "the" AI under discussion that is creating the art in this thread?

Did you notice no one except you in the replies to that comment misunderstood it? Not one person misread it, except you.

>it's you not understanding what the premise is.

Not a single item in the chain above you was about anything other than humans using AI to make images. You made up a premise to split hairs over, but it looks like that is your modus operandi judging from past posts.

I think you need to read the thread carefully before claiming others don't understand.

Have a good day.

spywaregorilla · on Aug 29, 2022

Get over yourself buddy. I was very clear about my position and even acknowledged that it was plausibly a semantic debate:

>To me asking if AI generated images are copyrighted implies the latter issue. If you're trying to answer a different question it's just an uninteresting semantic debate.

I think it is important to note that works by the AI are not inherently copyrighted, and they certainly aren't copyrighted to the AI. That is different from being copyrightable.

It's ok if you think that's not a meaningful distinction. But I explained it multiple times and you didn't get it.

> it looks like that is your modus operandi judging from past posts.

Looks like yours is getting tilted over nothing.