Protecting customers with generative AI indemnification (cloud.google.com)
113 points by stravant on Oct 13, 2023 | 62 comments



How many of these have we seen now?

Adobe have offered indemnification for Firefly: https://techcrunch.com/2023/06/26/adobe-indemnity-clause-des...

"With Firefly, Adobe will also be offering enterprise customers an IP indemnity, which means that Adobe would protect customers from third party IP claims about Firefly-generated outputs."

Here's Microsoft for their Copilot (which I do not think is the same thing as GitHub Copilot): https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot...

"To address this customer concern, Microsoft is announcing our new Copilot Copyright Commitment. As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved."

And for GitHub Copilot: https://github.com/features/copilot/#faq

"What if I’m accused of copyright infringement based on using a GitHub Copilot suggestion?

GitHub will defend you as provided in the GitHub Copilot Product Specific Terms."

That links to a document which says this:

"If your Agreement provides for the defense of third party claims, that provision will apply to your use of GitHub Copilot, including to the Suggestions you receive. Notwithstanding any other language in your Agreement, any GitHub defense obligations related to your use of GitHub Copilot do not apply if you have not set the Duplicate Detection filtering feature available in GitHub Copilot to its “Block” setting."

I don't understand the "If your Agreement provides for the defense of third party claims" bit though.


It really feels like tech companies are taking the approach of "we'll guarantee you anything you want!" as a sales strategy.

The cynical part of me wants to say it shows that they have high confidence they can manipulate the legal system enough to dictate the outcome of any challenges.


This might work in the US, but it's unlikely to work elsewhere in the world.

The EU in particular is likely to pay less than zero attention to the interests of large US tech companies.

The liability they're taking on here could be absolutely gigantic.


Not a lawyer, but I imagine the infringement on the input side, during dataset creation and training, would take place wherever the model is being trained? So presumably the US? On the other hand, yeah, memorizing input data and spitting out protected characters on the output side does seem like an issue.


They clearly can't guarantee that all output is copyright-free if the person prompting is describing a copyrighted work, either.

Sufficiently advanced generative AI can produce images or text that look like a copyrighted work from a description alone, even if it is licensed for all of its training data.


I look at it more like this:

"Vendor lock your data and workload on OUR platform and we'll shield you forever!"

The fear is that you'll run experiments outside of their cleverly scoped sandbox, where they get to amortize your training data into theirs for free.

Basically "sure, you can bring your toys over to play in our sandbox. You just have to leave them here so we can be sure you don't eclipse our capability or market share without us."


Mercedes-Benz also. This is an interesting trend. I wonder if there is any historical precedent for suppliers of a new technology to provide indemnification to protect against uncertainty in the legal regime.


> If your Agreement provides for the defense of third party claims

I think this is saying "you have to let our lawyers argue on your behalf. If you fight it with your own shitty lawyers and lose, we won't pay your losses".


Have we seen one from OpenAI apart from the one you mentioned for Copilot via MS?

Sorry if stupid question.


As a legal strategy, could it be that these large companies with indemnification clauses want to take these cases on rather than risk smaller companies getting sued without adequate resources and thereby setting a suboptimal precedent?


And smaller companies and open models cannot offer the same level of indemnity.

Also, this is where having a patent portfolio helps these big companies.


How exactly are patents involved here? This is simple legal department muscle that they're flexing.


Legal muscle involves posturing with enforcement of patents.


Smaller companies certainly can, just at a higher relative level of risk for the overall health of the company.


Yes. And also, they are being sued over generative AI anyway; might as well get a side of free marketing out of it.


They don’t want to take any cases, they want a moat.


Yeah, I'm starting to think this is the case. They are prepared for a legal fight and likely want one sooner rather than later.


Or that Google believes they haven't trained their models using any protected data.


I doubt it. They just think it's fair use.


If that were their line of thought, they could always step in on a case-by-case basis. No company is going to say no to an offer of Google paying all their legal costs and paying any damages if the case is lost.


Makes sense given how much potential liability they're taking on.


Adobe, Microsoft and Google have all done this now.

That's ~$4 trillion worth of companies betting that the law will say anyone may train an AI model on any public data, and anyone may use the output of that AI without compensating the owners of the training data.

When $4 trillion is at stake, not only do you put the best lawyers on the case, but you also pay Congress to change the law if things aren't heading your way.

I'm pretty sure now that the debate of AI ownership is a foregone conclusion - nobody owns AI outputs.


>nobody owns AI outputs

This would be fantastic imo. A new era of the commons.

>Adobe

I disagree here. Adobe has trained only on public-___domain images and their own stock images. So why would Adobe object to training on unlicensed data being ruled infringement? It would eliminate much of their competition...


> Adobe has trained only on public ___domain and their own stock images.

Adobe is lying. They are relying on general ignorance about the technology to get away with it.

Adobe has not shown how they train the text encoders in Firefly, or what images were used for the text-based conditioning (i.e. "text to image") part of their image generation model. They are almost certainly using CLIP or T5, which are trained on LAION-2B (an image dataset with the very problems they are trying to address), C4 (a text dataset similarly encumbered), and similar.

bUt nO oNe eLsE hAs bRoUgHt tHiS uP. It's so arcane for non-practitioners. Talk about this directly with someone like Astropulse, who monetizes a Stable Diffusion model: no confusion, totally agrees with me. By comparison, I've pinged the Ars Technica journalist who just wrote about this issue: crickets. Posted to the Adobe forum: crickets. E-mailed them on their specific address for this: crickets. I have no idea why something so obvious has slipped by everyone's radar!


Would it be impossible to train their own text encoder on just the images they have? How many would one need?


I welcome anyone who works at Adobe to simply answer this question and put it to rest. There is absolutely nothing sensitive about the issue, unless it exposes them in a lie.

So no chance. I think it's a big fat lie. They'd have to have made some other scientific breakthrough, which they didn't.

Using information from https://openai.com/research/clip and https://github.com/mlfoundations/open_clip, it's possible to answer this question.

It's certainly not impossible, but it's impractical. On 248m images (roughly the size of Adobe Stock), CLIP gets 37% on ImageNet; on the 2000m images from LAION, it gets 71-80%. And even with 2000m images, CLIP performs substantially worse than the approach Imagen uses for "text comprehension," which relies on many billions more images and text tokens.
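For anyone who wants to poke at the numbers above, here is a minimal zero-shot sketch with open_clip (the library linked above). The model name, the LAION-2B pretrained tag, and photo.jpg are illustrative choices, not a claim about Adobe's pipeline:

    # Sketch: zero-shot image/caption scoring with a CLIP trained on LAION-2B.
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # hypothetical file
    labels = ["a photo of a cat", "a photo of a dog", "a diagram"]
    text = tokenizer(labels)

    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)

    print(dict(zip(labels, probs[0].tolist())))  # caption probabilities

The ImageNet zero-shot numbers come from running exactly this kind of scoring over the 1000 class names, which is why dataset size matters so much.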


Interesting. I looked through the LAION datasets a bit, and it was astonishing how bad the captions really are. Very, very short captions, if not completely wrong. Amazing to me that this even works at all. I wonder how much better CLIP etc. would perform, and how much more efficient it would be, if they had properly tagged images rather than just the alt text. Maybe that's why DALL-E 3 is so good at following prompts?


> I'm pretty sure now that the debate of AI ownership is a foregone conclusion - nobody owns AI outputs.

But at the same time, they put terms in their ToS saying you may not train a new LLM using the output of their LLM...


Classic case of wanting to have their cake and eat it too. Although I don't think they'll be surprised if their ToS doesn't hold up in court, either.


IANAL, but I believe that wouldn't just be simple copyright infringement, but a breach of contract.


The workaround is to use an intermediary so that you don’t have any contractual obligations to breach, and the intermediary stays far away from your downstream use, so they never breached the agreement either.


I would refine that:

Right now, I think it's more that they don't want US v TeensyStartup to be the case that sets precedent.

By stepping in with these indemnification clauses, they aren't betting $4T that they're sure to win. They're just reserving a (much smaller) blank check to protect against losing because of somebody else's lawyers.

They may win, they may lose, but they want to make sure they're the ones who get to fight for it either way.


They could step in on a case by case basis even without this indemnification.


Who will be the first to type in "Mickey Mouse, digital art" and watch these companies take on $DIS?


"this indemnity only applies if you didn’t try to intentionally create or use generated output to infringe the rights of others"


If you somehow accidentally generate Mickey Mouse and decide to monetize it, how are you going to prove that your prompt didn't include "Mickey Mouse" or something like it?


If it gets in front of an actual judge, they would likely conclude that the average person should have been able to recognize the characteristic image of Mickey Mouse, so they won't even bother asking whether the prompt actually contained those words or not.


Yeah, I guess that works for characters like Mickey Mouse. But for every Mickey Mouse there are hundreds or thousands of other characters that are less widely known.


Those very likely won't get in front of an actual judge, since the damages would be small enough to make a long court battle pointless, so the question would be moot.


>An important note here: you as a customer also have a part to play. For example, this indemnity only applies if you didn’t try to intentionally create or use generated output to infringe the rights of others, and similarly, are using existing and emerging tools, for example to cite sources to help use generated output responsibly.

The second part here (after "similarly") seems like a big asterisk, no? So Google can just duck out if they decide you didn't add enough citations, or that you didn't ask the AI where every piece of the output came from?


Pretty important move to reassure commercial adopters. I guess money can be a moat if technology can't, with all the open-source and small-startup alternatives coming out with their own image generators.


Pretty much. "Lawyers as a Service" for any AI related copyright claims.

I also find it interesting that generative AI for images seems to be missing? I wonder if this is intentionally selective. It's also possible I'm misunderstanding where Imagen etc. lives among the listed products.


I take it that this is undergirded by the recent Google "privacy policy" announcement which indicated it claims a right to "scrape everything you post online for AI":

https://gizmodo.com/google-says-itll-scrape-everything-you-p...


"According to Section 102(b) of the Copyright Act of 1976, no “idea, procedure, process, system, method of operation, concept, principle, or discovery” is eligible for copyright protection."

"Copyright law generally protects the fixation of an idea in a “tangible medium of expression,” not the idea itself, or any processes or principles associated with it." -- https://strebecklaw.com/idea-expression/

By tokenizing the data, an AI bypasses the tangible, particular expression that can be copyrighted under the Copyright Act and takes away just the concepts. On generation, those concepts are converted back into tangible human expression that's unlikely to be protected by copyright.
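For concreteness, tokenization itself is just a reversible mapping between text and integer IDs; whether that round trip "takes away just the concepts" is exactly the legal question. A toy sketch, using tiktoken as an illustrative encoder:

    # Sketch: a BPE tokenizer round-trip; tiktoken is an illustrative choice.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "A cheerful cartoon mouse with red shorts and white gloves."

    ids = enc.encode(text)   # expression -> integer token IDs
    print(ids[:8])           # the first few IDs

    # The mapping is exactly invertible; any abstraction happens in the
    # model trained on top of these IDs, not in the tokenization step.
    assert enc.decode(ids) == text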

The indemnification means that Google engineers have convinced Google lawyers that this is in fact the case.


> By tokenizing the data an AI bypasses the tangible particular expression that can be copyrighted, and takes away just the concepts.

We shall see whether courts agree with this.


A welcome development for consumers of generative AI, but unless I am missing something, this is bad for artists. Art is not a menial job to be disrupted and eliminated.

I am not a lawyer, but it does not seem just to me that the creators of the training data should receive no compensation. What happened to "data is the new oil"?


Data is the new oil. The dinosaurs didn’t get paid for the old oil, either.


And AI is the climate change that comes from consuming all that data.


Google could have made a $500 million fund to do this indemnification...

But instead they have bet their whole company on it, i.e. ~$1.5 trillion.

That means they're really sure.


I mean you can "lobby" a congressperson for like $10k. IMO they already know what the outcome will be.


L'État, c'est moi ("I am the state")


"At Google Cloud, we put your interests first."


"we have a ton of money and our lawyers are the best"

Also a good way to build up a Section 230 for AI via precedents.


Oh, Google. Hope "shared fate" is not a subtle reference to "No Fate" from Terminator. Sarah Connor carves it into a table with a knife after her dream of an AI-instigated nuclear war.


No, it definitely isn't.


> At Google Cloud, we put your interests first.

Companies are way too comfortable boldly lying to their customers. If they really put my interests first, they'd give me their services for free.


In your hypothetical where a company gives you their services for free, the company runs out of money, goes out of business, and all of a sudden the services you were relying on are no longer available. That doesn't seem to be in anyone's interest, least of all the users'.

In fact, we can see firsthand how Google search and other Google products have gotten worse because they are given away for free, which means Google has to make money by selling its customers' eyeballs to advertisers.


Indeed, they'd fail if they really put their customers' interests first. That's what makes their statement an obvious lie.


Is that true?

If they gave you their services for free, there'd be a limited amount of time until they could no longer provide those services at all (e.g. they run out of money). You can still technically put your own financial gains "first" if they're necessary to staying in a position to put someone else first over the long term.

It's the same reason you give what you can to charity rather than giving away 100% of your cash every time you have any.


I think this is the single most shallow, bad faith criticism I’ve ever seen.


It's impossible to have any deep critique of a shallow subject.

The bad faith is on Google's part -- they use vague slogans that signal generosity and kindness, while their actions are exploitative and borrow from the ethics and TTPs of malware authors. Last time, it was "don't be evil." Now, they're putting users first. How kind of them!

Anyone brave enough to adopt "don't be evil" as a motto deserves scrutiny when they find need to change it. It's a warrant canary whose absence speaks for itself.

"Ok, prove it" is a challenge not enough of today's conspicuous bullshitters are confronted with.


Oh please, you really think Google puts my interests first? People have gotten so used to companies spouting empty bullshit that it's become normal and not even looked at critically.



