"With Firefly, Adobe will also be offering enterprise customers an IP indemnity, which means that Adobe would protect customers from third party IP claims about Firefly-generated outputs."
"To address this customer concern, Microsoft is announcing our new Copilot Copyright Commitment. As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved."
"What if I’m accused of copyright infringement based on using a GitHub Copilot suggestion?
GitHub will defend you as provided in the GitHub Copilot Product Specific Terms."
That links to a document which says this:
"If your Agreement provides for the defense of third party claims, that provision will apply to your use of GitHub Copilot, including to the Suggestions you receive. Notwithstanding any other language in your Agreement, any GitHub defense obligations related to your use of GitHub Copilot do not apply if you have not set the Duplicate Detection filtering feature available in GitHub Copilot to its “Block” setting."
I don't understand the "If your Agreement provides for the defense of third party claims" bit though.
It really feels like tech companies are taking the approach of "we'll guarantee you anything you want!" as a sales strategy.
The cynical part of me wants to say it shows that they have high confidence they can manipulate the legal system enough to dictate the outcome of any challenges.
Not a lawyer, but I imagine the infringement on the input side, during dataset creation and training, would take place wherever the model is being trained? So presumably the US? On the other hand, yeah, memorizing input data and spitting out protected characters on the output side seems like an issue.
"Vendor lock your data and workload on OUR platform and we'll shield you forever!"
The fear being that you'll perform experiments outside of their cleverly scoped sandbox where they get to amortize your training data into theirs for free.
Basically "sure, you can bring your toys over to play in our sandbox. You just have to leave them here so we can be sure you don't eclipse our capability or market share without us."
Mercedes-Benz also. This is an interesting trend. I wonder if there is any historical precedent for suppliers of a new technology providing indemnification against uncertainty in the legal regime.
> If your Agreement provides for the defense of third party claims
I think this is saying "you have to let our lawyers argue on your behalf. If you fight it with your own shitty lawyers and lose, we won't pay your losses".
As a legal strategy, could it be that these large companies with indemnification clauses want to take these cases on themselves, rather than risk smaller companies getting sued without adequate resources and thereby setting a suboptimal precedent?
If that were their line of thought, they could always step in on a case-by-case basis. No company is going to say no to an offer of Google paying all their legal costs and paying any damages if the case is lost.
Adobe, Microsoft and Google have all done this now.
That's ~4 trillion dollars' worth of companies betting that the law will say anyone may train an AI model on any public data, and anyone may use the output of that AI without compensating the owners of the training data.
When 4 trillion dollars is at stake, not only do you put the best lawyers on the case, you also pay Congress to change the law if things aren't heading your way.
I'm pretty sure now that the debate of AI ownership is a foregone conclusion - nobody owns AI outputs.
This would be fantastic imo. A new era of the commons.
> Adobe
I disagree here. Adobe has trained only on public ___domain and their own stock images. So why would Adobe be against a ruling that training on unlicensed data is infringement? It would eliminate much of their competition...
> Adobe has trained only on public ___domain and their own stock images.
Adobe is lying. They are relying on general ignorance about the technology to get away with it.
Adobe has not shown how they train the text encoders in Firefly, or what images were used for the text-based conditioning (i.e. "text to image") part of their image generation model. They are almost certainly using CLIP or T5, which are trained on LAION-2B (an image dataset with the very problems they claim to be avoiding), C4 (a similarly encumbered text dataset), and the like.
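To be concrete about what "text encoders" means here: in the open text-to-image systems we can inspect, the text-conditioning model ships as a separately pretrained component with its own training data, distinct from whatever images the diffusion model itself saw. A minimal sketch using the Hugging Face diffusers library (the model ID is illustrative; Firefly's weights are not public, so this uses Stable Diffusion to show the general architecture):

```python
# Minimal sketch, assuming the diffusers + transformers libraries are installed.
# The model ID is an example, not Adobe's; Firefly cannot be inspected this way.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# The text-conditioning stack is a separately pretrained component (CLIP here),
# carrying its own training-data provenance:
print(type(pipe.text_encoder))  # transformers CLIPTextModel
print(type(pipe.tokenizer))     # transformers CLIPTokenizer
```

The point is that "what did you train on?" has to be answered per component, not just for the image diffusion weights.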
bUt nO oNe eLsE hAs bRoUgHt tHiS uP. It's so arcane for non-practitioners. Talk about this directly with someone like Astropulse, who monetizes a Stable Diffusion model: no confusion, totally agrees with me. By comparison, I've pinged the Ars Technica journalist who just wrote about this issue: crickets. Posted to the Adobe forum: crickets. E-mailed them on their specific address for this: crickets. I have no idea why something so obvious has slipped by everyone's radar!
I welcome anyone who works at Adobe to simply answer this question and put it to rest. There is absolutely nothing sensitive about the issue, unless it exposes them in a lie.
So no chance. I think it's a big fat lie. They'd have to have made some other scientific breakthrough, which they didn't.
It's certainly not impossible, but it's impracticable. On 248M images (roughly the size of Adobe Stock), CLIP gets 37% on ImageNet; on the 2,000M from LAION, it gets 71-80%. And even with 2,000M images, CLIP performs substantially worse at "text comprehension" than the approach Imagen uses, which relies on essentially billions more images and text tokens.
Interesting. I looked through the LAION datasets a bit, and it was astonishing how bad the captions really are: very short, when not completely wrong. Amazing to me that this works at all. I wonder how much better and more efficient CLIP etc. would be if trained on properly tagged images rather than just the alt text. Maybe that's why DALL-E 3 is so good at following prompts?
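If you want to eyeball the captions yourself, here's a rough sketch using the Hugging Face datasets library; the dataset ID and column name are assumptions based on the hub listing (the LAION metadata sets have been taken down and mirrored at various points, so the exact ID may differ):

```python
from datasets import load_dataset

# Stream the LAION-2B-en metadata (image URL + scraped alt text) instead of
# downloading the whole thing. Dataset ID and column names are assumptions.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)
for row in ds.take(10):
    print(repr(row["TEXT"]))  # the "caption" is just whatever alt text was scraped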
The workaround is to use an intermediary, so that you don't have any contractual obligations to breach, and the intermediary stays far away from your downstream use, so they never breach the agreement either.
Right now, I think it's more that they don't want US v TeensyStartup to be the case that sets precedent.
By stepping in with these indemnification clauses, they aren't betting $4T that they're sure to win. They're just reserving a (much smaller) blank check to protect against losing because of somebody else's lawyers.
They may win, they may lose, but they want to make sure they're the ones who get to fight for it either way.
If you somehow accidentally generate Mickey Mouse and decide to monetize it, how are you going to prove that your prompt didn't include "Mickey Mouse" or something like it?
If it gets in front of an actual judge, the judge would likely conclude that the average person should have been able to recognize the characteristic image of Mickey Mouse, and won't even bother asking whether the prompt actually contained those words.
Yeah, I guess that works for characters like Mickey Mouse. But for every Mickey Mouse there are hundreds or thousands of other characters that are less widely known.
Those very likely won't get in front of an actual judge, since the damages would be small enough to make a long court battle pointless, so the question would be moot.
> An important note here: you as a customer also have a part to play. For example, this indemnity only applies if you didn’t try to intentionally create or use generated output to infringe the rights of others, and similarly, are using existing and emerging tools, for example to cite sources to help use generated output responsibly.
The second part here (after "similarly") seems like a big asterisk, no? So Google can just duck out if they don't think you added enough citations? Or if you didn't ask the AI where every piece of the output is coming from?
Pretty important move to ensure commercial adoption. I guess money can be a moat if technology can't, with all the open-source and small-startup alternatives coming out with their own image generators.
Pretty much. "Lawyers as a Service" for any AI related copyright claims.
I also find it interesting that generative AI for images seems to be missing? I wonder if this is intentionally selective. It's also possible I'm misunderstanding where Imagen etc. live in the listed products.
I take it that this is undergirded by the recent Google "privacy policy" announcement which indicated it claims a right to "scrape everything you post online for AI":
"According to Section 102(b) of the Copyright Act of 1976, no “idea, procedure, process, system, method of operation, concept, principle, or discovery” is eligible for copyright protection."
"Copyright law generally protects the fixation of an idea in a “tangible medium of expression,” not the idea itself, or any processes or principles associated with it." -- https://strebecklaw.com/idea-expression/
By tokenizing the data, an AI bypasses the tangible, particular expression that can be copyrighted under the Copyright Act and takes away just the concepts. On generation, those concepts are converted back into tangible human expression that's unlikely to be protected by a copyright.
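For concreteness, this is what "tokenizing the data" looks like mechanically; a minimal sketch using OpenAI's tiktoken library (an illustration of tokenization in general, not Google's actual pipeline):

```python
import tiktoken

# Encoding maps the fixed text to a sequence of integer token IDs...
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("The quick brown fox jumps over the lazy dog.")
print(ids)

# ...and decoding maps those IDs straight back to the original string, so the
# tokens are a reversible re-encoding of the expression; any abstracted
# "concepts" live in the model weights learned from many such sequences.
print(enc.decode(ids))
```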
The indemnification means that Google engineers have convinced Google lawyers that this is in fact the case.
A welcome development for consumers of genAI, but unless I am missing something this is bad for artists. Art is not a menial job to be disrupted and eliminated.
I am not a lawyer, but it does not seem just to me that the creators of the training data should receive no compensation. What happened to "data is the new oil"?
Oh Google. Hope "shared fate" is not a subtle reference to "No Fate" from Terminator. Sarah Connor carves it into a table with a knife after her dream of an AI-instigated nuclear war.
In your hypothetical where a company gives you their services for free, the company runs out of money, goes out of business, and all of a sudden the services you were relying on are no longer available. That doesn't seem in anyone's interest. Giving away services for free doesn't serve users either.
In fact, we can see first-hand how Google search and other Google products have gotten worse because they are given away for free, and as a result Google has to make money by selling its customers' eyeballs to advertisers.
If they gave you their services for free, there'd be a limited amount of time until they could no longer give away those services at all (e.g. they run out of money). You can still technically put your own financial gains "first" if they're necessary to staying in a position to put someone else first over the long term.
It's the same reason you give what you can to charity, not just give 100% of your cash every time you have any.
It's impossible to have any deep critique of a shallow subject.
The bad faith is on Google's part-- they use vague slogans that signal generosity and kindness, while their actions are exploitative and borrow from the ethics and TTPs of malware authors. Last time, it was "don't be evil." Now, they're putting users first. How kind of them!
Anyone brave enough to adopt "don't be evil" as a motto deserves scrutiny when they find need to change it. It's a warrant canary whose absence speaks for itself.
"Ok, prove it" is a challenge not enough of today's conspicuous bullshitters are confronted with.
Oh please, you really think Google puts my interests first? People have gotten so used to companies spouting empty bullshit that it's become normal and not even looked at critically.