IMO the "safety" in Stable Diffusion is becoming more overzealous where most of my images are coming back blurred, where I no longer want to waste my time writing a prompt only for it to return mostly blurred images. Prompts that worked in previous versions like portraits are coming back mostly blurred in SDXL.
If this next version is just as bad, I'm going to stop using Stability APIs. Are there any other text-to-image services that offer similar value and quality to Stable Diffusion without the overzealous blurring?
Edit:
Example prompts like "Matte portrait of Yennefer" return 8/9 blurred images [1]

[1] https://imgur.com/a/nIx8GBR
The nice thing about Stable Diffusion is that you can very easily set it up on a machine you control without any 'safety' and with a user-finetuned checkpoint.
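Something like this, a minimal sketch with the diffusers library (the checkpoint filename is a hypothetical placeholder for whatever user finetune you've downloaded):

    # Load a local user-finetuned checkpoint and drop the NSFW checker entirely.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "./models/user-finetune-v1.safetensors",  # placeholder path
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.safety_checker = None  # nothing gets blurred or blacked out

    image = pipe("matte portrait of Yennefer", num_inference_steps=30).images[0]
    image.save("portrait.png")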
That isn't the topic. Porn is one example, but "safety" here is synonymous with puritanical requirements arbitrarily reduced to the lowest common denominator. I want a powerful AI, not a replacement for a priest.
Gemini demonstrated a product I do not want to use, and I am aware of the requirements of corporate contexts, although I think the safety mechanisms should be in the hands of users.
Google optimized for advertisers, but I am not interested in that kind of content, as it provides little value.
Ok, but it seems very stupid to insist that the powerful AI come from one specific API when the very same tech is open-sourced for anyone to do whatever they want with.
No large scale model maker is going to put out public models for B2B with dubious use cases.
Here is what the problem is: OpenAI, Facebook, and Google are not curating the datasets. You're arguing they shouldn't put controls in after the fact, but what you actually want is for them to use quality datasets.
Taking the actual example you provided, I can understand the issue, since it amounts to blurring images of a virtual character that are not actually "naughty." Equivalent images are available in bulk on every search engine: "yennefer witcher 3 game" [1][2][3][4][5][6] returns almost exactly the generated images, just without the blur.
I've never seen blurring in my images. Is that something that they add when you do API access? I'm running SD 1.5 and SDXL 1.0 models locally. Maybe I'm just not prompting for things they deem naughty. Can you share an example prompt where the result gets blurred?
If you run locally with the basic stack it's literally a bool flag to hide NSFW content. It's trivial to turn off, and it's off by default in most open-source setups.
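With the diffusers library it isn't quite a literal bool, but the equivalent is a single keyword argument (a sketch, assuming the standard SD 1.5 weights):

    # Sketch: load SD 1.5 without the post-hoc NSFW checker that blanks/blurs outputs.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
        safety_checker=None,            # skip the NSFW check entirely
        requires_safety_checker=False,  # suppress the warning about doing so
    ).to("cuda")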
Wait, blurring (black) means that it objected to the content? I tried it a few times on one of the online/free sites (Huggingspace, I think) and I just assumed I'd gotten a parameter wrong.
Given the optimizations applied to SDXL (compared to SD 1.5), it is understandable why it outputs blurry backgrounds. It is not for safety; it is just a cheap way to hide the imperfections of the technology. Imagine two neural networks: one occasionally outputs Lovecraftian hallucinated chimeras in the backgrounds, the other outputs sterile studio-quality images. The researchers selected the second approach.
It appears that they are trying to prevent generating accurate images of a real person because they are worried about deepfakes, and this produces the blurring. While Yennefer is a fictional character, she's played by a real actress on Netflix, so maybe that's what is triggering the filter.
I haven't tried SD3, but my local SD2 regularly has this pattern where, while the image is developing, it looks like it's coming along fine, and then suddenly in the last few steps it introduces weird artifacts to mask faces. Running locally doesn't get around censorship that's baked into the model.
I tend to lean towards SD1.5 for this reason—I'd rather put in the effort to get a good result out of the lesser model than fight with a black box censorship algorithm.
EDIT: See the replies below. I might just have been holding it wrong.
Be sure to turn off the refiner. This sounds like you're using models that aren't aligned with their base models, and the refiner runs in the last steps. If the prompt is out of alignment with the default base model, it'll heavily distort. Personally, with SDXL I never use the refiner; I just use more steps.
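For reference, the base-only route is just a plain SDXL pipeline with a higher step count (a sketch with diffusers; 50 steps is only an example):

    # SDXL base only, no refiner stage: compensate with more denoising steps.
    import torch
    from diffusers import StableDiffusionXLPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    image = base("matte portrait of Yennefer", num_inference_steps=50).images[0]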
Well yeah, because SD2 literally had purposeful censorship of the base model and the CLIP encoder, which basically made it DOA to the entire open-source community that was dedicated to 1.5. SDXL wasn't so bad, so it gained traction, but 1.5 is still the king because it was from before the damn models were gimped at the knees and relied on workarounds and insane finetunes just to get basic anatomy correct.
Probably not, since I have no idea what you're talking about. I've just been using the models that InvokeAI (2.3, I only just now saw there's a 3.0) downloads for me [0]. The SD1.5 one is as good as ever, but the SD2 model introduces artifacts on (many, but not all) faces and copyrighted characters.
EDIT: based on the other reply, I think I understand what you're suggesting, and I'll definitely take a look next time I run it.
SDXL should be used together with a refiner. You can usually see the refiner kicking in if you have a UI that shows you the preview of intermediate steps. And it can sometimes look like the situation you describe (straining further away from your desired result).
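Roughly how that handoff looks in diffusers, as a sketch of the documented base-plus-refiner pattern (the 0.8 split point is just an example value):

    # The base model covers the first ~80% of the denoising steps,
    # then the refiner takes over for the final ~20%.
    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "matte portrait of Yennefer"
    latents = base(prompt, num_inference_steps=40, denoising_end=0.8,
                   output_type="latent").images
    # The refiner only runs the last steps, which is exactly when you see the
    # preview image change character.
    image = refiner(prompt, num_inference_steps=40, denoising_start=0.8,
                    image=latents).images[0]
    image.save("portrait.png")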
That person would rather pay for the API than set up locally (which is as simple as unzip and add a model); setting up in the cloud can be painful if you aren't familiar with it.