So, they just announced StableCascade. Wouldn't this v3 supersede the StableCasc...

Kubuxu · on Feb 22, 2024

I think of SD3 as a further evolution of SD1.5/2/XL and StableCascade as a branching path. It is unclear which will be better in the long term, so why not cover both directions if they have the resources to do so?

ttul · on Feb 22, 2024

I suspect Stable Cascade may incorporate a DiT at some point. The UNet is easily swapped out. SC’s main innovation is the training of a semantic compressor model and a VQGAN that translates the latent output from the diffusion model back to image space - rather than relying on a VAE.

It’s a really smart architecture and I think is fertile ground for stacking on new things like DiT.

whywhywhywhy · on Feb 22, 2024

There are architectural differences. That said, I found Stable Cascade a bit underwhelming: yes, it can actually manage text, but the text it does manage often looks like someone just wrote it over the image. It doesn't feel integrated a lot of the time.

SD3 seems to be closer to SOTA. Not sure why Cascade took so long to get out; it seemed to be up and running months ago.

Dwedit · on Feb 22, 2024

Stable Cascade has a distinct noisy look to generated images. It almost looks as bad as images being dithered to the old 216-color Netscape palette.

ttul · on Feb 22, 2024

If you renoise the output of the first diffusion stage to halfway and then denoise forward again, you can eliminate the bad output. This approach is called “replay” or “iterative mixing” and there are a few open source nodes for ComfyUI you can refer to.
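For anyone who wants the idea in code, here is a rough sketch of the renoise-then-denoise loop. It is not taken from any of the ComfyUI nodes. The names `renoise`, `replay`, `denoise_step`, `alpha_bars`, and `strength` are all made up for illustration, the schedule is a toy linear one, and the "model" is a stand-in that just shrinks values, so treat it as the shape of the technique rather than a working Stable Cascade pipeline.

```python
import math
import random


def renoise(latent, alpha_bar, rng):
    """Forward-diffuse a clean latent to the noise level given by alpha_bar.

    Uses the standard mix x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    """
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in latent]


def replay(latent, denoise_step, alpha_bars, rng, strength=0.5):
    """Renoise `latent` part-way into the schedule, then denoise back to step 0.

    alpha_bars[t] is the cumulative signal fraction at step t (decreasing in t).
    denoise_step(x, t) is a hypothetical callable wrapping the diffusion model.
    strength=0.5 corresponds to renoising "to halfway".
    """
    start = int(strength * (len(alpha_bars) - 1))
    x = renoise(latent, alpha_bars[start], rng)
    for t in range(start, 0, -1):
        x = denoise_step(x, t)
    return x


def toy_denoise_step(x, t):
    # Stand-in for a real model call: each step just shrinks the values.
    return [0.9 * v for v in x]


alpha_bars = [1.0 - t / 20 for t in range(20)]  # toy linear schedule
rng = random.Random(0)
out = replay([1.0, -1.0, 0.5], toy_denoise_step, alpha_bars, rng)
```

In a real setup `denoise_step` would call the first-stage model with the same conditioning as the original pass, and you could run `replay` more than once if a single pass does not clean things up.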