I think of the SD3 as a further evolution of SD1.5/2/XL and StableCascade as a b...

ttul · on Feb 22, 2024

I suspect Stable Cascade may incorporate a DiT at some point. The UNet is easily swapped out. SC’s main innovation is the training of a semantic compressor model and a VQGAN that translates the latent output from the diffusion model back to image space, rather than relying on a VAE.

It’s a really smart architecture, and I think it is fertile ground for stacking on new things like DiT.
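
To make "easily swapped out" concrete, here is a toy sketch of the idea. It is not Stable Cascade's actual code: every name in it (`LatentPipeline`, `toy_unet`, `toy_dit`, `toy_decoder`) is made up for illustration, and the "models" are trivial stand-in functions. The only point is that the sampling loop and the latent-to-image decoder depend on the denoiser's interface, not on whether it is a UNet or a DiT.

```python
from dataclasses import dataclass
from typing import Callable, List

Latent = List[float]

# Hypothetical interfaces: a denoiser maps (latent, timestep) to predicted
# noise, and a decoder maps a compressed latent back to "pixels".
Denoiser = Callable[[Latent, int], Latent]
Decoder = Callable[[Latent], List[float]]


def toy_unet(latent: Latent, t: int) -> Latent:
    # Stand-in for a UNet backbone: predicts half of the latent as noise.
    return [0.5 * x for x in latent]


def toy_dit(latent: Latent, t: int) -> Latent:
    # Stand-in for a DiT backbone: same interface, different prediction.
    return [0.25 * x for x in latent]


def toy_decoder(latent: Latent) -> List[float]:
    # Stand-in for a VQGAN-style decoder: expands each latent value
    # into four output values.
    return [x for x in latent for _ in range(4)]


@dataclass
class LatentPipeline:
    denoiser: Denoiser
    decoder: Decoder
    steps: int = 4

    def generate(self, noise: Latent) -> List[float]:
        latent = list(noise)
        # Simplified sampling loop: subtract the predicted noise each step.
        for t in reversed(range(self.steps)):
            predicted = self.denoiser(latent, t)
            latent = [x - e for x, e in zip(latent, predicted)]
        return self.decoder(latent)


noise = [1.0, 2.0]
unet_image = LatentPipeline(toy_unet, toy_decoder).generate(noise)
# Swapping the backbone touches only one argument; the decoder is unchanged.
dit_image = LatentPipeline(toy_dit, toy_decoder).generate(noise)
```

In a real system the two backbones would also have to agree on conditioning inputs and latent shape, which this sketch ignores.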