I think of SD3 as a further evolution of SD1.5/2/XL, and of Stable Cascade as a branching path. It is unclear which direction will prove better in the long term, so why not pursue both if Stability AI has the resources to do so?
I suspect Stable Cascade may incorporate a DiT at some point, since the UNet is easily swapped out. SC’s main innovation is training a semantic compressor model and a VQGAN that translates the latent output of the diffusion model back to image space, rather than relying on a VAE.
It’s a really smart architecture, and I think it is fertile ground for stacking on new things like DiT.
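To make the "swap out the UNet" point concrete, here is a toy sketch, not Stability AI's code. It assumes the only contract the rest of the pipeline relies on is "noisy latent in, cleaner latent out", so any backbone with that interface can be dropped in. `Denoiser`, `toy_unet`, `toy_dit`, `toy_vqgan_decode` and `Pipeline` are all hypothetical names; the "models" are trivial update rules on lists of floats, and the sketch does not model the semantic compressor or Stable Cascade's separate diffusion stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence

Latent = List[float]


class Denoiser(Protocol):
    """Hypothetical interface: any backbone (UNet, DiT, ...) that refines a latent."""
    def __call__(self, latent: Latent, step: int, cond: Latent) -> Latent: ...


def toy_unet(latent: Latent, step: int, cond: Latent) -> Latent:
    # Toy stand-in for a UNet: move each value halfway toward the matching conditioning value.
    return [x + 0.5 * (c - x) for x, c in zip(latent, cond)]


def toy_dit(latent: Latent, step: int, cond: Latent) -> Latent:
    # Toy stand-in for a DiT: same interface, different update rule (pull toward the mean of cond).
    mean = sum(cond) / len(cond)
    return [x + 0.5 * (mean - x) for x in latent]


def toy_vqgan_decode(latent: Latent, codebook: Sequence[float] = (0.0, 0.5, 1.0)) -> List[float]:
    # Toy "VQGAN decoder": snap each latent value to its nearest codebook entry,
    # then "upsample" to image space by repeating each value 4 times.
    quantized = [min(codebook, key=lambda c: abs(c - x)) for x in latent]
    return [q for q in quantized for _ in range(4)]


@dataclass
class Pipeline:
    denoiser: Denoiser                          # swappable backbone
    decoder: Callable[[Latent], List[float]]    # latent -> image space
    steps: int = 10

    def generate(self, cond: Latent, noise: Latent) -> List[float]:
        latent = list(noise)
        for step in range(self.steps):
            latent = self.denoiser(latent, step, cond)
        return self.decoder(latent)


unet_image = Pipeline(toy_unet, toy_vqgan_decode).generate([1.0, 0.0], [0.0, 1.0])
dit_image = Pipeline(toy_dit, toy_vqgan_decode).generate([1.0, 0.0], [0.0, 1.0])
print(unet_image)
print(dit_image)
```

The only change between the two runs is the `denoiser` argument; the decoding step is untouched, which is the property that would make a DiT drop-in plausible in an architecture like this.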