
So, they just announced StableCascade.

Wouldn't this v3 supersede the StableCascade work?

Did they announce it because a team had been working on it and they didn't want it to die as an internal project, or are there architectural differences that make both worthwhile?




I think of SD3 as a further evolution of SD1.5/2/XL and StableCascade as a branching path. It is unclear which will be better in the long term, so why not cover both directions if they have the resources to do so?


I suspect Stable Cascade may incorporate a DiT at some point. The UNet is easily swapped out. SC’s main innovation is the training of a semantic compressor model and a VQGAN that translates the latent output from the diffusion model back to image space - rather than relying on a VAE.

It’s a really smart architecture and I think it is fertile ground for stacking on new things like DiT.


There are architectural differences, although I found Stable Cascade a bit underwhelming. Yes, it can actually manage text, but the text it does manage often looks like someone just wrote it over the image; it doesn't feel integrated a lot of the time.

SD3 seems closer to SOTA. I'm not sure why Cascade took so long to get out; it seemed to be up and running months ago.


Stable Cascade has a distinct noisy look to generated images. It almost looks as bad as images dithered to the old 216-color Netscape palette.


If you renoise the output of the first diffusion stage to halfway and then denoise forward again, you can eliminate the bad output. This approach is called “replay” or “iterative mixing” and there are a few open source nodes for ComfyUI you can refer to.
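
For anyone who wants the idea in code, here is a rough sketch of the renoise-then-denoise loop described above. It is not the ComfyUI node code and not Stable Cascade's actual sampler: the linear noise blend, the function names (`renoise`, `denoise_from`, `replay`) and the `denoise_step` callback are all assumptions made for illustration. In practice `denoise_step` would wrap one step of the real first-stage sampler.

```python
import numpy as np

def renoise(latent, t, rng):
    """Blend a finished latent with Gaussian noise.
    Assumed linear schedule: t=0 keeps the latent, t=1 is pure noise."""
    noise = rng.standard_normal(latent.shape)
    return (1.0 - t) * latent + t * noise

def denoise_from(latent, t_start, steps, denoise_step):
    """Run the denoiser from noise level t_start down to 0.
    denoise_step(x, t_cur, t_next) is a hypothetical callback standing in
    for one step of the real sampler."""
    ts = np.linspace(t_start, 0.0, steps + 1)
    x = latent
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = denoise_step(x, t_cur, t_next)
    return x

def replay(latent, denoise_step, rng, t_mid=0.5, steps=10):
    """One replay pass: push the output back to noise level t_mid
    (halfway by default), then denoise forward again."""
    noisy = renoise(latent, t_mid, rng)
    return denoise_from(noisy, t_mid, steps, denoise_step)
```

You would call `replay(first_stage_latent, my_step, np.random.default_rng(0))` on the first-stage output before handing it to the later stages. Whether halfway is the best restart point, and how many passes help, is something to tune by eye.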



