Stability has to make money somehow. By releasing an 8B parameter model, they’re encouraging people to use their paid API for inference. It’s not a terrible business decision. And hobbyists can play with the smaller models, which, with some refinement, will probably be just fine for most non-professional use cases.
Oh they’ll never let you pay for porn generation. But they will happily entertain having you pay for quality commercial images that are basically a replacement for the entire graphic design industry.
Don't people quantize SD down to 8 bits? I understand plenty of people don't have 8GB of VRAM (and I suppose you need some extra for activations and other data, so maybe 10GB?). But that's still well within the realm of consumer hardware capabilities.
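For reference, weight-only quantization is mostly just scaling, and the memory math follows directly from it. A minimal per-channel int8 sketch in plain PyTorch (toy sizes, not tied to any particular SD implementation):

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Per-output-channel symmetric int8 quantization of a 2-D weight matrix."""
    # Scale each row so its max magnitude maps to 127.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Restore an approximate fp16 weight on the fly at inference time.
    return (q.float() * scale).half()

w = torch.randn(4096, 4096)                    # one hypothetical projection matrix
q, s = quantize_int8(w)
print(w.nelement() * 2 / 2**20, "MiB as fp16") # ~32 MiB
print(q.nelement() * 1 / 2**20, "MiB as int8") # ~16 MiB
err = (dequantize(q, s).float() - w).abs().mean()
print(f"mean abs error: {err:.5f}")
```

In a real inference stack the dequantize step gets fused into the matmul kernels, but the storage math is the same: int8 halves fp16 weight memory, so 8B params land right around 8GB before activations.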
I am going to look at quantization for the 8B model. But also, these are transformers, so a variety of merging / Frankenstein-tuning is possible. For example, you could use the 8B model to populate the KV cache (which is computed once, so the weights can be loaded from slower devices such as RAM / SSD) and use the 800M model for diffusion by replicating its weights to match the layer count of the 8B model (rough sketch below).
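To make the shape of that idea concrete, here's a toy sketch, purely speculative on my part: all module names and sizes are made up, dims are fixed equal for simplicity, and real SD3-style blocks are more involved than this plain cross-attention. The point is just the structure: one pass through the big stack builds the cache, then every denoising step runs only the small blocks, replicated to line up with the big layer count.

```python
import torch
import torch.nn as nn

D, N_BIG_LAYERS, N_SMALL_LAYERS = 64, 8, 2  # toy sizes; assume matching dims

class Block(nn.Module):
    """One toy transformer block attending to externally supplied K/V."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x, k, v):
        x = x + self.attn(x, k, v, need_weights=False)[0]
        return x + self.ff(x)

big = nn.ModuleList([Block(D) for _ in range(N_BIG_LAYERS)])      # stands in for 8B
small = nn.ModuleList([Block(D) for _ in range(N_SMALL_LAYERS)])  # stands in for 800M

cond = torch.randn(1, 77, D)  # conditioning tokens (e.g. text embedding)

# 1) One pass through the big stack to build a per-layer K/V cache.
#    This runs once per prompt, so the big weights could stream from RAM/SSD.
kv_cache = []
h = cond
with torch.no_grad():
    for blk in big:
        kv_cache.append((h.clone(), h.clone()))  # stand-in: real models project K and V
        h = blk(h, h, h)

# 2) Replicate the small blocks to match the big layer count, then run
#    every denoising step through them against the cached K/V.
replicated = [small[i * N_SMALL_LAYERS // N_BIG_LAYERS] for i in range(N_BIG_LAYERS)]
x = torch.randn(1, 256, D)  # latent image tokens
with torch.no_grad():
    for blk, (k, v) in zip(replicated, kv_cache):
        x = blk(x, k, v)
print(x.shape)
```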
Do you know how the memory demands compare to LLMs at the same number of parameters? For example, Mistral 7B quantized to 4 bits works very well on an 8GB card, though there isn’t room for long context.
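Weight memory scales the same way regardless of architecture: params × bits / 8. Quick napkin math in Python, weights only (activations, text encoders, VAE, and any KV/context memory are extra):

```python
# Back-of-envelope weight memory at different precisions.
for name, params in [("Mistral 7B", 7e9), ("SD3 8B", 8e9), ("SD3 800M", 8e8)]:
    for bits in (16, 8, 4):
        gib = params * bits / 8 / 2**30
        print(f"{name:>9} @ {bits:>2}-bit: {gib:5.2f} GiB")
```

That gives Mistral 7B about 3.3 GiB at 4 bits, which matches it fitting on an 8GB card with little room left over. The main difference from LLMs is where the overhead goes: an LLM's KV cache grows with context length, while a diffusion model spends its extra memory on activations at the working latent resolution, repeated every denoising step.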
Very interesting. I've been stretching my 12GB 3060 as far as I can; it's exciting that modest hardware stays usable even as the models improve.