MacBooks with M2 or M3 Max. I’m serious. They perform like a 2070 or 2080 but ha...

ttul · on Feb 22, 2024

MPS is promising and the memory bandwidth is definitely there, but stable diffusion performance on Apple Silicon remains terribly poor compared with consumer Nvidia cards (in my humble opinion). Perhaps this is partly because so many bits of the SD ecosystem are tied to Nvidia primitives.

ummonk · on Feb 23, 2024

Image diffusion models tend to have relatively low memory requirements compared to LLMs (and don’t benefit from batching), so having access to 128 GB of unified memory is kinda pointless.

Filligree · on Feb 23, 2024

They do benefit from batching; up to a 50% performance improvement, in my experience.

That might seem small compared to LLMs, but it isn't small in absolute terms.

ls612 · on Feb 23, 2024

I got a 2x jump on my 4090 from batching SDXL.

ls612 · on Feb 23, 2024

Stable diffusion will run fine on a 3090, or 4070ti Super and higher.

declaredapple · on Feb 22, 2024

How many tokens/s are we talking for a 70B model?

Last I saw they performed really poorly, like lower single digits t/s. Don't get me wrong they're probably a decent value for experimenting with it, but is flat out pathetic compared to an A100 or H100. And I think useless for training?

smcleod · on Feb 22, 2024

You can run a 180B model like Falcon Q4 around 4-5tk/s, a 120B model like Goliath Q4 at around 6-10tk/s, and 70B Q4 around 8-12tk/s and smaller models much quicker, but it really depends on the context size, model architecture and other settings. A A100 or H100 is obviously going to be a lot faster but it costs significantly more taking its supporting requirements into account and can’t be run on a light, battery powered laptop etc…

int_19h · on Feb 23, 2024

For text inference, what you want is M1/M2 Ultra with its 800 Gb/s RAM. Max only goes up to 400 Gb/s.

ls612 · on Feb 23, 2024

Yeah but the ultra only goes in desktop platforms which may be limiting to some.

int_19h · on Feb 23, 2024

But that's no different from mid-to-high-end GPUs, which is what the original ask was about.