I'm very happy to hear this; maybe it's finally time to buy a ton of RAM for my PC! A local, private LLM would be great. I'd try talking to it about stuff I don't feel comfortable having on OpenAI's servers.
Getting lots of RAM will let you run large models on the CPU, but inference will be painfully slow.
Apple Silicon Macs have unified memory shared between the CPU and GPU, which lets the GPU (relatively underpowered compared to a decent Nvidia card) run these models at decent speeds with llama.cpp, far faster than on the CPU alone.
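To make that concrete, here's a minimal sketch using the llama-cpp-python bindings (the model path, context size, and prompt are placeholders, and you may need a Metal-enabled build of the package); offloading every layer to the GPU is just one constructor argument:

    # Rough sketch with llama-cpp-python (pip install llama-cpp-python).
    # The model file below is a placeholder; any GGUF-quantized model works.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-7b-model.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
        n_ctx=4096,       # context window size
    )

    out = llm("Q: Why is unified memory useful for local LLMs? A:", max_tokens=128)
    print(out["choices"][0]["text"])

The same code runs against a CPU-only or CUDA build; on Apple Silicon the Metal backend is what makes the GPU offload worthwhile.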
This should all get dramatically better/faster/cheaper within a few years, I suspect. Capitalism will figure this one out.
There's nothing Mac-specific about running LLMs locally; Macs just happen to be a convenient way to get a ton of VRAM in a single small, power-efficient package.
On Windows and Linux, yes, you'll want at least 12GB of VRAM to get much utility out of local models, but the beefiest consumer GPUs still top out at 24GB, which is pretty limiting.
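To put those numbers in perspective, here's a quick back-of-the-envelope in Python (purely illustrative; real model files add overhead for the KV cache, embeddings, context, etc.):

    # Approximate size of the weights alone: parameters * bits per weight / 8.
    def weight_gb(params_billion, bits_per_weight):
        return params_billion * bits_per_weight / 8  # billions of bytes ~= GB

    for params in (7, 13, 70):
        for bits in (4, 8, 16):
            print(f"{params}B model @ {bits}-bit: ~{weight_gb(params, bits):.1f} GB")

Even at 4-bit, a 70B model's weights are roughly 35GB, so it simply won't fit on a 24GB card, while a Mac with 64GB of unified memory can hold it.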
With Windows/Linux I think the issue is that Nvidia is artificially limiting the amount of onboard memory (they want to sell the high-memory cards for 10x more to OpenAI, etc.) and that AMD, for whatever reason, can't get their shit together.
I'm sure there are other, much more knowledgeable people here on this topic, though.