I've found the local models useful for non-coding tasks, however the 8B paramete...

walthamstow · 2025-05-01T12:59:22 1746104362

If you have a 32GB Mac then you should be able to run up to 27B params, I have done so with Google's `gemma3:27b-it-qat`

endlessvoid94 · 2025-05-01T13:27:19 1746106039

Hm, I've got an M2 Air with 24GB. Running the 27B model was crawling. Maybe I had something misconfigured.

100721 · 2025-05-01T13:53:09 1746107589

No, that sounds right. 24GB isn’t enough to feasibly run 27B parameters. The rule of thumb is approximately 1GB of RAM per billion parameters.

Someone in another comment on this post mentioned using one of the micro models (Qwen 0.6B I think?) and having decent results. Maybe you can try that and then progressively move upwards?

EDIT: “Queen” -> “Qwen”

brandall10 · 2025-05-01T15:13:11 1746112391

That rule of thumb only applies to 8-bit quants at low context. The default for ollama is 4-bit, which puts a 27B model at roughly 14GB.

The vast majority of people run between 4-6 bit depending on system capability. The extra accuracy above 6 tends to not be worth it relative to the performance hit.
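
To make the arithmetic concrete, here is a rough sketch of the estimate. It assumes memory is dominated by the weights (parameters × bits per weight ÷ 8) and ignores context/KV cache and runtime overhead, so real usage will be somewhat higher. The function name is just for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width. Context
    (KV cache) and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare the common quantization levels for a 27B model.
    for bits in (4, 6, 8):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB of weights")
```

At 8 bits this reduces to the "1GB per billion parameters" rule, and at 4 bits it gives about half of that.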

simonw · 2025-05-01T14:00:32 1746108032

You also need to leave space for other apps. If you run a 27B model on a 32GB machine you may find that you can't productively run other apps.

I have 64GB and I can only just fit a bunch of Firefox and VS Code windows at the same time as running a 27B model.

redman25 · 2025-05-01T14:20:25 1746109225

I think only about 2/3 of RAM is made available to the GPU by default, so around 16GB on a 24GB machine, which is probably not enough headroom to run even a Q4 quant comfortably.

tstrimple · 2025-05-01T22:02:32 1746136952

This is configurable by the way.

sudo sysctl iogpu.wired_limit_mb=12345
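
If you want to work out a value instead of guessing, something like this sketch could help. The 2/3 default share is the figure mentioned upthread and may not hold for every machine or macOS version, and both helper names are made up for illustration. It only builds the command string; it does not run anything.

```python
def gpu_budget_mb(total_ram_gb: int, fraction: float = 2 / 3) -> int:
    """GPU-addressable memory in MB for a given share of total RAM.

    Assumption: the default share is about 2/3, as claimed upthread.
    """
    return round(total_ram_gb * 1024 * fraction)


def wired_limit_command(limit_mb: int) -> str:
    """Build (but do not execute) the sysctl command for a new limit."""
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


if __name__ == "__main__":
    total_gb = 24  # e.g. a 24GB machine
    print("assumed default budget (MB):", gpu_budget_mb(total_gb))
    # Hypothetical choice: hand the GPU everything except 6GB for the OS and apps.
    print(wired_limit_command((total_gb - 6) * 1024))
```

Leave enough for the OS and whatever else you have open, otherwise the machine will start swapping.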

hadlock · 2025-05-02T23:38:26 1746229106

deepseek-r1:8b screams on my 12GB GPU. gemma3:12b-it-qat runs just fine, a little faster than I can read. Once you exceed GPU RAM it offloads a lot of the model to the CPU, and splitting between GPU and CPU is dramatically (80%? 95%?) slower.

alkh · 2025-05-01T13:31:55 1746106315

How much RAM was it taking during inference?

walthamstow · 2025-05-01T14:24:42 1746109482

15.4GB during inference according to Activity Monitor

alkh · 2025-05-01T15:46:15 1746114375

Oh, nice, that's actually not bad at all. Thanks, will give it a try on my 36GB Mac.