One of the most frequent questions people ask when running LLMs locally is:
"I have xx RAM and a yy GPU, can I run the zz model?"
I have vibe-coded a simple application to help you answer just that.
Update:
I've received a lot of great feedback on how to improve the app. Thank you all.
I can absolutely run models that this site says cannot be run. Shared RAM is a thing: even with limited VRAM, shared system RAM can compensate and let larger models run (slowly, admittedly, but they work).
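For anyone curious how a check like this can account for shared memory, here is a minimal back-of-the-envelope sketch. It is not the app's actual logic: the function names, the ~20% overhead factor for KV cache and runtime buffers, and the example numbers are all assumptions.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Approximate memory footprint: quantized weights plus ~20% overhead (assumed)."""
    weight_gb = params_billion * bits_per_weight / 8  # GB for the weights alone
    return weight_gb * overhead


def can_run(params_billion: float, bits_per_weight: float,
            vram_gb: float, system_ram_gb: float) -> str:
    """Classify a model as VRAM-only, runnable via shared/system RAM, or too big."""
    needed = estimate_model_gb(params_billion, bits_per_weight)
    if needed <= vram_gb:
        return f"Fits in VRAM (~{needed:.1f} GB needed): should run at full speed."
    if needed <= vram_gb + system_ram_gb:
        return f"Needs shared/system RAM (~{needed:.1f} GB needed): will run, but slowly."
    return f"Does not fit (~{needed:.1f} GB needed), even counting shared RAM."


# Hypothetical example: a 70B model at 4-bit on a 24 GB GPU with 64 GB of system RAM.
print(can_run(params_billion=70, bits_per_weight=4, vram_gb=24, system_ram_gb=64))
```

The middle branch is exactly the shared-RAM case from the feedback: the model spills out of VRAM into system memory, so it still runs, just much more slowly.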