I've now had multiple people ask for this - I will work on adding a new tab for this feature as it is a little different than what the site was originally intended to do.
Generally speaking, models seem to be bucketed by param count (3B, 7B, 8B, 14B, 34B, 70B), so for a given VRAM bucket you end up being able to run 1000s of models - so is it valuable to show 1000s of models?
My bet is "No" - and what really is valuable is like the top 50 trending models on HuggingFace that would fit in your VRAM bucket. So I will try build that.
Would love your thoughts on that though - does that sound like a good idea?
I see your point. I think the solution you mention (top 50 trending models) is as good a solution as I could come up with. Maybe the flow should be: Select a GPU / device -> list all the runnable models, sorted by popularity descending. How you want to operationalize popularity is another question...
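One rough way to operationalize it: treat HuggingFace downloads as the popularity signal and filter by a crude VRAM estimate. Minimal sketch below - the query parameters, the `downloads` field, and the param-count-from-repo-name heuristic are all assumptions, not how the site works today.

```typescript
// Rough sketch: pull popular text-generation models from the public HF API
// and keep the ones whose estimated VRAM fits the selected device.
interface HfModel {
  id: string;
  downloads: number;
}

// Very rough VRAM estimate: weights at ~0.5 bytes/param (Q4-ish) plus ~20%
// overhead for KV cache and activations.
function estimateVramGb(paramsBillion: number, bytesPerParam = 0.5): number {
  return paramsBillion * bytesPerParam * 1.2;
}

// Heuristic: read the param count out of the repo name, e.g. "Llama-3.1-8B" -> 8.
function paramsFromRepoName(id: string): number | null {
  const match = id.match(/(\d+(?:\.\d+)?)\s*[bB]\b/);
  return match ? parseFloat(match[1]) : null;
}

async function runnableModels(vramGb: number, top = 50): Promise<HfModel[]> {
  const url =
    "https://huggingface.co/api/models" +
    "?filter=text-generation&sort=downloads&direction=-1&limit=500";
  const models: HfModel[] = await (await fetch(url)).json();

  return models
    .filter((m) => {
      const params = paramsFromRepoName(m.id);
      return params !== null && estimateVramGb(params) <= vramGb;
    })
    .slice(0, top); // already sorted by downloads descending
}
```

Downloads are only a proxy for "trending"; likes (or whatever trending signal the Hub exposes) could be swapped in instead.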
It doesn't work for all GPUs/devices in the Simple tab: "Exception: Failed to calculate information for model. Error: Could not extract VRAM from: System Shared".
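Guess at the cause: integrated GPUs and Apple silicon report their memory as "System Shared" rather than a GB figure, so the parse fails. A possible fallback (function name and the 75% figure are made up):

```typescript
// Sketch of a fallback: treat "System Shared" as a share of system RAM
// instead of throwing on the parse.
function usableVramGb(vramField: string, systemRamGb?: number): number {
  const match = vramField.match(/(\d+(?:\.\d+)?)\s*GB/i);
  if (match) return parseFloat(match[1]);

  if (/system shared/i.test(vramField) && systemRamGb !== undefined) {
    return systemRamGb * 0.75; // assume ~75% of RAM is usable as shared memory
  }

  throw new Error(`Could not extract VRAM from: ${vramField}`);
}
```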
Cool. What about listing the models for a given GPU? It could also compare running them with vLLM, local_llama.c, etc. Maybe links to docs. Community build articles and ratings. Along the lines of https://pcpartpicker.com/
And you can definitely add some ref links for a bit of revenue.
- Use natural language for describing offloading requirements.
Do you mean remove the JSON thing and just summarise the offloading requirements?
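Something along these lines, maybe (field names are made up, just to check I've understood):

```typescript
// Made-up field names - just to illustrate the kind of one-line summary
// that could replace the raw JSON output.
function offloadSummary(totalLayers: number, gpuLayers: number, offloadedGb: number): string {
  if (gpuLayers >= totalLayers) {
    return "Fits entirely in VRAM - no offloading needed.";
  }
  return (
    `Fits with offloading: ${gpuLayers}/${totalLayers} layers on the GPU, ` +
    `with ~${offloadedGb.toFixed(1)} GB of weights offloaded to system RAM.`
  );
}
```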
- Just the year of the LLM's launch from the HF URL can help show whether it's an outdated LLM or a cutting-edge one.
Great idea - I will try to add this tonight.
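Rough plan (assuming the Hub's public `/api/models/{id}` response includes a `createdAt` field, which should be a decent proxy for launch date):

```typescript
// Sketch: use the Hub repo's creation date as a proxy for the model's launch
// year. Assumes `createdAt` is present in the public API response.
async function modelLaunchYear(repoId: string): Promise<number | null> {
  const res = await fetch(`https://huggingface.co/api/models/${repoId}`);
  if (!res.ok) return null;

  const info: { createdAt?: string } = await res.json();
  return info.createdAt ? new Date(info.createdAt).getFullYear() : null;
}
```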
- VLMs/Embedding models are missing?
Yeah, I just have text generation models ATM as that is by far where the most interest is. I will look at adding other model types as well, but it wouldn't be until the weekend that I do that.
I've recently been looking into running local LLMs for fun on my laptop (without any GPU) and this is the one thing I've never been able to find consistent information on. This is so helpful, thank you so much! Going to try and run Llama 3.2 3B FP8 soon.
I am using Streamlit, and that is what's pulling in React.
I appreciate that it's a heavy site, but just being honest with you - it doesn't seem worth the time to optimise this by moving to a lighter framework at this stage of the project.
> can you make it detect the device somehow, maybe with some additional permissions, instead of user selecting from a dropdown?
Detecting CPU and GPU specs browser-side is almost impossible to do reliably (even if relying on advanced fingerprinting and certain heuristics).
For GPUs, it may be possible to use (1) WebGL’s `WEBGL_debug_renderer_info` extension [0][0] or (2) WebGPU’s `GPUAdapter#info` [1][1], but I wouldn’t trust either of those APIs for general usage.
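Roughly, the most you can get out of those two APIs is something like this (untested sketch - treat every field as a best-effort hint):

```typescript
// Untested sketch: best-effort GPU hints from the browser.
// Neither API exposes VRAM size or bus speed - only coarse identity strings.
async function detectGpuHints(): Promise<Record<string, string>> {
  const hints: Record<string, string> = {};

  // (1) WebGPU: GPUAdapter#info exposes coarse fields (vendor, architecture, ...).
  const gpu = (navigator as any).gpu; // needs @webgpu/types for proper typing
  if (gpu) {
    const adapter = await gpu.requestAdapter();
    if (adapter?.info) {
      hints.vendor = adapter.info.vendor;             // e.g. "amd"
      hints.architecture = adapter.info.architecture; // e.g. "rdna-3"
    }
  }

  // (2) WebGL: the unmasked renderer string, which may be a generic
  // ANGLE/driver string rather than the exact card model.
  const gl = document.createElement("canvas").getContext("webgl");
  const ext = gl?.getExtension("WEBGL_debug_renderer_info");
  if (gl && ext) {
    hints.renderer = String(gl.getParameter(ext.UNMASKED_RENDERER_WEBGL));
  }

  return hints;
}
```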
Jay, you seem knowledgeable on this - thanks for answering. I have a question:
I did look at auto-detecting before, but it seems like you can only really tell the features of a GPU, not so much the good info (VRAM amount and bus speed) - is that the case?
I looked at the GPUAdapter docs, and all it told me was:
- device maker (amd)
- architecture (rdna-3)
and that was it. Is there a way to poke for bus speed and vram amount?
> Is there a way to poke for bus speed and vram amount?
Unfortunately, I’m not aware of any way to reliably get this type of information browser-side.
If you really want to see how far you can get, your best bet would be via fingerprinting, which would require some upfront work using a combination of the manual input you already have now and running some stuff in the background to collect data (especially timing-related data). With enough users manually inputting their specs and enough independently collected data, you’d probably be surprised at how accurate you can get via fingerprinting.
That said, please do NOT go the fingerprinting route, because (1) a lot of users (including myself) hate being fingerprinted (especially if done covertly), (2) you need quite a lot of data before you can do anything useful, and (3) it’s obviously not worth the effort for what you’re building.
Not so much purposefully closed source - more that I don't want to make it complex by splitting the data the app uses out from the code (a coordination problem when it comes to deploying that I don't want to deal with for a project of this size).
https://canirunthisllm.com/