I've now had multiple people ask for this - I will work on adding a new tab for this feature as it is a little different than what the site was originally intended to do.
Generally speaking, models seem to be bucketed by param count (3B, 7B, 8B, 14B, 34B, 70B), so for a given VRAM bucket you end up being able to run 1000s of models - so is it valuable to show 1000s of models?
My bet is "No" - and what really is valuable is like the top 50 trending models on HuggingFace that would fit in your VRAM bucket. So I will try build that.
Would love your thoughts on that though - does that sound like a good idea?
I see your point. I think the solution you mention (top 50 trending models) is as good a solution as I could come up with. Maybe the flow should be: Select a GPU / device -> list all the runnable models, sorted by popularity descending. How you want to operationalize popularity is another question...
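One rough way to operationalize it: treat HuggingFace downloads as the popularity signal and filter by a crude VRAM estimate. Minimal sketch below - the query parameters, the `downloads` field, and the param-count-from-repo-name heuristic are all assumptions, not how the site works today.

```typescript
// Rough sketch: pull popular text-generation models from the public HF API
// and keep the ones whose estimated VRAM fits the selected device.
interface HfModel {
  id: string;
  downloads: number;
}

// Very rough VRAM estimate: weights at ~0.5 bytes/param (Q4-ish) plus ~20%
// overhead for KV cache and activations.
function estimateVramGb(paramsBillion: number, bytesPerParam = 0.5): number {
  return paramsBillion * bytesPerParam * 1.2;
}

// Heuristic: read the param count out of the repo name, e.g. "Llama-3.1-8B" -> 8.
function paramsFromRepoName(id: string): number | null {
  const match = id.match(/(\d+(?:\.\d+)?)\s*[bB]\b/);
  return match ? parseFloat(match[1]) : null;
}

async function runnableModels(vramGb: number, top = 50): Promise<HfModel[]> {
  const url =
    "https://huggingface.co/api/models" +
    "?filter=text-generation&sort=downloads&direction=-1&limit=500";
  const models: HfModel[] = await (await fetch(url)).json();

  return models
    .filter((m) => {
      const params = paramsFromRepoName(m.id);
      return params !== null && estimateVramGb(params) <= vramGb;
    })
    .slice(0, top); // already sorted by downloads descending
}
```

Downloads are only a proxy for "trending"; likes (or whatever trending signal the Hub exposes) could be swapped in instead.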
It doesn't work for all GPUs/devices in the Simple tab: "Exception: Failed to calculate information for model. Error: Could not extract VRAM from: System Shared".
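Guess at the cause: integrated GPUs and Apple silicon report their memory as "System Shared" rather than a GB figure, so the parse fails. A possible fallback (function name and the 75% figure are made up):

```typescript
// Sketch of a fallback: treat "System Shared" as a share of system RAM
// instead of throwing on the parse.
function usableVramGb(vramField: string, systemRamGb?: number): number {
  const match = vramField.match(/(\d+(?:\.\d+)?)\s*GB/i);
  if (match) return parseFloat(match[1]);

  if (/system shared/i.test(vramField) && systemRamGb !== undefined) {
    return systemRamGb * 0.75; // assume ~75% of RAM is usable as shared memory
  }

  throw new Error(`Could not extract VRAM from: ${vramField}`);
}
```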
Cool. What about listing the models for a given GPU? It could also compare running them with vLLM, local_llama.c, etc. Maybe links to docs. Community build articles and ratings. Along the lines of https://pcpartpicker.com/
And you can definitely add some ref links for a bit of revenue.
- Use natural language for describing offloading requirements.
Do you mean remove the JSON thing and just summarise the offloading requirements?
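Something along these lines, maybe (field names are made up, just to check I've understood):

```typescript
// Made-up field names - just to illustrate the kind of one-line summary
// that could replace the raw JSON output.
function offloadSummary(totalLayers: number, gpuLayers: number, offloadedGb: number): string {
  if (gpuLayers >= totalLayers) {
    return "Fits entirely in VRAM - no offloading needed.";
  }
  return (
    `Fits with offloading: ${gpuLayers}/${totalLayers} layers on the GPU, ` +
    `with ~${offloadedGb.toFixed(1)} GB of weights offloaded to system RAM.`
  );
}
```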
- Just the year of the LLM's launch from the HF URL can help show whether it's an outdated LLM or a cutting-edge one.
Great idea - I will try to add this tonight.
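Rough plan (assuming the Hub's public `/api/models/{id}` response includes a `createdAt` field, which should be a decent proxy for launch date):

```typescript
// Sketch: use the Hub repo's creation date as a proxy for the model's launch
// year. Assumes `createdAt` is present in the public API response.
async function modelLaunchYear(repoId: string): Promise<number | null> {
  const res = await fetch(`https://huggingface.co/api/models/${repoId}`);
  if (!res.ok) return null;

  const info: { createdAt?: string } = await res.json();
  return info.createdAt ? new Date(info.createdAt).getFullYear() : null;
}
```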
- VLMs/Embedding models are missing?
Yeah, I just have text generation models ATM as that is by far where the most interest is. I will look at adding other model types as well, but it wouldn't be until the weekend that I do that.
I've recently been looking into running local LLMs for fun on my laptop (without any GPU) and this is the one thing I've never been able to find consistent information on. This is so helpful, thank you so much! Going to try and run Llama 3.2 3B FP8 soon.
I am using Streamlit, and that is what's pulling in React.
I appreciate that it's a heavy site, but just being honest with you - it doesn't seem worth the time to optimise this by moving to a lighter framework at this stage of the project.
> can you make it detect the device somehow, maybe with some additional permissions, instead of user selecting from a dropdown?
Detecting CPU and GPU specs browser-side is almost impossible to do reliably (even if relying on advanced fingerprinting and certain heuristics).
For GPUs, it may be possible to use (1) WebGL’s `WEBGL_debug_renderer_info` extension [0][0] or (2) WebGPU’s `GPUAdapter#info` [1][1], but I wouldn’t trust either of those APIs for general usage.
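Roughly, the most you can get out of those two APIs is something like this (untested sketch - treat every field as a best-effort hint):

```typescript
// Untested sketch: best-effort GPU hints from the browser.
// Neither API exposes VRAM size or bus speed - only coarse identity strings.
async function detectGpuHints(): Promise<Record<string, string>> {
  const hints: Record<string, string> = {};

  // (1) WebGPU: GPUAdapter#info exposes coarse fields (vendor, architecture, ...).
  const gpu = (navigator as any).gpu; // needs @webgpu/types for proper typing
  if (gpu) {
    const adapter = await gpu.requestAdapter();
    if (adapter?.info) {
      hints.vendor = adapter.info.vendor;             // e.g. "amd"
      hints.architecture = adapter.info.architecture; // e.g. "rdna-3"
    }
  }

  // (2) WebGL: the unmasked renderer string, which may be a generic
  // ANGLE/driver string rather than the exact card model.
  const gl = document.createElement("canvas").getContext("webgl");
  const ext = gl?.getExtension("WEBGL_debug_renderer_info");
  if (gl && ext) {
    hints.renderer = String(gl.getParameter(ext.UNMASKED_RENDERER_WEBGL));
  }

  return hints;
}
```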
Jay, you seem knowledgeable on this - thanks for answering. I have a question:
I did look at auto-detecting before, but it seems like you can only really tell the features of a GPU, not so much the good info (VRAM amount and bus speed) - is that the case?
I looked at the GPUAdapter docs, and all it told me was:
- device maker (amd)
- architecture (rdna-3)
and that was it. Is there a way to poke for bus speed and vram amount?
> Is there a way to poke for bus speed and vram amount?
Unfortunately, I’m not aware of any way to reliably get this type of information browser-side.
If you really want to see how far you can get, your best bet would be via fingerprinting, which would require some upfront work using a combination of the manual input you already have now and running some stuff in the background to collect data (especially timing-related data). With enough users manually inputting their specs and enough independently collected data, you’d probably be surprised at how accurate you can get via fingerprinting.
That said, please do NOT go the fingerprinting route, because (1) a lot of users (including myself) hate being fingerprinted (especially if done covertly), (2) you need quite a lot of data before you can do anything useful, and (3) it’s obviously not worth the effort for what you’re building.
Not so much purposefully closed source - more that I don't want to make it complex by splitting the data the app uses out from the code (a coordination problem when it comes to deploying that I don't want to deal with for a project of this size).
https://canirunthisllm.com/