> prima.cpp is a distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices: laptops, desktops, phones, and tablets (GPU or no GPU, it’s all good). With it, you can run QwQ-32B, Qwen 2.5-72B, Llama 3-70B, or DeepSeek R1 70B right from your local home cluster!
Just my two cents from experience: any sufficiently advanced LLM training or inference pipeline eventually discovers that the real bottleneck is the network!
Edit: unless, that is, it's algorithmically designed or partitioned in a way that overcomes the network bottleneck.
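
To put rough numbers on why the partitioning matters, here is a minimal back-of-envelope sketch in Python. It assumes Llama-3-70B-like shapes (hidden size 8192, 80 layers, fp16 activations), 4 devices, 1 Gbps Ethernet, and ~1 ms round trips on a home LAN; all of these are illustrative assumptions, not measurements from prima.cpp.

```python
# Back-of-envelope: per-token network cost of two ways to split a
# 70B-class model across a few home devices. All constants below are
# illustrative assumptions, not measured values.

HIDDEN = 8192          # hidden size (assumed, Llama-3-70B-like)
LAYERS = 80            # transformer layers (assumed)
BYTES_PER_ACT = 2      # fp16 activations
DEVICES = 4            # devices in the home cluster (assumed)
BANDWIDTH = 125e6      # 1 Gbps Ethernet ~= 125 MB/s (assumed)
RTT = 1e-3             # ~1 ms round trip on a home LAN (assumed)

# One token's activation vector is small: 8192 * 2 bytes = 16 KiB.
act_bytes = HIDDEN * BYTES_PER_ACT

# Pipeline parallelism: each token's activations cross the network
# once per device boundary, i.e. DEVICES - 1 hops per token.
pipeline_hops = DEVICES - 1
pipeline_cost = pipeline_hops * (act_bytes / BANDWIDTH + RTT)

# Tensor parallelism: roughly two all-reduces of the activation
# vector per layer, each paying at least one round trip.
tp_ops = 2 * LAYERS
tp_cost = tp_ops * (act_bytes / BANDWIDTH + RTT)

print(f"activation per token:           {act_bytes / 1024:.0f} KiB")
print(f"pipeline-parallel network cost: {pipeline_cost * 1e3:.2f} ms/token")
print(f"tensor-parallel network cost:   {tp_cost * 1e3:.2f} ms/token")
```

Under these assumptions, a layer-wise pipeline split costs only a few milliseconds of network time per token, while tensor parallelism pays a round trip for every all-reduce and ends up latency-bound at well over 100 ms per token. That is the sense in which the partitioning has to be chosen to overcome the network bottleneck.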