I'm already getting 1.5 tok/s on Ubuntu running on Android via UserLand with llama.cpp (v2-Q4). I don't really see any acceleration. If anything, I need to see my phone do something actually useful at, let's say, 7-10 tok/s.



Human speech is in the 2-4 tokens per second range; I think that's about where my frustration limit is.


MLC should already be pretty fast on Vulkan.



