villgax on Aug 15, 2023 | on: GPU-Accelerated LLM on an Orange Pi

I'm already getting 1.5 tok/s on Ubuntu running on Android via UserLand with llama.cpp (v2-Q4). I don't really see any acceleration here. If anything, I need to see my phone do something actually useful at, let's say, 7-10 tok/s.

regularfry on Aug 15, 2023

Human speech is in the 2-4 tokens per second range. I think that's about where my frustration limit is.

brucethemoose2 on Aug 15, 2023

MLC should already be pretty fast on Vulkan.
