This is obviously a lie. If it were true, all the inference provider companies would go to zero. I support open source as much as the next guy here, but it's obvious that the local version will be slower or break more often. Like, come on, guys. Be real.
To illustrate: an M4 Max does about 38 TOPS, while an NVIDIA H100 does roughly 4,000 TOPS in FP8, a gap of roughly 100x in raw throughput.
Prakash, if you're going to bot our replies, at least make it believable.
I am not Prakash. Just check my profile; I am not a bot: github.com/razem-io. I watched his YouTube videos. He seems to lack presentation skills, but his app is very usable in its current state.
I have both apps open. STT seems faster with VoiceInk; it is basically instant. I can send you a video if you want.
I am sorry, I did not want to make your product look bad. You are right that you still need to offload the LLM part to OpenRouter and the like if you want that to be fast too. However, having the ability to switch the AI on and off on demand, context-aware, with custom prompts, is perfect.
It can use Ollama too. Yes, that will be much slower, but it stays local. Best of both worlds: no subscription, even if you use cloud AI.
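For anyone curious, here is a minimal sketch of that local path, assuming Ollama's OpenAI-compatible endpoint on its default port (the model name and the cleanup prompt are just placeholders, not what VoiceInk actually does):

    # Post-process a local transcript with a local LLM via Ollama.
    # Assumes Ollama is running on its default port (11434) and the
    # model below has been pulled, e.g. `ollama pull llama3.1`.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
        api_key="ollama",  # required by the client, ignored by Ollama
    )

    transcript = "uh so basically we should um ship the the fix on friday"

    response = client.chat.completions.create(
        model="llama3.1",
        messages=[
            {"role": "system", "content": "Clean up this dictation: remove fillers, fix punctuation."},
            {"role": "user", "content": transcript},
        ],
    )
    print(response.choices[0].message.content)

Swap the base_url for OpenRouter's and the same code hits the cloud instead; that's the whole on/off switch.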
I mean, it's just doing STT; it's not trying to run inference on a frontier model. And connecting to a remote datacenter instead of your own machine can add latency that slows down the experience. There are also latency-vs-throughput tradeoffs in datacenter deployment that don't exist on a local machine that is only ever doing STT on one thing at a time.
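Easy enough to sanity-check by timing one request each way. A rough sketch, where both endpoints are hypothetical placeholders, not real services:

    # Time the same STT request against a local server and a remote one.
    # Both URLs are hypothetical placeholders.
    import time
    import requests

    def round_trip(url: str, audio_path: str) -> float:
        """Seconds for one transcription request, network time included."""
        with open(audio_path, "rb") as f:
            start = time.perf_counter()
            requests.post(url, files={"file": f}, timeout=30)
        return time.perf_counter() - start

    print("local :", round_trip("http://localhost:8000/transcribe", "clip.wav"))
    print("remote:", round_trip("https://api.example.com/transcribe", "clip.wav"))

The remote number includes the round trip to the datacenter plus whatever queueing the provider does to batch requests for throughput; the local number doesn't.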
I'm skeptical of the claim that local is literally faster, but it's not impossible in the way you suggest.
If you watch the video you'll see that Aqua does more than just STT, and why we need the frontier models. If you just want verbatim STT of what you said, then yeah, any local version of Whisper is great. If you want the enhancements shown in the video, you need the frontier models, and running those locally is slow.
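For the verbatim case, local really is a couple of lines, e.g. with the open-source openai-whisper package (model size and file name are just examples):

    # Verbatim local transcription with openai-whisper (pip install openai-whisper).
    import whisper

    model = whisper.load_model("base")          # small model, fine on a laptop
    result = model.transcribe("dictation.wav")  # plain STT, no LLM rewrite pass
    print(result["text"])

It's the rewrite/enhancement pass on top of this that wants a frontier model.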