don't know exactly how it works under the hood for LLMs. Gemini provides a dedicated Live API where you stream the audio via WebSocket - I guess they probably use some audio-specific tokenizer.
If I had to guess why Amazon is moving to the cloud, it's:
1) They support only 8 languages right now - a cloud LLM or even Whisper can support around 50 languages pretty well. I was always disappointed that I couldn't buy a Google Mini or Alexa or Apple Home for my mum because none of them speak Polish.
2) They want to provide good support for those less beefy smart speakers that don't have much power, and those still sell well.
3) They want to move people to the new Alexa subscription they recently announced, or get more people to subscribe to Prime.
4) They want to gather more voice samples so they can train a multilingual TTS as good as ElevenLabs'.