Hacker News

Having worked on Apple's TTS for more than a decade, I can state with confidence that this is utter bullshit and you don't have the slightest idea what you are talking about. Both in terms of quality, and of the underlying technology used, Apple's current TTS is in no way comparable to what existed 10 years ago (at Apple, or anywhere else in the industry).

I challenge you to find a 2014 recording that is on par with a contemporary Siri voice.

I have been playing recently with those enhanced TTS models, and to me they are of similar quality to the Piper TTS models: not that good. StyleTTS 2 models like Kokoro sound much better to me, and they also run in real time on the same devices. And when you compare Apple's online models not even to what OpenAI has, but to small recent startups like Sesame, or to open-source models like Orpheus, Apple TTS sounds (pun intended) really behind.


I don't dispute your claim; it's just that I still find the Alex voice to be the best, and it has been the same for over 10 years. The other voices have issues: they don't sound too good at 1.5x speed.


Ah, that's more specific.

Alex was developed when VoiceOver (the screen reader) was the primary use case for text to speech. Consequently, it was optimized for low latency and robustness under rate changes.

The Siri voices sound much more natural at 1x and have a higher signal quality, but rate changes were a lower priority for this use case.
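For anyone who wants to hear the difference themselves, here is a minimal sketch using Apple's public AVSpeechSynthesizer API to play the same sentence through two voices at an elevated rate. The specific voice identifiers are assumptions that depend on which voices are installed on the system; if an identifier is not found, the system default voice is used.

```swift
import AVFoundation

// Sketch: compare two installed voices at an elevated speaking rate.
// Voice identifiers below are assumptions; they vary by OS and locale.
let synthesizer = AVSpeechSynthesizer()

func speak(_ text: String, voiceIdentifier: String, rate: Float) {
    let utterance = AVSpeechUtterance(string: text)
    // Returns nil if the voice isn't installed; nil falls back to the default voice.
    utterance.voice = AVSpeechSynthesisVoice(identifier: voiceIdentifier)
    // rate is a Float clamped to [AVSpeechUtteranceMinimumSpeechRate,
    // AVSpeechUtteranceMaximumSpeechRate] (0.0...1.0); the default is 0.5.
    utterance.rate = min(max(rate, AVSpeechUtteranceMinimumSpeechRate),
                         AVSpeechUtteranceMaximumSpeechRate)
    synthesizer.speak(utterance)
}

// 1.5x the default rate: 0.5 * 1.5 = 0.75, still within the valid range.
let fastRate = AVSpeechUtteranceDefaultSpeechRate * 1.5
let sentence = "The quick brown fox jumps over the lazy dog."

// Alex (the classic VoiceOver-era voice) vs. a newer compact voice.
speak(sentence, voiceIdentifier: "com.apple.speech.synthesis.voice.Alex",
      rate: fastRate)
speak(sentence, voiceIdentifier: "com.apple.voice.compact.en-US.Samantha",
      rate: fastRate)
```

Listening at the faster rate makes the trade-off described above audible: a voice tuned for rate robustness degrades more gracefully than one tuned purely for naturalness at 1x.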

Fun fact: when we worked on Alex, many VoiceOver users stubbornly hung on to Fred (which mostly uses late-1970s technology). Screen reader users are not fond of switching voices; it appears their hearing locks on to a particular voice, so switching is costly.
