Hacker News

Having worked on Apple's TTS for more than a decade, I can state with confidence that this is utter bullshit and you don't have the slightest idea what you are talking about. Both in terms of quality, and of the underlying technology used, Apple's current TTS is in no way comparable to what existed 10 years ago (at Apple, or anywhere else in the industry).

I challenge you to find a 2014 recording that is on par with a contemporary Siri voice.

I have been playing recently with those enhanced TTS models, and to me they are of similar quality to the Piper TTS models: not that good. StyleTTS 2 models like Kokoro sound much better to me, and they also run in real time on the same devices. And when you compare Apple's online models not even to what OpenAI has, but to small recent startups like Sesame, or to open-source models like Orpheus, Apple TTS sounds (pun intended) really behind.


I don't dispute your claim; it's just that I still find the Alex voice to be the best, and it has been the same for over 10 years. The other voices have issues: they don't sound too good at 1.5x speed.


Ah, that's more specific.

Alex was developed when VoiceOver (the screen reader) was the primary use case for text to speech. Consequently, it was optimized for low latency and robustness under rate changes.

The Siri voices sound much more natural at 1x and have a higher signal quality, but rate changes were a lower priority for this use case.
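For anyone who wants to hear the difference themselves, here is a minimal sketch using Apple's public AVSpeechSynthesizer API to play the same sentence through two voices at an elevated rate. The specific voice identifiers are assumptions that depend on which voices are installed on the system; if an identifier is not found, the system default voice is used.

```swift
import AVFoundation

// Sketch: compare two installed voices at an elevated speaking rate.
// Voice identifiers below are assumptions; they vary by OS and locale.
let synthesizer = AVSpeechSynthesizer()

func speak(_ text: String, voiceIdentifier: String, rate: Float) {
    let utterance = AVSpeechUtterance(string: text)
    // Returns nil if the voice isn't installed; nil falls back to the default voice.
    utterance.voice = AVSpeechSynthesisVoice(identifier: voiceIdentifier)
    // rate is a Float clamped to [AVSpeechUtteranceMinimumSpeechRate,
    // AVSpeechUtteranceMaximumSpeechRate] (0.0...1.0); the default is 0.5.
    utterance.rate = min(max(rate, AVSpeechUtteranceMinimumSpeechRate),
                         AVSpeechUtteranceMaximumSpeechRate)
    synthesizer.speak(utterance)
}

// 1.5x the default rate: 0.5 * 1.5 = 0.75, still within the valid range.
let fastRate = AVSpeechUtteranceDefaultSpeechRate * 1.5
let sentence = "The quick brown fox jumps over the lazy dog."

// Alex (the classic VoiceOver-era voice) vs. a newer compact voice.
speak(sentence, voiceIdentifier: "com.apple.speech.synthesis.voice.Alex",
      rate: fastRate)
speak(sentence, voiceIdentifier: "com.apple.voice.compact.en-US.Samantha",
      rate: fastRate)
```

Listening at the faster rate makes the trade-off described above audible: a voice tuned for rate robustness degrades more gracefully than one tuned purely for naturalness at 1x.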

Fun fact: when we worked on Alex, many VoiceOver users stubbornly hung on to Fred (which mostly uses late-1970s technology). Screen reader users are not fond of switching voices; it appears their hearing locks on to a particular voice, so switching is costly.
