Is this right? The current best TTS from OpenAI uses gpt-4o-audio-preview, which is $2.50 per 1M input text tokens and $80 per 1M output audio tokens; the new gpt-4o-mini-tts is $0.60 input text and $12 output audio. That's roughly a 5x price reduction on average.
Going the other way, transcription with gpt-4o-audio-preview was $40 input audio and $10 output text; the new gpt-4o-transcribe is $6 input audio and $10 output text. That's about a 7x reduction on the input price.
TTS/Transcribe with gpt-4o-audio-preview was a hack where you had to prompt with 'listen/speak this sentence:' and it often got it wrong. These new dedicated models are exactly what we needed.
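For anyone who wants to check the ratios, here is a quick sketch of the arithmetic using only the prices quoted above (assumed to be USD per 1M tokens; the variable and function names are just for illustration):

```python
# Prices quoted in this comment, assumed to be USD per 1M tokens.
tts_old = {"text_in": 2.50, "audio_out": 80.00}    # gpt-4o-audio-preview
tts_new = {"text_in": 0.60, "audio_out": 12.00}    # gpt-4o-mini-tts
stt_old = {"audio_in": 40.00, "text_out": 10.00}   # gpt-4o-audio-preview
stt_new = {"audio_in": 6.00, "text_out": 10.00}    # gpt-4o-transcribe


def reduction(old, new):
    """Return how many times cheaper each price component became."""
    return {k: old[k] / new[k] for k in old}


tts_ratios = reduction(tts_old, tts_new)
stt_ratios = reduction(stt_old, stt_new)
# Unweighted mean of the two TTS ratios (ignores the real token mix).
tts_average = sum(tts_ratios.values()) / len(tts_ratios)

for label, ratios in (("TTS", tts_ratios), ("Transcribe", stt_ratios)):
    for component, ratio in ratios.items():
        print(f"{label} {component}: {ratio:.2f}x")
print(f"TTS simple average: {tts_average:.2f}x")
```

The simple average of the two TTS ratios lands a little above 5x, and the transcription input ratio a little under 7x. The real blended saving depends on how many text versus audio tokens a request actually uses.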
I'm currently using the Google TTS API, which is really good, fast and cheap. They charge $16 per million characters, which is exactly the same as OpenAI's $0.015 per minute estimate.
Unfortunately it's not really worth switching over if the costs are exactly the same. Transcription, on the other hand, is 1.6¢/minute with Google and 0.6¢/minute with OpenAI now, so that might be worth switching over for.
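Whether the two TTS prices really match depends on how many characters fit into a minute of speech. A rough sketch of that conversion, using only the figures quoted above; the 1,000 characters per minute passed in at the end is an illustrative assumption, not a measured speaking rate:

```python
# Figures quoted in this comment.
GOOGLE_TTS_USD_PER_M_CHARS = 16.00   # Google TTS, per 1M characters
OPENAI_TTS_USD_PER_MIN = 0.015       # OpenAI's per-minute estimate
GOOGLE_STT_USD_PER_MIN = 0.016       # 1.6 cents per minute
OPENAI_STT_USD_PER_MIN = 0.006       # 0.6 cents per minute


def google_tts_usd_per_min(chars_per_min):
    """Convert Google's per-character price to a per-minute price."""
    return GOOGLE_TTS_USD_PER_M_CHARS * chars_per_min / 1_000_000


# Characters per minute at which both TTS prices are equal.
break_even_chars_per_min = (
    OPENAI_TTS_USD_PER_MIN / GOOGLE_TTS_USD_PER_M_CHARS * 1_000_000
)

# How many times cheaper OpenAI transcription is per minute of audio.
transcription_ratio = GOOGLE_STT_USD_PER_MIN / OPENAI_STT_USD_PER_MIN

print(f"TTS break-even: {break_even_chars_per_min:.1f} chars/min")
# 1000 chars/min is an assumed speaking rate, for illustration only.
print(f"Google TTS at 1000 chars/min: ${google_tts_usd_per_min(1000):.4f}/min")
print(f"Transcription: OpenAI is {transcription_ratio:.2f}x cheaper")
```

If your text comes out denser than the break-even rate per minute of audio, Google's per-character pricing costs more than OpenAI's estimate, and less if it is sparser. The transcription gap does not depend on any such assumption.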
The previous offering from OpenAI was $15 per 1M characters for TTS and $30 for TTS HD, so it's not a 5x reduction. This one is slightly cheaper but definitely more capable (if you need to control the vibe).
That's a really cool page, thanks. Does it have stats for other languages?
In my experience the OpenAI TTS APIs were really bad, messing up all the time in foreign languages. Practically unusable for my use case. You'd have to use gpt-4o-audio-preview to get anything close to passable, but it was expensive. That's why I'm using Google TTS, which is very fast, high quality, and provides first-class support for almost every language.
I look forward to comparing it with this model, though the price being the same is unfortunate, as there's less incentive to switch. The transcription price looks cheaper than Google's, so that's worth considering.
Depends on what's available for the language, but yeah, Wavenet and Neural2. With OpenAI TTS I'd often get weird bugs where the first API call came back all garbled but the second came back fine, which wastes money. On top of that, it was more expensive and had higher latency. I'm interested to try out this new one.