The feature I want is speaker differentiation: I want to feed in an audio file and get back a transcript with "Speaker 1: ..., Speaker 2: ..." labels.
That plus timestamps would be incredible.
The Google Gemini 2.0 models are showing some promise with this, though I can't speak to their reliability just yet.