Hacker News
Universal Speech Model (USM): State-of-the-art speech AI for 100 languages (googleblog.com)
40 points by shantanu_sharma on March 6, 2023 | 6 comments



How many parameters and parameter links are needed to model thinking? I've seen one number, 88B-plus, for modeling the brain's neurons, and there's the Blue Brain Project. One author, interviewed about her book, remarked that language is the skin of the mind. Back when IBM was using its Think-X branding, before selling off bits of itself, what was the IBM culture/tradition saying to itself about modeling thinking?


I'm surprised a state-of-the-art speech-to-text model has only 2B parameters (this one) when language models have >50B (LLaMA, GPT-3, Chinchilla, etc.)


Speech-to-text is transcriptive, it spends its parameters on encoding a mapping from one ___domain to another. Language models are generative, they spend their parameters on trying to encode a ___domain. It might seem like speech-to-text should need more because it cares about two domains (“speech” and “text”) while language models only care about one ___domain (“text”), but it turns out mappings are much smaller than domains.
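The parameter gap is easy to see with back-of-the-envelope transformer arithmetic. A sketch, using the common per-block estimate of ~12·d² weights (4·d² for Q/K/V/output attention projections plus 8·d² for a 4×-wide MLP); the layer counts and widths below are illustrative placeholders, not the actual configurations of USM or any particular LM:

```python
def approx_params(num_layers: int, d_model: int) -> int:
    """Rough transformer parameter count, ignoring embeddings and norms.

    Per block: ~4*d^2 for attention (Q, K, V, O projections)
    plus ~8*d^2 for a 4x-wide feed-forward MLP -> ~12*d^2 per layer.
    """
    return 12 * num_layers * d_model ** 2

# Hypothetical "mapping"-scale encoder vs. "___domain"-scale language model:
asr_like = approx_params(num_layers=32, d_model=2048)  # ~1.6B parameters
lm_like = approx_params(num_layers=60, d_model=8192)   # ~48.3B parameters

print(f"{asr_like / 1e9:.1f}B vs {lm_like / 1e9:.1f}B")  # → 1.6B vs 48.3B
```

The point of the sketch is that a 2B-parameter model is a perfectly ordinary transformer size; the surprising part is only that it suffices for the mapping task while generative modeling of the text ___domain seems to want an order of magnitude more.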


A ten-year-old can transcribe arbitrarily complicated speech in their native tongue but might struggle to summarize or generate it. Larger and larger language models unlock more tasks, but the language model for figuring out the most likely utterance given the audio doesn't have to be nearly so big, just as a reasonable language model for summarizing can be much smaller than one for dialogue.


What are those 100 languages?


This is the 102 language dataset they used for evaluation: https://huggingface.co/datasets/google/xtreme_s



