Hacker News
Universal Speech Model (USM): State-of-the-art speech AI for 100 languages (googleblog.com)
40 points by shantanu_sharma on March 6, 2023 | 6 comments



How many parameters and parameter links are needed to model thinking? I've seen one number, 88B-plus, for modeling the brain's neurons, and there's the Blue Brain Project. One author, interviewed about her book, remarked that language is the skin of the mind. Back when IBM was using its Think-X branding, before selling off bits of itself, what was the IBM culture/tradition saying to itself about modeling thinking?


I'm surprised a state-of-the-art speech-to-text model has only 2B parameters (this one) when language models have >50B (LLaMA, GPT-3, Chinchilla, etc.)


Speech-to-text is transcriptive, it spends its parameters on encoding a mapping from one ___domain to another. Language models are generative, they spend their parameters on trying to encode a ___domain. It might seem like speech-to-text should need more because it cares about two domains (“speech” and “text”) while language models only care about one ___domain (“text”), but it turns out mappings are much smaller than domains.
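The parameter gap is easy to see with back-of-the-envelope transformer arithmetic. A sketch, using the common per-block estimate of ~12·d² weights (4·d² for Q/K/V/output attention projections plus 8·d² for a 4×-wide MLP); the layer counts and widths below are illustrative placeholders, not the actual configurations of USM or any particular LM:

```python
def approx_params(num_layers: int, d_model: int) -> int:
    """Rough transformer parameter count, ignoring embeddings and norms.

    Per block: ~4*d^2 for attention (Q, K, V, O projections)
    plus ~8*d^2 for a 4x-wide feed-forward MLP -> ~12*d^2 per layer.
    """
    return 12 * num_layers * d_model ** 2

# Hypothetical "mapping"-scale encoder vs. "___domain"-scale language model:
asr_like = approx_params(num_layers=32, d_model=2048)  # ~1.6B parameters
lm_like = approx_params(num_layers=60, d_model=8192)   # ~48.3B parameters

print(f"{asr_like / 1e9:.1f}B vs {lm_like / 1e9:.1f}B")  # → 1.6B vs 48.3B
```

The point of the sketch is that a 2B-parameter model is a perfectly ordinary transformer size; the surprising part is only that it suffices for the mapping task while generative modeling of the text ___domain seems to want an order of magnitude more.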


A ten-year-old can transcribe arbitrarily complicated speech in their native tongue but might struggle to summarize or generate it. Larger and larger language models unlock more tasks, but the language model for figuring out the most likely utterance given the audio doesn't have to be nearly so big, just as a reasonable language model for summarizing can be much smaller than one for dialogue.


What are those 100 languages?


This is the 102 language dataset they used for evaluation: https://huggingface.co/datasets/google/xtreme_s



