I'm pretty convinced that voice interaction will be the biggest UI change since apps.
Voice is simply natural to humans. Downloading an app to learn about the departure of the next bus is not.
I used voice bots to let my 5-year-old play role-playing games (e.g., checking into a hotel) or let my parents (60+) call a fake car dealership.
It's amazing to observe. They behave as if they're talking to a human, especially when doing it via a phone. That is exactly the UX a computer system should have—simply a phone number and voice.
As soon as people have to learn something new (a new webpage, a new app, etc.), something is wrong.
Voice interaction requires an enclosed area. I find it difficult to use any voice assistant in my life. Other people think I'm talking to them. Perhaps we'll all get single-person offices with closing doors.
When you say enclosed, do you mean it needs to be enclosed because speech recognition is so bad that any background noise throws it off, or do you mean for privacy reasons?
- noise: I expect this will be solved soon. E.g., LiveKit just announced a VAD model that works on human speech behavior and not voice detection
- privacy: this seems to be a cultural thing, and it can change quickly. People moved quickly from everyone being on their Bluetooth headset (mid-2000s) to no calls at all (202x)
You’re underestimating how many people are super antisocial, or at least don’t like talking that much! But it’s a fair point — I’d use Siri more if it was reliable
For some it might have to do with being antisocial; I'm very social and like talking a lot socially (try to make me shut up). For getting stuff done, however, I find it incredibly time-wasting and inefficient. Typing/reading is always faster for me. For example, I did a wiring job in the house of someone who only speaks English and whose plumber only speaks Spanish. So I call the plumber in Spanish and he explains what is up; there are at least 20 occasions in those 30 minutes where he drops out or one of us doesn't hear some part of a sentence, so things get repeated. Then I call the English people to explain this to them. If the Spanish guy had sent a WhatsApp/Signal/whatever message, and I had pulled it through AI and sent it on to the English people, we would have done in 5 minutes what now took almost 1 hour. But the plumber AND the English people are young and seemingly incapable of reading and really bad at writing. It's not antisocial for me at least; besides sitting in a room for a focused discussion about a feature or so, I cannot imagine how it's not more efficient to do it in writing. Not to mention that I can look/search for it later (but AI does solve that).
I agree that voice control is great, but I feel we’re at an “uncanny valley” moment. You can talk to a machine fluently in natural language, until you suddenly can’t and it makes the dumbest misunderstanding, either from recognition or from parsing.
You still get the best results by talking like a robot.