Seems like these chat bots have A LOT of lock in. So the question is which one to pick? This one? IBM Watson? Others? Which horse are you betting on and why?
Amazon making this is great because it's developer friendly and will surely fall in price and improve in quality over time. They also have to make it work well because it's a critical ordering channel for them going forward.
Seems like IBM has been focused on this for years, though. I'm worried they will try to make it a profit center... rather than continually dropping the price.
Other companies working on this I should be aware of?
I'm working for a startup called Boost.ai, where we decided to build our own deep learning model and create our own training data. We use several models that predict what word comes next and handle misspellings and dialects. We also have memory, so the bot can carry context forward when you ask a new question, or detect that you want to start a new context.
If you want something kick ass, I highly recommend building it yourself. The bar set by IBM's Watson, Nuance's Nina or IPSoft's Amelia isn't actually very high. For non-English languages, anyone with some knowledge of NLP and deep learning will easily surpass them.
An RNN is a great approach if you want to play around with text generation using deep learning, but for a chatbot, deep learning alone is not there yet. We create our own intents based on the ___domain and predict the intent of each question. We have also made it look smarter by building an intent hierarchy: we make multiple predictions per question with the goal of drilling it down the intent tree. That way we know the question is about a bank card and can figure out whether you want a new one, want to block it, increase the credit, set limits, and so on.
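To illustrate the drill-down idea, here's a toy sketch with a hand-rolled keyword scorer rather than a trained model (the intent names and keywords are made up; in a real system each node would be a classifier):

```python
# Toy sketch of drilling a question down an intent hierarchy.
# Each node scores its children and recurses into the best match.

INTENT_TREE = {
    "bank_card": {
        "keywords": {"card", "visa", "mastercard"},
        "children": {
            "order_new_card":  {"keywords": {"new", "order", "replace"}, "children": {}},
            "block_card":      {"keywords": {"block", "lost", "stolen"}, "children": {}},
            "increase_credit": {"keywords": {"credit", "limit", "increase"}, "children": {}},
        },
    },
    "savings": {"keywords": {"savings", "interest", "rate"}, "children": {}},
}

def score(node, tokens):
    return len(node["keywords"] & tokens)

def drill_down(tree, tokens, path=()):
    best_name, best_node, best_score = None, None, 0
    for name, node in tree.items():
        s = score(node, tokens)
        if s > best_score:
            best_name, best_node, best_score = name, node, s
    if best_node is None:  # no child matched: stop at the current level
        return path
    return drill_down(best_node["children"], tokens, path + (best_name,))

print(drill_down(INTENT_TREE, set("i lost my card please block it".split())))
# ('bank_card', 'block_card')
```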
You're right, lack of standardization and portability is a major problem. I would not personally choose IBM due to the pricing model and previous bad experiences. Google, Microsoft, and Facebook (and a bunch of startups) have all built or acquired conversational services very much like Lex. None of them is a clear winner at this point in my opinion.
I would prefer to log everything on an owned system and test out bots through third-party APIs. Poking around with options now, but will probably just bootstrap with human operators and pre-canned responses until something makes sense.
It's also worth pointing out there is a huge difference between using the bot for a critical piece of your product vs as a supplement to customer support.
> Seems like these chat bots have A LOT of lock in. So the question is which one to pick? This one?
None of the above? We had desktop voice-to-text 20 years ago (that was about on par with Google Now for me), so I don't see why everything has to go through cloud services.
I don't think desktop solutions (like the Microsoft speech synthesizer) do a great job of producing "natural", lifelike speech. Sure, they read the text, but that's about it.
To produce speech properly, the synthesizer needs to take the context of the surrounding words into account to determine how to pronounce ambiguous words and how to pace the speech. What about the tone? How about making the speech sound more engaging? These are things that need a data model to work well, and it's just difficult to cram that into a desktop app. And why would any company do that when they can put the service online and charge for it (which is totally fair)?
I was thinking more about the other direction, because that's the hard part: we can understand synthesized voices better than they can understand us.
Apple seems far and away the best desktop solution. For other OSes, though, it's frustrating that they haven't improved on what computers of the early eighties could do. Now you've made me nostalgic; I'll never forget the first time I heard my Amiga 500 read my words.
Are there any open source projects like this? I would imagine it's a machine learning model to match intent and then a custom NER that extracts slots. I'm sure the actual models are pretrained on lots of data, and the process is probably a lot more complex. Given the abundance of vendors in this space, it has to at least be a possibility.
For very simple human-language-to-intent parsing (no prompting or context keeping), here's an RNN model in Torch that learns intents and slots based on a bunch of template sentences: https://github.com/spro/intense
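For a feel of the intent + slot idea without any neural network at all, here's a regex-over-templates toy (the templates and slot names are made up for illustration; real systems generalize far beyond the literal templates):

```python
import re

# Toy intent + slot extraction: each template compiles to a regex with
# named groups acting as slots.
TEMPLATES = [
    ("BookFlight", r"book a flight from (?P<origin>\w+) to (?P<destination>\w+)"),
    ("GetWeather", r"what is the weather in (?P<city>\w+)"),
]

def parse(utterance):
    for intent, pattern in TEMPLATES:
        m = re.search(pattern, utterance.lower())
        if m:
            return intent, m.groupdict()
    return None, {}

print(parse("Book a flight from Oslo to Berlin"))
# ('BookFlight', {'origin': 'oslo', 'destination': 'berlin'})
```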
Taking the backend of Alexa and offering it to developers is a killer feature. I have toyed with the idea of making an app with a voice interface. I was able to make an Alexa app, and looking at the preview pictures this looks similar.
You can add an Amazon Lex bot into a mobile app (Swift, ObjC, Android) using AWS Mobile Hub (https://console.aws.amazon.com/mobilehub/) with a couple of clicks. The apps demonstrate text-to-text and speech-to-speech interaction.
(disclosure: I work for AWS and my team built the Mobile Hub - Lex integration)
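If you'd rather call the bot from a backend than from a mobile app, the runtime side is a single API call. Here's a minimal boto3 sketch (the bot name, alias, and user id are placeholders for whatever you've set up in the console):

```python
import boto3

# Minimal Lex runtime call: send a line of text, get back the bot's reply,
# the matched intent, and any slots it has filled so far.
lex = boto3.client("lex-runtime", region_name="us-east-1")

response = lex.post_text(
    botName="OrderFlowers",   # placeholder bot name
    botAlias="Prod",          # placeholder alias
    userId="demo-user-1",     # any stable id; Lex keeps session state per user
    inputText="I would like to order some flowers",
)

print(response["intentName"], response["dialogState"])
print(response["slots"])
print(response["message"])
```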
There is a case study at the bottom of the Lex[0] page that had a disconnected "Polly" thing that I didn't quite get, but that makes sense.
And it looks like Lex takes voice as input, although I don't know if you can put your app into an "always listening for a trigger word" mode for truly hands-free operation or not. Will have to play with it.
I wish they had a demo of Polly on the site. It's pretty bad to have page after page of text about a voice synthesis service and not have "Click here for demo" above the fold in the first page.
I, too, am very interested in a ___domain-specific application of Lex + ___domain knowledge + Polly.
> It's pretty bad to have page after page of text about a
> voice synthesis service and not have "Click here for
> demo" above the fold in the first page.
There's a table of demo clips in male/female voices for a few languages at the bottom of the first page.
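If you have AWS credentials handy, it's also only a couple of lines with boto3 to hear it yourself (the voice and output file are just example choices):

```python
import boto3

# Ask Polly to synthesize a sentence and save the MP3 locally.
polly = boto3.client("polly", region_name="us-east-1")

result = polly.synthesize_speech(
    Text="Hello from Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",  # one of the voices listed in the demo table
)

with open("hello.mp3", "wb") as f:
    f.write(result["AudioStream"].read())
```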
In intro to comp sci we had an open project using an iRobot bot (a Roomba without the vacuum), so my group of some of the more experienced students pieced together a robot dog using a laptop with an Xbox Kinect on top. You could say commands and it would obey, such as sit (moves back and stops), roll over (spins), bark (sound from the speakers), etc. It would also track a primary user, and if you said "follow" it would stay a certain distance from you and follow. This functionality was "easy" with the Kinect SDK's capabilities, as it was just a matter of connecting events from the Kinect code to commands sent to the robot. Given more time we could have done things like hand movements or throwing motions with the Kinect's skeletal tracking and vectors.
It's bonkers to believe that this is actually possible for a 6th grader (with maybe a little bit of help) now.
Just over 10 years ago when I was a 6th grader I wanted to do this, but I was out of luck. There were barely any products out that had 'conversational interfaces', let alone the commoditisation to enable just about anyone to do it as a hobby.
I think all the tools were there (though not necessarily the compute power). What was missing for me (20 years ago) was the guidance to find these sorts of possibilities. Parents and teachers were no help, and there was no Google to find the right GeoCities pages. I wish I'd known about Linux and all the free OSS tools much earlier.
In past years, some people on HN have wanted to combine them all, but it's never happened.
I think it makes sense to leave them separate. They are all different products, and if they were released weeks apart they would warrant their own story.
Eh - let them have their metaphorical fifteen minutes of fame. HN treats Apple keynote days, Microsoft Build conference days, etc, with exactly the same... er, enthusiasm :)
Does anyone have experience with the Web Speech API? https://www.google.com/intl/en/chrome/demos/speech.html
What speech engines are behind these? Do they run in the browser?
I want to implement speech input commands for an SPA for 3D design. Thanks.
>you know how simple, useful, and powerful the Alexa-powered interaction model can be.
This is so comical. Alexa is phenomenally bad at conversation; it is so bad that there are almost no successful "apps" built around it despite it having an API and an app platform.
Alexa is decent at a single command/action model, nothing more than that.
In a way, this is a good step toward people writing applications that are good conversationally. Alexa does a good job of voice recognition but I agree that "conversations" are mostly about remembering the right wizard's incantation to make Alexa do something useful. Watching the (very simplified) demo, it's easy to see the seed of writing a useful digital assistant/Star Trek computer there but obviously someone(s) have a whole lot of work to do before we get anywhere near there.
The "Ok Google/Google Now" voice interactivity (or whatever the official name is) on Android phones is much better than Amazon's Alexa/Echo. It actually remembers context from sentence to sentence so you can say "What is the population of Chicago" and it will answer. Then you can say, "How long would it take me to drive there?" And it understands that you are still talking about Chicago.
Every interaction with Alexa is command, response, and that really limits what you can do with it.
There's nothing about Alexa's API that would prevent doing that; it already has the concept of a conversation, and I think we'll see rapid improvement in how people make use of that.
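For example, a skill can keep the session open and stash context between turns. Here's a minimal Lambda-style handler sketch of that (the intent names and the "last_city" attribute are made up for illustration):

```python
# Minimal Alexa-style skill response that keeps the session open and
# carries context ("last_city") into the next turn.
def handle_request(event):
    session_attrs = event.get("session", {}).get("attributes", {}) or {}
    intent = event["request"]["intent"]["name"]

    if intent == "GetPopulationIntent":    # made-up intent name
        city = event["request"]["intent"]["slots"]["City"]["value"]
        session_attrs["last_city"] = city
        speech = f"Looking up the population of {city}."
    elif intent == "DriveTimeIntent":      # follow-up reuses the remembered city
        city = session_attrs.get("last_city", "your last city")
        speech = f"Estimating the drive time to {city}."
    else:
        speech = "Sorry, I didn't get that."

    return {
        "version": "1.0",
        "sessionAttributes": session_attrs,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,     # keep the conversation going
        },
    }
```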
I agree. I'm just surprised that none of the base functions provided by Amazon take advantage of it. Compared to Google it feels really basic and limited. I say this as the owner of two Alexas who uses them a lot. I'm constantly frustrated by the limitations, even though it works well for certain things.
Something that doesn't make me say this really long sentence to access the app:
'Alexa tell app X to start my car'
Something that supports actual conversation, which means not requiring the user to respond immediately. For example, it's impossible to build a cooking app that walks me through a recipe because the 'conversation' is over within X seconds. Of course you could do weird hacks like maintaining the conversation on the server side based on the session id, but that still requires the user to constantly prefix their requests with 'tell app X ..'
These basic UX limitations make it impossible to build any serious application that can do more than 'joke of the day' type of stuff. All the apps in the Alexa app store are totally useless 'fart noise of the day' type of garbage.
I think that's a product vs. platform issue, especially having to prefix everything with the application name. Since you are building your own product, you get to avoid the routing problem entirely.
It's great to see so much competition in this space. This initial offering looks limited in its ability to handle more complex branching and context building but hopefully it will evolve. I'm also a little surprised at the pricing given that Google and Facebook offer comparable services for free.
Ditto on the competition space. In my mind, letting Amazon and Google fight out the epic war of machine learning / intelligence will only yield better APIs for those of us wanting to leverage such technologies.
Very nice! With all the great things they've announced today that apply to my work (Lightsail, Rekognition, Athena), I'm crossing my fingers that their next announcement is making speech recognition available as a separate API. And Python 3 support for Lambda.
It's great to see the utterance/intent/slot model become available outside of Alexa. It's been easy to work with so far in Alexa. And you see a similar model from Nuance with Mix, though Mix seems to be stuck in beta.
Lex looks a lot like Alexa, though the setup flow is a bit different. It also has prompts for each slot needed. That's nice.