Seems like these chat bots have A LOT of lock in. So the question is which one to pick? This one? IBM Watson? Others? Which horse are you betting on and why?
Amazon making this is great because it's developer friendly and will surely fall in price and improve in quality over time. They also have to make it work well because it's a critical ordering channel for them going forward.
Seems like IBM has been focused on this for years, though. I'm worried they will try to make it a profit center... rather than continually dropping the price.
Other companies working on this I should be aware of?
I'm working for a startup called Boost.ai, where we decided to build our own deep learning model and create our own training data. We use several models that predict what word comes next and handle misspellings and dialects. We also have memory, so the bot can carry context forward when you ask a new question, or detect that you want to start a new context.
If you want something kick ass, I highly recommend building it yourself. The bar set by IBM's Watson, Nuance's Nina or IPSoft's Amelia isn't actually very high. For non-English languages, anyone with some knowledge of NLP and deep learning will easily surpass them.
An RNN is a great approach if you want to play around with text generation using deep learning, but for a chatbot, deep learning alone is not there yet. We create our own intents based on the ___domain and predict the intent of each question. We have also made it look smarter by building an intent hierarchy: we make multiple predictions per question with the goal of drilling it down the intent tree. That way we know the question is about a bank card and can figure out whether you want a new one, want to block it, increase the credit, set limits, and so on.
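To illustrate the drill-down idea, here's a toy sketch with a hand-rolled keyword scorer rather than a trained model (the intent names and keywords are made up; in a real system each node would be a classifier):

```python
# Toy sketch of drilling a question down an intent hierarchy.
# Each node scores its children and recurses into the best match.

INTENT_TREE = {
    "bank_card": {
        "keywords": {"card", "visa", "mastercard"},
        "children": {
            "order_new_card":  {"keywords": {"new", "order", "replace"}, "children": {}},
            "block_card":      {"keywords": {"block", "lost", "stolen"}, "children": {}},
            "increase_credit": {"keywords": {"credit", "limit", "increase"}, "children": {}},
        },
    },
    "savings": {"keywords": {"savings", "interest", "rate"}, "children": {}},
}

def score(node, tokens):
    return len(node["keywords"] & tokens)

def drill_down(tree, tokens, path=()):
    best_name, best_node, best_score = None, None, 0
    for name, node in tree.items():
        s = score(node, tokens)
        if s > best_score:
            best_name, best_node, best_score = name, node, s
    if best_node is None:  # no child matched: stop at the current level
        return path
    return drill_down(best_node["children"], tokens, path + (best_name,))

print(drill_down(INTENT_TREE, set("i lost my card please block it".split())))
# ('bank_card', 'block_card')
```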
You're right, lack of standardization and portability is a major problem. I would not personally choose IBM due to the pricing model and previous bad experiences. Google, Microsoft, and Facebook (and a bunch of startups) have all built or acquired conversational services very much like Lex. None of them is a clear winner at this point in my opinion.
I would prefer to log everything on an owned system and test out bots through third-party APIs. Poking around with options now, but will probably just bootstrap with human operators and pre-canned responses until something makes sense.
It's also worth pointing out there is a huge difference between using the bot for a critical piece of your product vs as a supplement to customer support.
> Seems like these chat bots have A LOT of lock in. So the question is which one to pick? This one?
None of the above? We had desktop voice-to-text 20 years ago (that was about on par with Google Now for me), so I don't see why everything has to go through cloud services.
I don't think desktop solutions (like the Microsoft speech synthesizer) do a great job of producing "natural", lifelike speech. Sure, they read the text, but that's about it.
To produce speech properly, the synthesizer needs to take the context of the surrounding words into account to determine how to pronounce ambiguous words and how to pace the speech. What about the tone? How about making the speech sound more engaging? These are things that need a data model to work well, and it's just difficult to cram that into a desktop app. And why would any company do that when they can put the service online and charge for it (which is totally fair)?
I was thinking more about the other direction, because that's the hard part: we can understand synthesized voices better than they can understand us.
Apple seems far and away the best desktop solution. For other OSes, though, it's frustrating that they haven't improved on what computers of the early eighties could do. Now you've made me nostalgic; I'll never forget the first time I heard my Amiga 500 read my words.
Are there any open source projects like this? I would imagine it's a machine learning model to match intent and then a custom NER that extracts slots. I'm sure the actual models are pretrained on lots of data, and the process is probably a lot more complex. Given the abundance of vendors in this space, it has to at least be a possibility.
For very simple human-language-to-intent parsing (no prompting or context keeping), here's an RNN model in Torch that learns intents and slots based on a bunch of template sentences: https://github.com/spro/intense
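For a feel of the intent + slot idea without any neural network at all, here's a regex-over-templates toy (the templates and slot names are made up for illustration; real systems generalize far beyond the literal templates):

```python
import re

# Toy intent + slot extraction: each template compiles to a regex with
# named groups acting as slots.
TEMPLATES = [
    ("BookFlight", r"book a flight from (?P<origin>\w+) to (?P<destination>\w+)"),
    ("GetWeather", r"what is the weather in (?P<city>\w+)"),
]

def parse(utterance):
    for intent, pattern in TEMPLATES:
        m = re.search(pattern, utterance.lower())
        if m:
            return intent, m.groupdict()
    return None, {}

print(parse("Book a flight from Oslo to Berlin"))
# ('BookFlight', {'origin': 'oslo', 'destination': 'berlin'})
```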
Taking the backend of Alexa and offering it to developers is a killer feature. I have toyed with the idea of making an app with a voice interface. I was able to make an Alexa app, and looking at the preview pictures this looks similar.
You can add an Amazon Lex bot into a mobile app (Swift, ObjC, Android) using AWS Mobile Hub (https://console.aws.amazon.com/mobilehub/) with a couple of clicks. The apps demonstrate text-to-text and speech-to-speech interaction.
(disclosure: I work for AWS and my team built the Mobile Hub - Lex integration)
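If you'd rather call the bot from a backend than from a mobile app, the runtime side is a single API call. Here's a minimal boto3 sketch (the bot name, alias, and user id are placeholders for whatever you've set up in the console):

```python
import boto3

# Minimal Lex runtime call: send a line of text, get back the bot's reply,
# the matched intent, and any slots it has filled so far.
lex = boto3.client("lex-runtime", region_name="us-east-1")

response = lex.post_text(
    botName="OrderFlowers",   # placeholder bot name
    botAlias="Prod",          # placeholder alias
    userId="demo-user-1",     # any stable id; Lex keeps session state per user
    inputText="I would like to order some flowers",
)

print(response["intentName"], response["dialogState"])
print(response["slots"])
print(response["message"])
```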
There is a case study at the bottom of the Lex[0] page that had a disconnected "Polly" thing that I didn't quite get, but that makes sense.
And it looks like Lex takes voice as input, although I don't know if you can put your app into an "always listening for a trigger word" mode for truly hands-free operation or not. Will have to play with it.
I wish they had a demo of Polly on the site. It's pretty bad to have page after page of text about a voice synthesis service and not have "Click here for demo" above the fold in the first page.
I, too, am very interested in a ___domain-specific application of Lex + ___domain knowledge + Polly.
> It's pretty bad to have page after page of text about a
> voice synthesis service and not have "Click here for
> demo" above the fold in the first page.
There's a table of demo clips in male/female voices for a few languages at the bottom of the first page.
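If you have AWS credentials handy, it's also only a couple of lines with boto3 to hear it yourself (the voice and output file are just example choices):

```python
import boto3

# Ask Polly to synthesize a sentence and save the MP3 locally.
polly = boto3.client("polly", region_name="us-east-1")

result = polly.synthesize_speech(
    Text="Hello from Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",  # one of the voices listed in the demo table
)

with open("hello.mp3", "wb") as f:
    f.write(result["AudioStream"].read())
```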
In intro to comp sci we had an open project using an iRobot bot (a Roomba without the vacuum), so my group of some of the more experienced students pieced together a robot dog using a laptop with an Xbox Kinect on top. You could say commands and it would obey, such as sit (moves back and stops), roll over (spins), bark (sound from the speakers), etc. It would also track a primary user, and if you said "follow" it would stay a certain distance from you and follow. This functionality was "easy" with the Kinect SDK's capabilities, as it was just a matter of connecting events from the Kinect code to commands sent to the robot. Given more time we could have done things like hand movements or throwing motions with the Kinect's skeletal tracking and vectors.
It's bonkers to believe that this is actually possible for a 6th grader (with maybe a little bit of help) now.
Just over 10 years ago when I was a 6th grader I wanted to do this, but I was out of luck. There were barely any products out that had 'conversational interfaces', let alone the commoditisation to enable just about anyone to do it as a hobby.
I think all the tools were there (though not necessarily the compute power). What was missing for me (20 years ago) was the guidance to find these sorts of possibilities. Parents and teachers were no help, and there was no Google to find the right GeoCities pages. I wish I'd known about Linux and all the free OSS tools much earlier.
In past years, some people on HN have wanted to combine them all, but it's never happened.
I think it makes sense to leave them separate. They are all different products, and if they were released weeks apart they would warrant their own story.
Eh - let them have their metaphorical fifteen minutes of fame. HN treats Apple keynote days, Microsoft Build conference days, etc, with exactly the same... er, enthusiasm :)
Does anyone have experience with the Web Speech API? https://www.google.com/intl/en/chrome/demos/speech.html
What speech engines are behind these? Do they run in the browser?
I want to implement speech input commands for an SPA for 3D design. Thanks.
>you know how simple, useful, and powerful the Alexa-powered interaction model can be.
This is so comical. Alexa is phenomenally bad at conversation; it is so bad that there are almost no successful "apps" built around it despite it having an API and an app platform.
Alexa is decent at a single command/action model, nothing more than that.
In a way, this is a good step toward people writing applications that are good conversationally. Alexa does a good job of voice recognition but I agree that "conversations" are mostly about remembering the right wizard's incantation to make Alexa do something useful. Watching the (very simplified) demo, it's easy to see the seed of writing a useful digital assistant/Star Trek computer there but obviously someone(s) have a whole lot of work to do before we get anywhere near there.
The "Ok Google/Google Now" voice interactivity (or whatever the official name is) on Android phones is much better than Amazon's Alexa/Echo. It actually remembers context from sentence to sentence so you can say "What is the population of Chicago" and it will answer. Then you can say, "How long would it take me to drive there?" And it understands that you are still talking about Chicago.
Every interaction with Alexa is command, response, and that really limits what you can do with it.
There's nothing about Alexa's API that would prevent doing that; it already has the concept of a conversation, and I think we'll see rapid improvement in how people make use of that.
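For example, a skill can keep the session open and stash context between turns. Here's a minimal Lambda-style handler sketch of that (the intent names and the "last_city" attribute are made up for illustration):

```python
# Minimal Alexa-style skill response that keeps the session open and
# carries context ("last_city") into the next turn.
def handle_request(event):
    session_attrs = event.get("session", {}).get("attributes", {}) or {}
    intent = event["request"]["intent"]["name"]

    if intent == "GetPopulationIntent":    # made-up intent name
        city = event["request"]["intent"]["slots"]["City"]["value"]
        session_attrs["last_city"] = city
        speech = f"Looking up the population of {city}."
    elif intent == "DriveTimeIntent":      # follow-up reuses the remembered city
        city = session_attrs.get("last_city", "your last city")
        speech = f"Estimating the drive time to {city}."
    else:
        speech = "Sorry, I didn't get that."

    return {
        "version": "1.0",
        "sessionAttributes": session_attrs,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,     # keep the conversation going
        },
    }
```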
I agree. I'm just surprised that none of the base functions provided by Amazon take advantage of it. Compared to Google it feels really basic and limited. I say this as the owner of two Alexas who uses them a lot. I'm constantly frustrated by the limitations, even though it works well for certain things.
Something that doesn't make me say this really long sentence to access the app:
'Alexa tell app X to start my car'
Something that supports actual conversation, which means not requiring the user to respond immediately. For example, it's impossible to build a cooking app that walks me through a recipe because the 'conversation' is over within X seconds. Of course you could do weird hacks like maintaining the conversation on the server side based on the session id, but that still requires the user to constantly prefix their requests with 'tell app X ..'
These basic UX limitations make it impossible to build any serious application that can do more than 'joke of the day' type of stuff. All the apps in the Alexa app store are totally useless 'fart noise of the day' type of garbage.
I think that's a product vs. platform issue, especially having to prefix everything with the application name. Since you are building your own product, you get to avoid the routing problem entirely.
It's great to see so much competition in this space. This initial offering looks limited in its ability to handle more complex branching and context building but hopefully it will evolve. I'm also a little surprised at the pricing given that Google and Facebook offer comparable services for free.
Ditto on the competition space. In my mind, letting Amazon and Google fight out the epic war of machine learning / intelligence will only yield better APIs for those of us wanting to leverage such technologies.
Very nice! With all the great things they've announced today that apply to my work (Lightsail, Rekognition, Athena), I'm crossing my fingers that their next announcement is making speech recognition available as a separate API. And Python 3 support for Lambda.
It's great to see the utterance/intent/slot model become available outside of Alexa. It's been easy to work with so far in Alexa. And you see a similar model from Nuance with Mix, though Mix seems to be stuck in beta.
Lex looks a lot like Alexa, though the setup flow is a bit different. It also has prompts for each slot needed. That's nice.