I was reading about SyntaxNet (I believe an RNN) developed by Google yesterday. ...

I was reading about SyntaxNet (I believe an RNN) developed by Google yesterday. One interesting problem they've run into is getting the system to properly interpret ambiguities. They use the example sentence "Alice drove down the street in her car":

"The first [possible interpretation] corresponds to the (correct) interpretation where Alice is driving in her car; the second [possible interpretation] corresponds to the (absurd, but possible) interpretation where the street is located in her car. The ambiguity arises because the preposition in can either modify drove or street; this example is an instance of what is called prepositional phrase attachment ambiguity."[1]

One thing I believe helps humans interpret these ambiguities is the ability to form visuals from language. A NN that could potentially interpret/manipulate images and decode language seems like it could help solve the above problem and also be applied to a great deal of other things. I imagine (I know embarrassingly little about NNs) this would also introduce a massive amount of complexity.

[1] https://research.googleblog.com/2016/05/announcing-syntaxnet...