It seems to me that a full mastery of language requires a grasp of semantics, that is, the ability to understand what a sentence means. I doubt it's possible to do that without basic common sense and an overall representation of the world, and that looks very close to strong AI, imho.
So I'm not surprised computers keep struggling with language applications. Once they succeed, strong AI will not be much further away.
I think the 'overall representation of the world' requirement is pretty key here. Language in AI is often treated as its own class of problem, with the assumption that there is somehow enough signal in the raw mess of examples provided to any given learning system (usually just plain text, stripped of prosody, emotion, cultural context, imagery, or any of the other modalities of communication available to a demonstrably functioning natural language understander[1]) to build a model that 'understands' general-use language. I simply don't see how this is possible[2]. I know the classical philosophies about the complementary nature of language and intelligence are out of fashion right now[3], but I'm not quite convinced they deserve to be.
I'll raise your bet; I'm willing to believe that once we succeed in building a general understanding of language, we'll look back and see that we simultaneously have solved Strong AI. To twist the old saying, I think that language is what the human brain does.
---
[1] Yes, we can talk about P-zombies if you want. But I mean more in the Turing Test sense here.
[2] Yes, I know the progress has been impressive. The progress in the 60s with GOFAI was impressive at first too. Then it plateaued.
[3] I'm particularly referring to Sapir-Whorfism and the various communication heuristics proposed by Grice. But I'd throw Chomskyan Universal Grammar in there too.
Grounding language in other sense modalities (multimodal learning) is a thing. We can even generate captions from images and generate images from captions, albeit not perfectly.
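To give a concrete flavour of the captioning direction, here is a minimal sketch assuming the Hugging Face transformers library and one of its public ViT-GPT2 captioning checkpoints (the image path is just a placeholder):

    # Minimal sketch of off-the-shelf image captioning.
    # Assumes `transformers` is installed and the
    # nlpconnect/vit-gpt2-image-captioning checkpoint can be downloaded.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

    # Any local image path (or URL) works; "photo.jpg" is a placeholder.
    result = captioner("photo.jpg")
    print(result[0]["generated_text"])  # a short description of the scene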
Another grounding source is ontologies. We are already building huge maps of facts about the world in the form of "object1 relation object2" triples.
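In toy form, such a triple store is just a pile of (subject, relation, object) facts you can index and query; the facts below are made up for illustration, real ontologies hold millions of them:

    # Toy sketch of an "object1 relation object2" fact store (made-up facts).
    from collections import defaultdict

    triples = [
        ("cat", "is_a", "mammal"),
        ("mammal", "is_a", "animal"),
        ("cat", "capable_of", "purring"),
    ]

    # Index facts by subject so we can ask simple questions about an entity.
    by_subject = defaultdict(list)
    for subj, rel, obj in triples:
        by_subject[subj].append((rel, obj))

    def facts_about(entity):
        """Everything the store asserts about an entity."""
        return by_subject[entity]

    print(facts_about("cat"))  # [('is_a', 'mammal'), ('capable_of', 'purring')]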
Another source of "common sense" is word embeddings. In fact, it is possible to embed all kinds of things - shopping bags, music preferences, network topologies - as long as we can observe objects in context.
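A sketch of that "objects in context" idea, reusing word2vec machinery on made-up shopping baskets instead of sentences (assumes the gensim library; the data is invented and far too small to give meaningful neighbours, it just shows the shape of the approach):

    # Word2vec-style embeddings over non-word items: each basket plays the role
    # of a "sentence", co-purchased items play the role of context words.
    # Requires gensim; basket contents are invented for illustration.
    from gensim.models import Word2Vec

    baskets = [
        ["bread", "butter", "jam"],
        ["bread", "butter", "milk"],
        ["pasta", "tomato_sauce", "parmesan"],
        ["pasta", "tomato_sauce", "basil"],
        ["bread", "jam", "milk"],
    ]

    model = Word2Vec(sentences=baskets, vector_size=16, window=3,
                     min_count=1, sg=1, epochs=200)

    # Items that appear in similar baskets end up with similar vectors.
    print(model.wv.most_similar("pasta", topn=2))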
Then there is unsupervised learning from video and images. For example, starting from pictures, cut them into a 3x3 grid, shuffle the tiles, and then task the network with recovering the original layout. This automatically extracts semantic information from images, unsupervised. A variant is to take frames from a video, shuffle them around, then task the network with recovering the original temporal order. Using this process we can cheaply learn about the world and provide this knowledge as "common sense" for NLP tasks.
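A rough sketch of the data-preparation half of that jigsaw pretext task: turn an unlabeled image into a (shuffled tiles, permutation) pair that a permutation-predicting network would train on. Numpy only, with a random array standing in for a real image; this is illustrative, not the exact recipe from the published papers:

    # Jigsaw pretext task: cut an image into a 3x3 grid, shuffle the tiles, and
    # keep the permutation as the label a network would learn to predict.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((99, 99, 3))          # stand-in for a real 99x99 RGB image

    def make_jigsaw_example(img, grid=3):
        h, w = img.shape[0] // grid, img.shape[1] // grid
        tiles = [img[r*h:(r+1)*h, c*w:(c+1)*w]
                 for r in range(grid) for c in range(grid)]
        perm = rng.permutation(len(tiles))    # the "label": which shuffle was applied
        shuffled = [tiles[i] for i in perm]
        return np.stack(shuffled), perm       # input and target for a permutation classifier

    tiles, perm = make_jigsaw_example(image)
    print(tiles.shape, perm)                  # (9, 33, 33, 3) plus the permutation to recover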
I am not worried about grounding language. We will get there soon enough; we're just impatient. Life evolved over billions of years, and AI is only just emerging. Imagine how much computing power is in the collected brains of humanity, compared to how much computer time we give AI to learn. AI is still starved of raw computing power and experience. Human brains would have done much worse with the same amount of computing.
Image captioning is a separate, albeit related, problem to what I'm talking about.
Ontologies are much the same; they are interesting for the problems they solve, but it's not clear how well those problems relate to the more general problem of language.
Word embeddings are also quite interesting, but again, they are typically based entirely on whatever emergent semantics can be gleaned from the structure of documents. It's not clear to me that this is any more than superficial understanding. Not that they aren't very cool and powerful; distributional semantics is a powerful tool for measuring certain characteristics of language. I'm just not sure how much more useful it will be in the future.
Unsupervised learning from video and images is a strictly different problem that seems to me to be much lower down the hierarchy of AI Hardness. More like a fundamental task that is solvable in its own universe, without requiring complete integration of multiple other universes. Whether the information extracted by these existing technologies is actually usefully semantic in nature remains to be seen.
I agree that we'll get there, somewhat inevitably; I'm not trying to argue for any Searlian dualistic separation between what Machines can do and what Biology can do. I'm personally interested in the 'how'. Emergent Strong AI is the most boring scenario I can imagine; I want to understand the mechanisms at play. It may just be that we need to tie together everything you've listed and more, throw enough data at it, and wait for something approximating intelligence to grow out of it. We can also take the more top-down route and treat this as a problem in developmental psychology. Are there better ways to learn than throwing trillions of examples at something until it hits that eureka moment?
I think the key ingredient will be reinforcement learning, and, more importantly, agents being embedded in the external world.
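For what it's worth, the "agent embedded in a world" part already has a standard interface; a minimal sketch of the agent-environment loop, assuming the gymnasium library and with a random policy standing in for a real learner:

    # Minimal agent-environment loop; a real reinforcement learner would replace
    # the random action choice. Assumes the `gymnasium` library is installed.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()   # a real agent would choose from obs
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated

    env.close()
    print(total_reward)  # return of one random-policy episode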
Regarding the "internal world", we already see the development of AI mechanisms for attention, short-term memory (references to recently used concepts), episodic memory (autobiographical), and semantic memory (ontologies).
>>I think that language is what the human brain does.
I think language is a UI for our own brain. It allows us to interact with the brain's knowledge system and representation of the world. The self is a thin client running on that vast knowledge system. If you think about it, conscious thought is not where the real thinking happens: we get intuition signals from the brain about what is true or false, and those are required for our higher-level thinking. So the thinking we do is also a thin client running on top of the Brain OS. Both thinking and language are serialization tools for our representation of the world, evolved solely for communication with other brains. Since we don't have a direct neural link to other brains, we have to serialize, hence language-based thinking.
So I think that to evolve language understanding in machines, we might have to simulate many intelligent agents in a simulated environment and let them collaborate, similar to how our brains collaborated and gave rise to natural languages.
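A tiny sketch of that idea in its simplest form: a Lewis-style signaling game where a speaker and a listener, rewarded only for successful communication, converge on a shared mapping from objects to symbols. Tabular agents and a crude reinforcement rule, purely for illustration:

    # Lewis-style signaling game: two agents invent a private "vocabulary"
    # from nothing but a shared reward. Numpy only; the learning rule is a
    # simple count-based reinforcement, not a serious RL algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    n_objects = n_symbols = 5
    speaker = np.ones((n_objects, n_symbols))   # preference for symbol given object
    listener = np.ones((n_symbols, n_objects))  # preference for object given symbol

    for _ in range(5000):
        obj = rng.integers(n_objects)
        sym = rng.choice(n_symbols, p=speaker[obj] / speaker[obj].sum())
        guess = rng.choice(n_objects, p=listener[sym] / listener[sym].sum())
        if guess == obj:                        # shared reward reinforces both choices
            speaker[obj, sym] += 1
            listener[sym, guess] += 1

    # After training, each object tends to get its own symbol (convergence is
    # likely but not guaranteed with this crude update rule).
    print(speaker.argmax(axis=1))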
I am inclined to agree with you, but then I remember that people used to say the same thing about chess. Perhaps completely solving language requires strong AI, but maybe we can get 99% of the way there with something like the "Chinese room", an AI that works like a well-trained parrot.
To grasp the semantics of human language, I would think that an AI would have to have an understanding of the world grounded in human experience. So we would need to simulate human experience for an AI. Does anyone know of any work on this?
IMHO it is unproductive to rigidly split problems into "merely requiring an algorithmic solution" and "AI-complete".
Even with language there is a whole spectrum of skill. Animals like parrots, crows, and great apes, and then people, can learn language at various levels.
Some deep learning models can already learn basic language skills too. The question is how far these techniques can go. Maybe pretty far.