
It also presumes that one can simulate the world at low cost. In AlphaGo Zero, 1,600 node expansions take 0.4 s, but there the cost of simulating the world is negligible. Assuming you need that many expansions to get decent-quality updates, that puts a rather tight limit on the cost of simulating the world.
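A quick back-of-the-envelope on what that implies (a sketch using only the 0.4 s / 1,600-expansion figures above):

    # Per-simulation time budget implied by AlphaGo Zero's search settings
    # (0.4 s of search per move, 1,600 node expansions).
    search_time_s = 0.4
    num_expansions = 1600

    budget_s = search_time_s / num_expansions
    print(f"{budget_s * 1e3:.2f} ms per expansion")  # -> 0.25 ms

    # Each expansion already includes a network evaluation, so a learned or
    # handwritten world simulator would have to step in well under 0.25 ms
    # to keep the same search budget.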



DM has already done a bunch of work on 'deep models' of environments to plan over. Use them and you get 'model-predictive control' and planning, and this tree extension to policy gradients would probably work as well. It could be pretty interesting to see what happens if you tried that sort of hybrid on ALE.
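Something like the following is the shape of it: a toy random-shooting MPC loop over a learned model. Here `dynamics_model` and `reward_model` are hypothetical stand-ins for a learned environment model, not anyone's actual code:

    import numpy as np

    # Toy random-shooting MPC over a learned model; a sketch of the idea,
    # not a real implementation.
    def plan_action(state, dynamics_model, reward_model,
                    horizon=10, num_candidates=256, action_dim=2, rng=None):
        rng = rng or np.random.default_rng()
        # Sample candidate action sequences uniformly in [-1, 1].
        seqs = rng.uniform(-1, 1, size=(num_candidates, horizon, action_dim))
        returns = np.zeros(num_candidates)
        for i, seq in enumerate(seqs):
            s = state
            for a in seq:
                s = dynamics_model(s, a)          # learned transition
                returns[i] += reward_model(s, a)  # learned reward
        # Receding horizon: execute only the first action of the best sequence.
        return seqs[np.argmax(returns)][0]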


I guess deep world models are still riddled with all sorts of problems: vanishing gradients, BPTT being O(T), poor generalization in NNs (likely due to the lack of attractor-state associative recall and of concept composability), no probabilistic message passing to deal with uncertainty, and perhaps the need for some priors about the world to make learning tractable (such as spatial maps and tuning for the time scales that carry interesting information).
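On the BPTT cost specifically, truncation is the usual workaround: detach the graph every k steps so each update costs O(k) rather than O(T). A minimal PyTorch sketch (toy GRU world model, all shapes and hyperparameters made up):

    import torch
    import torch.nn as nn

    # Toy GRU world model trained with truncated BPTT; illustrative only.
    rnn = nn.GRUCell(input_size=4, hidden_size=32)
    head = nn.Linear(32, 4)
    opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()),
                           lr=1e-3)

    T, k = 1000, 20              # full sequence length vs. truncation window
    x = torch.randn(T, 1, 4)     # fake observation sequence (batch of 1)
    h = torch.zeros(1, 32)
    loss = 0.0

    for t in range(T - 1):
        h = rnn(x[t], h)
        loss = loss + ((head(h) - x[t + 1]) ** 2).mean()  # next-step prediction
        if (t + 1) % k == 0:
            opt.zero_grad()
            loss.backward()      # gradient flows back at most k steps
            opt.step()
            h = h.detach()       # cut the graph: memory stays O(k), not O(T)
            loss = 0.0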


What are the main papers from DM on this? Are you referring to "Continuous Control with Deep Reinforcement Learning"?



