
It also presumes that one can simulate the world at low cost. In AlphaGo Zero, 1,600 node expansions take 0.4 s, but there the cost of simulating the world is negligible. Assuming you need that many expansions to get decent-quality updates, that puts a rather tight limit on the cost of simulating the world.
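A quick back-of-the-envelope on what that implies (a sketch using only the 0.4 s / 1,600-expansion figures above):

    # Per-simulation time budget implied by AlphaGo Zero's search settings
    # (0.4 s of search per move, 1,600 node expansions).
    search_time_s = 0.4
    num_expansions = 1600

    budget_s = search_time_s / num_expansions
    print(f"{budget_s * 1e3:.2f} ms per expansion")  # -> 0.25 ms

    # Each expansion already includes a network evaluation, so a learned or
    # handwritten world simulator would have to step in well under 0.25 ms
    # to keep the same search budget.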



DM has already done a bunch of work on 'deep models' of environments to plan over. Use them and you get 'model-predictive control' and planning, and this tree extension to policy gradients would probably work as well. It could be pretty interesting to see what happens if you tried that sort of hybrid on ALE.
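Something like the following is the shape of it: a toy random-shooting MPC loop over a learned model. Here `dynamics_model` and `reward_model` are hypothetical stand-ins for a learned environment model, not anyone's actual code:

    import numpy as np

    # Toy random-shooting MPC over a learned model; a sketch of the idea,
    # not a real implementation.
    def plan_action(state, dynamics_model, reward_model,
                    horizon=10, num_candidates=256, action_dim=2, rng=None):
        rng = rng or np.random.default_rng()
        # Sample candidate action sequences uniformly in [-1, 1].
        seqs = rng.uniform(-1, 1, size=(num_candidates, horizon, action_dim))
        returns = np.zeros(num_candidates)
        for i, seq in enumerate(seqs):
            s = state
            for a in seq:
                s = dynamics_model(s, a)          # learned transition
                returns[i] += reward_model(s, a)  # learned reward
        # Receding horizon: execute only the first action of the best sequence.
        return seqs[np.argmax(returns)][0]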


I guess deep world models are still riddled with all sorts of problems: vanishing gradients, BPTT being O(T), poor generalization in NNs (likely due to the lack of attractor-state associative recall and of concept composability), no probabilistic message passing to deal with uncertainty, and perhaps the need for some priors about the world to make learning tractable (such as spatial maps and tuning for the time scales that carry interesting information).
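On the BPTT cost specifically, truncation is the usual workaround: detach the graph every k steps so each update costs O(k) rather than O(T). A minimal PyTorch sketch (toy GRU world model, all shapes and hyperparameters made up):

    import torch
    import torch.nn as nn

    # Toy GRU world model trained with truncated BPTT; illustrative only.
    rnn = nn.GRUCell(input_size=4, hidden_size=32)
    head = nn.Linear(32, 4)
    opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()),
                           lr=1e-3)

    T, k = 1000, 20              # full sequence length vs. truncation window
    x = torch.randn(T, 1, 4)     # fake observation sequence (batch of 1)
    h = torch.zeros(1, 32)
    loss = 0.0

    for t in range(T - 1):
        h = rnn(x[t], h)
        loss = loss + ((head(h) - x[t + 1]) ** 2).mean()  # next-step prediction
        if (t + 1) % k == 0:
            opt.zero_grad()
            loss.backward()      # gradient flows back at most k steps
            opt.step()
            h = h.detach()       # cut the graph: memory stays O(k), not O(T)
            loss = 0.0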


What are the main papers from DM on this? Are you referring to "Continuous Control with Deep Reinforcement Learning"?



