
UCB bandit algorithms are quite simple to understand, and UCT's generalisation of them to tree search seems like a pretty clean idea.
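
For anyone unfamiliar, the core of UCB1 (and of the selection step in UCT) fits in a few lines. A minimal Python sketch, assuming each child node tracks its own visit count and accumulated reward (the attribute names are mine, not from any library):

    import math

    def ucb1_select(children, parent_visits, c=math.sqrt(2)):
        # Try each child once before applying the UCB1 formula.
        for child in children:
            if child.visits == 0:
                return child
        # Pick the child maximising mean reward plus an exploration bonus
        # that shrinks as the child is visited more often.
        return max(
            children,
            key=lambda ch: ch.total_reward / ch.visits
                           + c * math.sqrt(math.log(parent_visits) / ch.visits),
        )

UCT just applies this rule recursively at each node on the way down the tree, then expands a leaf, runs a playout, and backs the result up.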

I had a go at using UCT to explore a search space earlier this year (not games, but selecting a good sequence of decisions inside a simulation).

The problem I ran into with this approach was that the algorithm has no way to generalise or lift experience from one explored state to other similar / unexplored states, and my search space was large. So having to evaluate each state at least once was prohibitive.

This isn't really a criticism of how well UCT can tackle the problem as framed, but more a criticism of how much better one might be able to do with a bit more structure -- e.g. perhaps with the addition of some kind of similarity metric between states, along with some bounds or assumptions on how the values of "similar" states should relate to each other.

It seems like combining some machine learning / statistical / approximation techniques to approximate the true state->value function with a UCT-like approach to guide exploration could be very effective for the right problem, but getting it to work well might be more of an art than a science.
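
To make that concrete, the simplest version I can imagine is to cut the rollout short and ask an approximator for the value of the leaf instead. A rough sketch, where value_fn is a stand-in for any learned state->value model and the state API is invented for illustration:

    import random

    def evaluate_leaf(state, value_fn, use_rollout_prob=0.5):
        # Sometimes fall back to a plain Monte Carlo rollout...
        if random.random() < use_rollout_prob:
            while not state.is_terminal():
                state = state.apply(random.choice(state.legal_moves()))
            return state.reward()
        # ...otherwise trust the learned approximation for this state.
        return value_fn(state)

Whether this actually helps depends entirely on how good value_fn is on the states the search visits, which is where the "art" part comes in.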

If anyone has any pointers to reading about this kind of thing, I'd be most grateful! I have previously read about "approximate dynamic programming" techniques, e.g. [1]

[1] http://castlelab.princeton.edu/Papers/Powell-NRLWhat%20you%2...




Any sequence of one or more decisions with varying results is a game, but you probably know that.

Aside from estimating the value of a node from similar nodes, another option is to reduce the search space by putting similar states in the same information set. This means that to one player they appear indistinguishable, so the same strategy is played in all of them, while the other players might still be able to distinguish them. If your game has a huge state space, it's likely that some of the state hardly matters some of the time to some of the opponents. Other methods of fusing states have the disadvantage that all players have to care about the same information.
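
In code, the usual trick is to key tree nodes on what a single player can observe rather than on the full state; states with the same key fall into the same information set for that player. A sketch with made-up state fields:

    def information_set_key(state, player):
        # Keep only what `player` can see: their own hand, the public
        # information, and whose turn it is. Hidden opponent state is
        # deliberately dropped, so states differing only in that respect
        # collapse into one node for this player.
        return (
            tuple(sorted(state.hands[player])),
            tuple(state.public_cards),
            state.current_player,
        )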

To implement MCTS with information sets, use the Information Set MCTS algorithm (ISMCTS) [1], which is a pretty obvious and intuitive extension of MCTS. (Has nobody really described ISMCTS before?) Whitehouse's PhD thesis [2] goes into more detail, and looks at using ISMCTS to reduce the search space in games ("ICARUS"), though I didn't read that chapter. (Earlier this year I mostly-implemented ISMCTS for Cordial Minuet, a game with simultaneous moves in addition to imperfect information. Information sets are unavoidable in simultaneous-move games.)
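
For reference, the outer loop of (single-observer) ISMCTS is roughly: on each iteration, sample a determinization, i.e. a concrete state consistent with what the searching player knows, and run one ordinary select/expand/simulate/backpropagate pass through a tree whose nodes are information sets, restricting each node to the moves legal in that determinization. A hedged sketch, with sample_determinization and run_one_iteration as placeholders:

    def ismcts(root_info_set, iterations, sample_determinization, run_one_iteration):
        tree = {}  # nodes keyed by information set, shared across determinizations
        for _ in range(iterations):
            det_state = sample_determinization(root_info_set)
            run_one_iteration(tree, det_state)  # usual MCTS phases on this determinization
        root = tree[root_info_set]
        # Recommend the most-visited move at the root, as in plain MCTS.
        return max(root.children, key=lambda ch: ch.visits).move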

Also, you will find loads of papers specifically discussing speeding up MCTS for Go, many of which generalise.

[1] http://eprints.whiterose.ac.uk/75048/1/CowlingPowleyWhitehou...
[2] http://etheses.whiterose.ac.uk/8117/


Excellent. I did want to eventually write something showing how to modify this algorithm to handle games with randomness, and ISMCTS at a glance looks like it does just that. Thanks!


Actually, you don't need information sets to handle games with chance nodes. Standard MCTS works just fine for those. (Don't use a bandit algorithm at the chance nodes, just pick randomly.) Information sets are only needed if you have imperfect information, either because players make simultaneous moves or because a player doesn't know (or forgets) the complete state of the game.
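
In code that's just a branch on the node type during selection: sample at chance nodes, use the bandit everywhere else. A small sketch (attribute names invented, with ucb1_select standing in for whatever selection rule you use at decision nodes):

    import random

    def select_child(node, parent_visits):
        if node.is_chance_node:
            # Sample an outcome according to its probability; no bandit needed,
            # since nobody gets to choose here.
            weights = [child.probability for child in node.children]
            return random.choices(node.children, weights=weights, k=1)[0]
        # Decision node: apply the usual bandit rule.
        return ucb1_select(node.children, parent_visits)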


ADP = Reinforcement Learning. This got some attention lately when some people at Google wrote a neural-network-based RL system to play Atari games.

https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf


(That was before Google acquired DeepMind, and probably a large part of the reason.)

This uses a deep convolutional NN to approximate the expected value of each action (the Q-function). Interestingly, there are only 3 hidden layers in this CNN, and it only gets fed the last 4 frames as state. Of course, normally you would feed your approximator a direct encoding of the state. If you wanted to use an NN to evaluate nodes in game-tree search, you would have to invent some other way to train it. Would love to know whether there is research on that.


Take a look at "Rapid Action Value Estimation" or RAVE. http://www.cs.utexas.edu/~pstone/Courses/394Rspring13/resour...
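
Briefly: RAVE tackles exactly the "can't share experience between states" problem by also crediting a move with the outcome of every simulation in which it was played later from that node ("all moves as first"), and blending that shared estimate with the move's direct statistics. A rough sketch of the blended value, using one common decay schedule for the mixing weight (node attributes are illustrative):

    def rave_value(node, move, k=1000.0):
        n = node.visits[move]            # direct visits of `move` at this node
        amaf_n = node.amaf_visits[move]  # simulations through this node that played `move` later
        direct_q = node.reward[move] / n if n else 0.0
        amaf_q = node.amaf_reward[move] / amaf_n if amaf_n else 0.0
        # Trust the shared AMAF estimate early on, the direct estimate later.
        beta = (k / (3 * n + k)) ** 0.5
        return (1 - beta) * direct_q + beta * amaf_q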



