The policy network is a function from board states to a score for each move. Paired with a greedy heuristic, i.e. picking the highest-rated move with no explicit lookahead, the policy network alone plays at a high amateur level.
This was... unexpectedly good.
The network effectively reduces the branching factor of Go from the number of moves available to the number of moves actually worth considering.
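
To make the two ideas above concrete, here is a minimal sketch in Python. It assumes the network's output is already available as a flat list of per-move scores and that a separate legality mask exists. The names `greedy_move`, `top_k_moves`, and the cutoff `k` are hypothetical and chosen only for illustration; they do not describe how the actual system is implemented.

```python
from typing import Optional, Sequence


def greedy_move(scores: Sequence[float], legal: Sequence[bool]) -> int:
    """Return the index of the highest-scored legal move, with no lookahead."""
    best: Optional[int] = None
    for move, (score, ok) in enumerate(zip(scores, legal)):
        if ok and (best is None or score > scores[best]):
            best = move
    if best is None:
        raise ValueError("no legal moves")
    return best


def top_k_moves(scores: Sequence[float], legal: Sequence[bool], k: int) -> list:
    """Return the k best legal moves, best first.

    A search that expands only these candidates has an effective
    branching factor of at most k (k is an illustrative cutoff,
    not a value taken from the text).
    """
    candidates = [m for m, ok in enumerate(legal) if ok]
    return sorted(candidates, key=lambda m: scores[m], reverse=True)[:k]


# Toy example: four candidate moves, the second one illegal.
scores = [0.10, 0.70, 0.05, 0.15]
legal = [True, False, True, True]
print(greedy_move(scores, legal))     # 3
print(top_k_moves(scores, legal, 2))  # [3, 0]
```

The greedy player is just `top_k_moves` with `k = 1`. Keeping a few more candidates is one way a search procedure could use the same scores to decide which moves to expand.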