
I will be interested to see what kind of algorithms they have used to allow AlphaGo to learn from its own moves. Are these pretty generic algorithms, or are they very customized, specific ones that only apply to AlphaGo and the game of Go?



They have a new reinforcement learning algorithm that should be generically applicable to anything where a long sequence of moves results in a specifically gradable outcome.

> The neural network in AlphaGo Zero is trained from games of self-play by a novel reinforcement learning algorithm. In each position s, an MCTS search is executed, guided by the neural network fθ. The MCTS search outputs probabilities π of playing each move. These search probabilities usually select much stronger moves than the raw move probabilities p of the neural network fθ(s); MCTS may therefore be viewed as a powerful policy improvement operator. Self-play with search—using the improved MCTS-based policy to select each move, then using the game winner z as a sample of the value—may be viewed as a powerful policy evaluation operator. The main idea of our reinforcement learning algorithm is to use these search operators repeatedly in a policy iteration procedure: the neural network's parameters are updated to make the move probabilities and value (p, v) = fθ(s) more closely match the improved search probabilities and self-play winner (π, z); these new parameters are used in the next iteration of self-play to make the search even stronger.
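For flavor, here is a minimal, self-contained Python sketch of that policy-iteration loop. The `Game`, `Network`, and `mcts_search` pieces are toy stand-ins I've invented for illustration, not DeepMind's code; a real implementation uses a deep residual network, a full MCTS with visit counts, and gradient descent in the training step.

```python
# Toy sketch of the AlphaGo Zero training loop: MCTS acts as policy
# improvement, game outcomes act as policy evaluation, iterated.
# `Game`, `Network`, and `mcts_search` are hypothetical stand-ins.
import random

class Game:
    """Toy stand-in for Go: take 1-3 stones from a pile; whoever
    takes the last stone wins."""
    def __init__(self, stones=10):
        self.stones, self.to_move = stones, +1
    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]
    def terminal(self):
        return self.stones == 0
    def play(self, n):
        self.stones -= n
        self.to_move = -self.to_move
    def winner(self):
        return -self.to_move  # the player who just moved took the last stone
    def copy(self):
        g = Game(self.stones)
        g.to_move = self.to_move
        return g

class Network:
    """Stand-in for f_theta(s) -> (move priors p, value v)."""
    def predict(self, state):
        moves = state.legal_moves()
        return {m: 1.0 / len(moves) for m in moves}, 0.0
    def train(self, examples):
        pass  # real version: gradient step pulling (p, v) toward (pi, z)

def mcts_search(state, net, simulations=100):
    """Policy improvement: return search probabilities pi, which are
    stronger than the raw priors p. Here just a crude sharpening."""
    p, _ = net.predict(state)
    best = max(p, key=p.get)
    others = max(len(p) - 1, 1)
    return {m: (0.9 if m == best else 0.1 / others) for m in p}

def self_play_game(net):
    """Policy evaluation: play with MCTS-improved moves and label every
    visited position with the final winner z."""
    state, history = Game(), []
    while not state.terminal():
        pi = mcts_search(state, net)
        history.append((state.copy(), pi))
        move = random.choices(list(pi), weights=list(pi.values()))[0]
        state.play(move)
    z = state.winner()  # +1 if the first player won, else -1
    return [(s, pi, z) for s, pi in history]

net = Network()
for _ in range(100):  # policy iteration: each round makes the search stronger
    net.train(self_play_game(net))
```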


> They have a new reinforcement learning algorithm that should be generically applicable to anything where a long sequence of moves results in a specifically gradable outcome.

Statements like these always make me wonder why certain obvious things weren't tried. If it's so generic, why wasn't it tried on Chess? Or was it tried, failed to impress, and thus didn't make it into the press release?

This is a big problem with all this public discussion of AI. Almost no one talks about algorithm failures. I haven't seen a single research paper that said "oh, and we also tried the algorithm in ___domain X and it totally sucked".


The conventional wisdom for Chess engines is that aggressive pruning doesn't work well. Chess is much more tactical than Go, so selective algorithms tend to miss some crucial tactic, and the greater the search depth, the more likely that is.

Modern Chess engines are designed to brute-force the search tree as efficiently as possible. I will go out on a limb here and say they would wipe the floor with AlphaGo, because AlphaGo's hardware would be more of a liability than an asset against a CPU.

See also: https://chessprogramming.wikispaces.com/Type+A+Strategy https://chessprogramming.wikispaces.com/Type+B+Strategy
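For reference, the "Type A" brute-force approach described above boils down to full-width alpha-beta search. Here's a minimal negamax sketch (toy game tree with made-up leaf scores; a real engine adds move ordering, transposition tables, quiescence search, and so on):

```python
# Minimal full-width search with alpha-beta pruning (negamax form).
# Exhaustive in principle, but skips branches that provably cannot
# affect the result. Illustrative only, not a real engine.
import math

def negamax(node, alpha=-math.inf, beta=math.inf):
    """node is either a numeric leaf score (from the side-to-move's
    perspective) or a list of child nodes."""
    if isinstance(node, (int, float)):  # leaf: static evaluation
        return node
    best = -math.inf
    for child in node:
        score = -negamax(child, -beta, -alpha)  # flip sign/window for opponent
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # beta cutoff: opponent won't allow this line
            break
    return best

# Tiny made-up game tree: inner lists are positions, numbers are leaves.
tree = [[3, 5], [6, [9, 8]], [1, 2]]
print(negamax(tree))  # best achievable score with perfect play
```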


Until I see AlphaGo Zero defeating Stockfish 100-0 and, with the same algorithm, defeating the best Go AI and killing the Atari games, including Montezuma's Revenge, I call this hype bullshit.

Give me your results on OpenAI Gym across a variety of game styles, including GTA and WoW. I will believe you if a generic unsupervised algorithm running on a single machine is absolutely destroying the best players.

Until then ...


Just like Lee Se-dol is a Go grandmaster, beats Garry Kasparov at chess, and can also get a perfect score in Pac-Man, right? I mean, if you can't do all of those things, then are you even a human-level intelligence?


This just illustrates that surpassing "human level" performance is a silly and arbitrary benchmark, because there is no such thing as general human level performance. But I bet Kasparov would be pretty good at Go, and Sedol would be pretty good at chess.

Universality is the real hard problem of AI. In the long run, a mediocre AI that does a lot of different things is far more useful than most targeted "superhuman" AIs. Most domains simply don't require better-than-human performance, but could still reap tremendous benefits from automation.


Agreed. It's great that we have ___domain-specific approaches that can beat humans in their ___domain (and that we're learning how to make these approaches more generic so that, with re-training, they can adapt to new domains), but the real "oh snap" moment will be when we build something that's barely-adequate but widely adaptable. Something with the adaptability of a corvid or an octopus, say. If we get to that level, it'll mean we've discovered the "universal glue" that joins specialist networks together into a conscious entity.


You forgot to add "running on 20 watts of power". It's not reasonable to require it to run on a single machine when brain performance is estimated to be more than 10 petaflops.


I don't know if you're being sarcastic or not. If not, I suggest you look at the cartoon at http://www.kurzweilai.net/robot-learns-self-awareness


Add Pac-Man and Pitfall to the list. Humans have played perfect games of both. My understanding is that DeepMind's agents performed poorly on those games.


Doesn't this sound very much like how a human learns to play the game? MCTS ~ play/experience (move probabilities); self-play with search ~ study/analysis (move evaluation); repetition and iteration to build intuition (NN parameters).



