Capture Territory
Why is Go hard for computers to play?
Brute force search intractable:
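As a rough illustration of why brute force fails, the size of a full game tree grows like b^d for breadth b (legal moves per position) and depth d (game length). The figures below (b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go) are commonly cited approximations and are assumptions here, not values taken from this slide.

```python
import math


def log10_game_tree_size(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth): the order of magnitude of a full game tree."""
    return depth * math.log10(breadth)


# Commonly cited rough figures (assumptions, not from the slide itself).
chess = log10_game_tree_size(breadth=35, depth=80)
go = log10_game_tree_size(breadth=250, depth=150)

print(f"chess: roughly 10^{chess:.0f} move sequences")
print(f"go:    roughly 10^{go:.0f} move sequences")
```

Working in log space avoids building the huge integers themselves and makes the gap between the two games easy to read off.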
Value network: position s → evaluation v(s)
Policy network: position s → move probabilities p(a|s)
Exhaustive search
Monte-Carlo rollouts
Reducing depth with value network
Reducing breadth with policy network
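The two reductions can be sketched together on a toy game: stop the search at a fixed depth and return a value estimate instead of playing to the end (depth), and expand only the few moves the policy ranks highest (breadth). Everything named here is hypothetical: `truncated_search`, the Nim-style game, and the hand-written `nim_value` and `nim_policy` functions stand in for the trained networks and are not AlphaGo's actual search.

```python
def truncated_search(state, depth, top_k, legal_moves, apply_move, policy, value):
    """Negamax search, truncated in depth (value) and breadth (policy)."""
    moves = legal_moves(state)
    if not moves:
        return -1.0  # terminal: the player to move has lost
    if depth == 0:
        return value(state)  # depth reduction: evaluate instead of searching on
    prior = policy(state)
    ranked = sorted(moves, key=lambda m: prior.get(m, 0.0), reverse=True)
    # Breadth reduction: only the top_k moves under the policy are expanded.
    return max(
        -truncated_search(apply_move(state, m), depth - 1, top_k,
                          legal_moves, apply_move, policy, value)
        for m in ranked[:top_k]
    )


# Toy game (assumption): take 1 to 3 stones; whoever takes the last stone wins.
def nim_moves(n):
    return [k for k in (1, 2, 3) if k <= n]


def nim_apply(n, k):
    return n - k


def nim_value(n):
    """Stand-in for v(s): value for the player to move."""
    return 1.0 if n % 4 else -1.0


def nim_policy(n):
    """Stand-in for p(a|s): favours leaving a multiple of four stones."""
    weights = {m: (9.0 if m == n % 4 else 1.0) for m in nim_moves(n)}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


result = truncated_search(10, depth=1, top_k=1, legal_moves=nim_moves,
                          apply_move=nim_apply, policy=nim_policy, value=nim_value)
print(result)
```

With an accurate value function and a well-ranked policy, even a one-ply, one-move search returns the right answer in this toy game; with weaker estimates, larger `depth` and `top_k` trade computation for accuracy.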
Deep reinforcement learning in AlphaGo
Human expert positions → Supervised learning policy network → Reinforcement learning policy network → Self-play data → Value network
Supervised learning of policy networks
Policy network: 12 layer convolutional neural network
Training data: 30M positions from human expert games (KGS 5+ dan)
Results: 57% accuracy on held-out test data (state of the art was 44%)
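A minimal sketch of the supervised objective, assuming the policy is trained to maximise the log-likelihood of the expert's move. To stay self-contained, the 12-layer network is replaced by a bare list of logits for a single position; `softmax` and `sl_step` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert logits to move probabilities p(a|s)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sl_step(logits, expert_move, lr=0.5):
    """One gradient-ascent step on log p(expert_move | s)."""
    probs = softmax(logits)
    # d/dz_i log p(a|s) = 1[i == a] - p_i for a softmax policy.
    return [
        z + lr * ((1.0 if i == expert_move else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0, 0.0, 0.0]          # three candidate moves, initially equal
before = softmax(logits)[1]
logits = sl_step(logits, expert_move=1)
after = softmax(logits)[1]
print(before, after)
```

Each step moves probability mass toward the move the expert actually played; accuracy on held-out positions is then the fraction of positions where the top-ranked move matches the expert's.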
Reinforcement learning of policy networks
Policy network: 12 layer convolutional neural network
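A minimal sketch of a policy-gradient (REINFORCE-style) update, assuming the network is improved from self-play by reinforcing moves from won games and discouraging moves from lost games. As a simplification, the network is again a bare list of logits for one position, and `rl_step` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Convert logits to move probabilities p(a|s)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rl_step(logits, played_move, outcome, lr=0.5):
    """Policy-gradient step: outcome is +1 for a win, -1 for a loss."""
    probs = softmax(logits)
    # Gradient of log p(played_move | s), scaled by the game outcome.
    return [
        z + lr * outcome * ((1.0 if i == played_move else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


start = [0.0, 0.0, 0.0]
after_win = softmax(rl_step(start, played_move=2, outcome=+1))[2]
after_loss = softmax(rl_step(start, played_move=2, outcome=-1))[2]
print(after_win, after_loss)
```

The update has the same form as the supervised step, except that the target move comes from the policy's own play and the step is signed by the result of the game.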
Monte-Carlo tree search in AlphaGo: selection
P prior probability
Q action value
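A sketch of how selection might combine the two quantities: pick the move maximising Q plus an exploration bonus proportional to P that decays with the visit count. The bonus form `c_puct * P * sqrt(total visits) / (1 + N)` and the constant `c_puct` are assumptions of this sketch, as are the `Edge` and `select` names.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    P: float        # prior probability from the policy network
    N: int = 0      # visit count
    Q: float = 0.0  # action value (mean evaluation so far)


def select(edges, c_puct=1.0):
    """Return the move maximising Q + u, where u favours high-prior, rarely visited moves."""
    total_visits = sum(e.N for e in edges.values())

    def score(e):
        bonus = c_puct * e.P * math.sqrt(total_visits) / (1 + e.N)
        return e.Q + bonus

    return max(edges, key=lambda move: score(edges[move]))


edges = {
    "a": Edge(P=0.6, N=10, Q=0.1),
    "b": Edge(P=0.3, N=0, Q=0.0),
    "c": Edge(P=0.1, N=2, Q=0.5),
}
explore = select(edges, c_puct=1.0)  # unvisited move "b" gets a large bonus
exploit = select(edges, c_puct=0.0)  # pure exploitation picks the highest Q
print(explore, exploit)
```

As visit counts grow the bonus shrinks, so the search gradually shifts from moves the policy likes to moves that have actually evaluated well.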
Monte-Carlo tree search in AlphaGo: expansion
p Policy network
P prior probability
Monte-Carlo tree search in AlphaGo: evaluation
v Value network
Monte-Carlo tree search in AlphaGo: rollout
v Value network
r Game scorer
Monte-Carlo tree search in AlphaGo: backup
Q Action value
v Value network
r Game scorer
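A sketch of the backup step under stated assumptions: the leaf evaluation mixes the value network output v with the rollout result r using a weight `lam`, and each edge on the path keeps Q as the running mean of the evaluations passed through it. The mixing weight of 0.5, the names `Edge` and `backup`, and the simplification that all values are from one fixed player's perspective are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    N: int = 0      # visit count
    W: float = 0.0  # sum of leaf evaluations backed up through this edge

    @property
    def Q(self):
        """Action value: mean evaluation over all visits."""
        return self.W / self.N if self.N else 0.0


def backup(path, v, r, lam=0.5):
    """Update every edge on the path with the mixed leaf evaluation."""
    leaf_value = (1.0 - lam) * v + lam * r
    for edge in path:
        edge.N += 1
        edge.W += leaf_value
    return leaf_value


path = [Edge(), Edge()]
backup(path, v=0.2, r=1.0)    # mixed evaluation: 0.6
backup(path, v=-0.4, r=-1.0)  # mixed evaluation: -0.7
print(path[0].N, path[0].Q)
```

Storing the count and the sum, rather than the mean itself, keeps the update a pair of additions per edge.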
Deep Blue AlphaGo
[Figure: Elo rating chart of Go programs. Axes: Elo rating (0 to 3000) and rank, from beginner kyu (k) through amateur dan (d) to professional dan (p). Programs shown, roughly in ascending order of rating: GnuGo, Fuego, Pachi, Zen, Crazy Stone, AlphaGo.]
AlphaGo against computer opponents
>75% winning rate with 4 stone handicap
Even stronger using distributed machines
[Figure: Elo rating chart (0 to 3000; beginner kyu to professional dan) showing GnuGo, Fuego, Pachi, Zen, and Crazy Stone; annotation: "with 3 to 4 stone".]
CAUTION: ratings based on self-play results
Evaluating Seoul AlphaGo against humans
Lee Sedol (9p): winner of 18 world titles
Dave Silver, Aja Huang, Chris Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Yutian Chen,
Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Tim Lillicrap, Maddy Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
With thanks to: Lucas Baker, David Szepesvari, Malcolm Reynolds, Ziyu Wang,
Nando De Freitas, Mike Johnson, Ilya Sutskever, Jeff Dean, Mike Marty, Sanjay Ghemawat.
Demis Hassabis