
University „Džemal Bijedić“ of Mostar

Faculty of Information Technology

Seminar work in the field of artificial intelligence

ALPHAZERO

Professor: prof. dr. Edina Špago-Ćumurija
Student: Alem Tatarević, IB160042

Mostar, June 2018


Contents

1. Intro
1.1. About DeepMind and earlier versions of AlphaZero
1.2. Why chess?
1.3. How everything started
2. About AlphaZero
2.1. Computing power
2.2. How it learns
2.2.1. Neural networks
2.3. Win against Stockfish and Elmo
3. Generalisation

1. Intro

About six months ago, DeepMind published a paper on AlphaZero. It describes an
algorithm that can achieve, within 24 hours, a superhuman level of play in the
games of chess and shogi (Japanese chess) and convincingly defeat a
world-champion program.

1.1. About DeepMind and earlier versions of AlphaZero


DeepMind is a British AI company that was acquired by Google in 2014. The company
has created neural networks that learn how to play games in a fashion similar to
that of humans.
In October 2015, their program AlphaGo defeated the European champion Fan Hui.
That was a breakthrough because it was the first time that a computer had won
against a human professional Go player in an even game. In March 2016, AlphaGo
also won 4-1 against Lee Sedol, one of the strongest Go players in the world.

Match 3 of AlphaGo vs Lee Sedol in March 2016.

In 2017, AlphaGo Zero arrived and defeated AlphaGo 100 games to 0.
AlphaGo needed months to learn how to play. AlphaGo Zero needed just three days
to learn, and it used less processing power than AlphaGo. The AI is so powerful
that it rediscovered thousands of years of human knowledge of the game before
inventing better moves of its own.
Finally, in December 2017, they published AlphaZero. The AlphaZero algorithm
is a more generic version of the AlphaGo Zero algorithm.

1.2. Why chess?
The game of chess is the most widely studied domain in the history of artificial
intelligence. The study of computer chess is as old as computer science itself.
Charles Babbage (also known as the father of the computer), Alan Turing (father of
theoretical computer science and artificial intelligence), Claude Shannon (founder
of information theory), and John von Neumann (creator of the von Neumann
architecture, which is the basis of most modern computer designs) devised
hardware, algorithms and theory to analyse and play the game of chess.

1.3. How everything started


One of the first breakthroughs in this area was the victory of IBM's Deep Blue
over Garry Kasparov, the world champion at the time, in 1997. Before that,
machines were considered inferior to humans in the game of chess. After the loss,
Kasparov said: “Suddenly, Deep Blue played like a god for one moment.”

Garry Kasparov's reaction during the game

Deep Blue was a chess-playing computer developed by IBM. It employed custom chips
to run the alpha-beta search algorithm in parallel, an example of GOFAI (Good
Old-Fashioned Artificial Intelligence) rather than of deep learning, which would
come a decade later.
It was a brute-force method, and one of its developers even denied that it was
artificial intelligence at all.
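
To make the idea of alpha-beta search concrete, here is a minimal sketch in Python. It is not Deep Blue's implementation, which ran on custom hardware with a hand-tuned evaluation function. The game tree below is a made-up example in which nested lists are positions and plain numbers are the evaluations of final positions.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return the minimax value of `node`, skipping branches that cannot matter."""
    # A leaf is a plain number: the static evaluation of that position.
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:  # we already have a better option elsewhere
            break
    return best

# Hypothetical two-ply tree: three moves for us, each with three replies.
tree = [[3, 5, 6], [2, 9, 1], [0, 7, 4]]
value = alphabeta(tree)
```

In this example the search stops looking at the second and third moves as soon as it finds one reply that is worse than the value already guaranteed by the first move. This pruning is what makes a brute-force search deep enough to be useful.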

2. About AlphaZero

When IBM’s supercomputer Deep Blue beat Garry Kasparov in 1997, it relied on
chess knowledge supplied by human experts, such as a hand-crafted evaluation
function and opening moves. AlphaZero, in contrast, has learned completely on its
own.
DeepMind said the difference between AlphaZero and its rivals is that its
machine-learning approach is given no human input apart from the basic rules of
chess.
One of the key advances here is that AlphaZero was not specifically designed
to play any of these games. In each case, it was given the basic rules (like how
the queen moves in chess, and so on) but was programmed with no other strategies
or tactics.

2.1. Computing power


Instead of an alpha-beta search with domain-specific enhancements, AlphaZero
uses a general-purpose Monte Carlo tree search (MCTS) algorithm. The Monte Carlo
approach connects the outputs of the neural network to a search tree.
At every move, the program plays out a number of simulated games against itself,
each of which starts from the current position.
In the learning phase, AlphaZero used 5,000 TPUs (Google's tensor processing
units) to play games against itself. 64 TPUs were used for training the neural
network. For actually playing chess, only 4 TPUs were used.
The four hours of training that AlphaZero needed to surpass Stockfish may seem
impressive, but this figure is mainly due to the large amount of computing power
available nowadays.
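
The following sketch shows how a tree search can use the network's suggestions when deciding which move to explore next. It follows the general form of the PUCT selection rule used in AlphaZero-style searches, but it covers only this one step. The `Child` class, the move names, the statistics and the constant `c_puct = 1.5` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # move probability suggested by the network
    visits: int = 0         # how often the search has tried this move
    value_sum: float = 0.0  # sum of evaluations backed up through this move

    def q(self) -> float:
        """Average evaluation of this move so far (0 if never tried)."""
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(children: dict, c_puct: float = 1.5) -> str:
    """Pick the move with the highest Q + U score."""
    # Parent visits: its own expansion visit plus all visits to its children.
    parent_visits = 1 + sum(c.visits for c in children.values())

    def score(c: Child) -> float:
        # U is large for moves the network likes and the search has rarely tried.
        u = c_puct * c.prior * math.sqrt(parent_visits) / (1 + c.visits)
        return c.q() + u

    return max(children, key=lambda move: score(children[move]))

# Hypothetical statistics for three candidate moves.
children = {
    "e2e4": Child(prior=0.6, visits=10, value_sum=5.0),
    "d2d4": Child(prior=0.3, visits=2, value_sum=1.2),
    "a2a3": Child(prior=0.1, visits=0),
}
best = select_move(children)
```

The rule balances two things: moves that have scored well so far (the Q term) and moves that the network considers promising but that have not been examined much yet (the U term).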

2.2. How it learns


AlphaZero starts out as a blank slate. It knows absolutely nothing about any
particular game.
The first step was to give AlphaZero the rules of chess. This meant that it could
play random but legal moves, and all it knew was that checkmate is the goal.
It was then left to play millions of games against itself. Game one would have
involved totally random moves. At the end of this game, AlphaZero had learned
that the losing side had made weak choices and that the winning side had played
better. AlphaZero had taught itself its first chess lesson.

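About four hours of split-personality chess later, AlphaZero had taught itself
enough to exceed Stockfish’s rating and become the strongest chess player in the
world. According to the paper, the complete training run lasted about nine hours
and comprised 44 million games.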
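
The principle of learning from nothing but the results of self-play can be shown on a much smaller game. The sketch below is not AlphaZero's algorithm: it uses a simple table of average results instead of a neural network and a tree search, and it plays a toy game (players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins). All names and parameters are assumptions made for this example.

```python
import random
from collections import defaultdict

def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than are left."""
    return range(1, min(3, stones) + 1)

def self_play(games=20000, start=10, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)    # average result of taking `take` with `stones` left
    count = defaultdict(int)  # how often each (stones, take) pair was played
    for _ in range(games):
        stones, history = start, []
        while stones > 0:
            moves = list(legal_moves(stones))
            if rng.random() < epsilon:
                take = rng.choice(moves)  # explore: random legal move
            else:
                take = max(moves, key=lambda m: q[(stones, m)])  # best so far
            history.append((stones, take))
            stones -= take
        # Whoever made the last move took the last stone and wins (+1).
        result = 1.0
        for key in reversed(history):
            count[key] += 1
            q[key] += (result - q[key]) / count[key]  # running average
            result = -result  # the previous move belonged to the other player
    return q

def best_move(q, stones):
    """Move with the best average result seen in self-play."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])

q = self_play()
```

At the start the table is empty, so the program plays essentially at random, just as described above. With enough games the table tends towards the known winning strategy for this game, which is to leave the opponent a multiple of four stones.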

2.2.1. Neural networks


AlphaZero’s learning happens using a neural network, which can be visualized
like this:

Example of neural network

A neural network is a computing system loosely modelled on the human brain. The
current position on the chessboard comes in on the left. It gets processed by the
first layer of neurons, each of which then sends its output to each neuron in the
next layer, and so on until the last layer produces the final output.
A neuron is a very simple processing unit that accepts a number of inputs,
multiplies each one by a particular weight, sums the results and then applies an
activation function (for example a sigmoid, which gives an output in the range of
0 to 1).
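
The description of a single neuron translates almost directly into code. The sketch below assumes a sigmoid activation, which is one common function with outputs between 0 and 1 and not necessarily the one used in AlphaZero. The `bias` term and the example numbers are also assumptions made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, sum up, then apply the activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two inputs feeding a layer of three neurons (made-up weights).
outputs = layer([1.0, 2.0],
                [[0.5, -0.25], [1.0, 1.0], [-2.0, 0.5]],
                [0.0, 0.0, 0.0])
```

Learning means adjusting the weights so that the outputs of the last layer become more useful. The code above shows only how an output is computed from given weights.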
The architecture of the AlphaZero program is based on a deep neural network with
two outputs: a "policy" output that proposes candidate moves, and a "value"
output that evaluates positions. (Its predecessor AlphaGo used two separate
networks for these two tasks.)
AlphaZero’s neural network has up to 80 layers, and hundreds of thousands of
neurons.
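
The policy and value outputs can be sketched as two small functions of the same feature vector. In AlphaZero the policy is a probability for each move and the value is a number between -1 (loss) and 1 (win). The single linear layer, the weights and the function names below are assumptions made for illustration; the real network is far deeper.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def policy_and_value(features, policy_weights, value_weights):
    """Compute both outputs from one shared feature vector."""
    policy = softmax([dot(features, row) for row in policy_weights])
    value = math.tanh(dot(features, value_weights))  # between -1 and 1
    return policy, value

# Made-up features of a position and weights for three candidate moves.
features = [0.2, -1.0, 0.5]
policy, value = policy_and_value(
    features,
    policy_weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    value_weights=[0.5, 0.5, 0.5],
)
```

The search uses the policy to decide which moves deserve attention and the value to judge a position without playing the game to the end.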

2.3. Win against Stockfish and Elmo
In order to demonstrate the superiority of AlphaZero over previous chess engines,
a 100-game match against Stockfish was played. The choice of Stockfish as the
rival chess engine seems reasonable, since it is open-source and one of the
strongest chess engines available.
Stockfish, which won the 2016 TCEC Championship and the 2017 Chess.com Computer
Chess Championship, did not stand a chance.
AlphaZero won the closed-door, 100-game match with 28 wins, 72 draws, and
zero losses.

Game between AlphaZero and Stockfish

What do you do if you are a thing that never tires and you have just mastered a
1400-year-old game? You conquer another one. After the Stockfish match, AlphaZero
"trained" for only two hours and then beat the best shogi-playing computer
program, "Elmo".

3. Generalisation

The use of a general-purpose learning algorithm that can work in many domains is
one of the main claims of AlphaZero.
DeepMind eventually wants to use the algorithm to solve health problems. They
believe that the algorithm could come up with cures for major illnesses in a
matter of days or weeks, which would have taken humans hundreds of years to find.
The company has already begun using AlphaZero to study protein folding and has
promised it will soon publish new findings. Misfolded proteins are responsible for
many devastating diseases, including Alzheimer’s, Parkinson’s and cystic fibrosis.
However, it seems unrealistic to think that many real-life situations can be
reduced to a fixed, predefined set of rules, as is the case with chess, Go or
shogi.

References

[1] Silver et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv preprint arXiv:1712.01815 (2017). https://arxiv.org/pdf/1712.01815.pdf

[2] https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov

[3] https://www.theguardian.com/technology/2016/mar/15/googles-alphago-seals-4-1-victory-over-grandmaster-lee-sedol

[4] https://www.theguardian.com/technology/2017/dec/07/alphazero-google-deepmind-ai-beats-champion-program-teaching-itself-to-play-four-hours

[5] https://www.chess.com/news/view/google-s-alphazero-destroys-stockfish-in-100-game-match
