
Game playing

Outline
• Optimal decisions
• α-β pruning
• Imperfect, real-time decisions
Games vs. search problems
• "Unpredictable" opponent ⇒ a solution must specify a
move for every possible opponent reply
• Time limits ⇒ unlikely to find the goal, must approximate
Game tree (2-player, deterministic, turns)
Optimal strategy
• Perfect play for deterministic games
• Minimax value for a node n
• This definition is used recursively (see the reconstruction below)
• Idea: the minimax value is the best achievable payoff
against best play
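
The definition itself appears only as a formula on the original slide; the standard textbook form, written in the same notation as the expectiminimax formula later in these notes, is:

MINIMAX-VALUE(n) =
  UTILITY(n)                                  if n is a terminal node
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)    if n is a MIN node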
Minimax example
• Perfect play for deterministic games
• Minimax decision at the root: choose the action a that
leads to the maximal minimax value
• MAX is guaranteed a utility of at least the minimax
value, provided it plays rationally.
Minimax algorithm
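
The algorithm itself is shown as a figure on the original slide. The following is a minimal Python sketch of it, assuming a hypothetical game interface (is_terminal, utility, and successors yielding (action, state) pairs) rather than the slide's own pseudocode:

def minimax_decision(state, game):
    # Choose the root action whose resulting state has the highest minimax value.
    return max(game.successors(state),
               key=lambda action_state: min_value(action_state[1], game))[0]

def max_value(state, game):
    # Value of a MAX node: the best value achievable over all successors.
    if game.is_terminal(state):
        return game.utility(state)
    return max(min_value(s, game) for a, s in game.successors(state))

def min_value(state, game):
    # Value of a MIN node: the worst value (for MAX) over all successors.
    if game.is_terminal(state):
        return game.utility(state)
    return min(max_value(s, game) for a, s in game.successors(state))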
Properties of minimax
• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games
⇒ exact solution completely infeasible
Multiplayer games
• Each node must hold a vector of values
• For example, for three players A, B, C: (v_A, v_B, v_C)
• The backed-up vector at node n will always be the one
that maximizes the payoff of the player choosing at n
(a sketch of this backup follows below)
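
A minimal sketch of this vector-valued backup (the max-n idea), assuming a hypothetical interface in which utility_vector returns one payoff per player and to_move gives the index of the player choosing at the node:

def maxn_value(state, game):
    # Terminal node: return the full payoff vector, e.g., (vA, vB, vC).
    if game.is_terminal(state):
        return game.utility_vector(state)
    player = game.to_move(state)   # index of the player choosing at this node
    # Back up the child vector that maximizes this player's own component.
    return max((maxn_value(s, game) for a, s in game.successors(state)),
               key=lambda vec: vec[player])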
α-β pruning example
Properties of α-β
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering," time complexity = O(b^(m/2))
⇒ doubles the depth of search
• A simple example of the value of reasoning about which
computations are relevant (a form of metareasoning)
The α-β algorithm
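
The slides show the algorithm as a figure; below is a minimal Python sketch of α-β search under the same hypothetical game interface as the minimax sketch above:

import math

def alphabeta_decision(state, game):
    # Pick the root action with the best backed-up value.
    best_action, best_value = None, -math.inf
    for a, s in game.successors(state):
        v = min_value(s, game, best_value, math.inf)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for a, s in game.successors(state):
        v = max(v, min_value(s, game, alpha, beta))
        if v >= beta:            # MIN already has a better option elsewhere: prune
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for a, s in game.successors(state):
        v = min(v, max_value(s, game, alpha, beta))
        if v <= alpha:           # MAX already has a better option elsewhere: prune
            return v
        beta = min(beta, v)
    return v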
Why is it called α-β?
• α is the value of the best (i.e., highest-value) choice found
so far for MAX at any choice point along the path to the root.
• If v is worse than α, MAX will avoid it ⇒ prune that branch.
• β is the value of the best (i.e., lowest-value) choice found
so far for MIN at any choice point along the path to the root.
Another example
(Figure: a game tree with leaf values, left to right: 5 7 10 3 1 2 9 9 8 2 9 3)
How much do we gain?
• Assume a game tree of uniform branching factor b
• Minimax examines O(b^h) nodes; so does alpha-beta in
the worst case
• The gain for alpha-beta is maximal when:
– The MIN children of a MAX node are ordered in decreasing
backed-up values
– The MAX children of a MIN node are ordered in increasing
backed-up values
• Then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
• But this requires an oracle (if we knew how to order nodes
perfectly, we would not need to search the game tree)
• If nodes are ordered at random, then the average
number of nodes examined by alpha-beta is ~O(b^(3h/4))
Heuristic Ordering of Nodes
• Order the nodes below the root according to
the values backed up at the previous iteration
• Order MAX (resp. MIN) nodes in decreasing
(resp. increasing) values of the evaluation function
computed at these nodes (see the sketch below)
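
A minimal sketch of such ordering, assuming the same hypothetical game interface plus an evaluation function eval_fn; the sorted successor list would be used in place of game.successors(state) inside the α-β search:

def ordered_successors(state, game, eval_fn):
    # Search the most promising children first: highest EVAL first at a
    # MAX node, lowest EVAL first at a MIN node.
    maximizing = game.to_move(state) == 'MAX'
    return sorted(game.successors(state),
                  key=lambda action_state: eval_fn(action_state[1]),
                  reverse=maximizing)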
Games of imperfect information
• Minimax and alpha-beta pruning require too
many leaf-node evaluations.
• This may be impractical within a reasonable
amount of time.
• SHANNON (1950):
– Cut off the search earlier (replace TERMINAL-
TEST by CUTOFF-TEST)
– Apply a heuristic evaluation function EVAL
(replacing the utility function of alpha-beta)
Cutting off search
• Change:
– if TERMINAL-TEST(state) then return UTILITY(state)
into
– if CUTOFF-TEST(state, depth) then return EVAL(state)
• This introduces a fixed depth limit
– The limit is selected so that the time used will not exceed
what the rules of the game allow.
• When the cutoff occurs, the evaluation function is
applied (a sketch of the modified search follows below).
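
A minimal sketch of this change applied to the α-β sketch above; cutoff_test, the depth limit, and eval_fn are assumptions standing in for CUTOFF-TEST and EVAL:

import math

def cutoff_test(state, game, depth, limit):
    # Cut off at a fixed depth limit (or at a genuine terminal state).
    return depth >= limit or game.is_terminal(state)

def max_value(state, game, alpha, beta, depth, limit, eval_fn):
    if cutoff_test(state, game, depth, limit):
        return eval_fn(state)    # heuristic estimate replaces UTILITY
    v = -math.inf
    for a, s in game.successors(state):
        v = max(v, min_value(s, game, alpha, beta, depth + 1, limit, eval_fn))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta, depth, limit, eval_fn):
    if cutoff_test(state, game, depth, limit):
        return eval_fn(state)
    v = math.inf
    for a, s in game.successors(state):
        v = min(v, max_value(s, game, alpha, beta, depth + 1, limit, eval_fn))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v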
Heuristic EVAL
• Idea: produce an estimate of the expected
utility of the game from a given position.
• Performance depends on the quality of EVAL.
• Requirements:
– EVAL should order terminal nodes in the same way
as UTILITY.
– The computation must not take too long.
– For non-terminal states, EVAL should be
strongly correlated with the actual chance of
winning.
• Only useful for quiescent states (no wild swings in
value in the near future).
Heuristic EVAL example
A weighted linear function of features:

Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)

The addition assumes that the features are independent.
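
A minimal sketch of such a weighted linear evaluation function. The features, weights (classical chess material values), and the state.count interface are illustrative assumptions, not taken from the slides:

# Eval(s) = w1*f1(s) + w2*f2(s) + ... where fi(s) is the material difference
# for one piece type and wi is its assumed value.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_feature(state, piece):
    # fi(s): number of MAX's pieces of this type minus MIN's.
    return state.count(piece, player='MAX') - state.count(piece, player='MIN')

def eval_fn(state):
    return sum(w * material_feature(state, piece) for piece, w in WEIGHTS.items())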
Heuristic difficulties
• Example where the heuristic merely counts pieces won.
Horizon effect
• A fixed-depth search makes Black think it can avoid
the queening move of the White pawn.
Games that include chance
• Chance nodes represent the dice rolls
• Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16)
and (5-11, 11-16)
• The rolls [1,1] and [6,6] each have probability 1/36; all
other rolls have probability 1/18
• A definite minimax value cannot be calculated, only an
expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                if n is a terminal node
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)         if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)         if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)    if n is a chance node

These equations can be backed up recursively
all the way to the root of the game tree.
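
A minimal Python sketch of this backup, assuming a hypothetical interface where successors(n) yields child nodes, node_type labels MAX, MIN, and chance nodes, and probability(s) gives the chance of reaching child s of a chance node:

def expected_minimax_value(n, game):
    # Terminal nodes return their utility directly.
    if game.is_terminal(n):
        return game.utility(n)
    children = list(game.successors(n))
    values = [expected_minimax_value(s, game) for s in children]
    kind = game.node_type(n)
    if kind == 'MAX':
        return max(values)
    if kind == 'MIN':
        return min(values)
    # Chance node: probability-weighted average of the children's values.
    return sum(game.probability(s) * v for s, v in zip(children, values))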
Position evaluation with chance nodes
• In the left tree, action A1 wins; in the right tree, with the leaf
values rescaled, action A2 wins.
• With chance nodes, the move chosen by the evaluation
function may therefore change when values are scaled differently.
• Behavior is preserved only under a positive linear
transformation of EVAL.
State-of-the-Art
Checkers: Tinsley vs. Chinook

Name: Marion Tinsley
Profession: taught mathematics
Hobby: checkers
Record: over 42 years he lost only 3 games of checkers;
world champion for over 40 years

Mr. Tinsley suffered his 4th and 5th losses against Chinook.
Chinook
• First computer to become an official world champion of checkers!
• Has an endgame database for all positions with 10 pieces or fewer:
over 39 trillion entries.
Chess: Kasparov vs. Deep Blue

                 Kasparov               Deep Blue
Height           5'10"                  6'5"
Weight           176 lbs                2,400 lbs
Age              34 years               4 years
Computers        50 billion neurons     32 RISC processors + 256 VLSI chess engines
Speed            2 pos/sec              200,000,000 pos/sec
Knowledge        Extensive              Primitive
Power Source     Electrical/chemical    Electrical
Ego              Enormous               None

1997: Deep Blue wins by 3 wins, 1 loss, and 2 draws

Chess: Kasparov vs. Deep Junior

Deep Junior:
• 8 CPUs, 8 GB RAM, Windows 2000
• 2,000,000 pos/sec
• Available for $100

August 2, 2003: the match ends in a 3-3 tie!

Othello: Murakami vs. Logistello

Takeshi Murakami, World Othello Champion

1997: The Logistello software crushed Murakami
by 6 games to 0.
Backgammon
• TD-Gammon, by Gerald Tesauro, won the world
championship in 1995
• BGBlitz won the 2008 computer backgammon
Olympiad
Go: Goemate vs. ??
Name: Chen Zhixing
Profession: retired
Computer skills: self-taught programmer
Author of Goemate (arguably the strongest Go program at the time)

A human opponent gave Goemate a 9-stone handicap and still
easily beat the program, thereby winning $15,000.

Go has too high a branching factor for existing search techniques.
Current and future software must rely on huge databases and
pattern-recognition techniques.

Jonathan Schaeffer

AlphaGo – March 2016
• Developed by Google DeepMind in London to
play the board game Go.
• Plays full 19x19 games.
• October 2015: the distributed version of
AlphaGo defeated the European Go champion
Fan Hui five to zero.
• March 2016: AlphaGo played the South Korean
professional Go player Lee Sedol, ranked 9-dan and
one of the best Go players, and won four to one.
• A significant breakthrough in AI research!
Secrets
• Many game programs are based on alpha-beta +
iterative deepening + extended/singular search +
transposition tables + huge databases + ...
• For instance, Chinook searched all checkers
configurations with 8 pieces or less and created an
endgame database of 444 billion board
configurations.
• The methods are general, but their implementation
is dramatically improved by many specifically
tuned-up enhancements (e.g., the evaluation
functions).
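
As one illustration of these techniques, a transposition table is essentially a cache of positions already searched. A greatly simplified sketch follows; real programs store bounds and use Zobrist hashing, which is assumed away here behind a hypothetical game.hash_key:

transposition_table = {}   # position key -> (search depth, value)

def tt_search(state, game, depth, search_fn):
    # Reuse a stored result if this position was already searched at least
    # this deeply; otherwise search it and remember the value.
    key = game.hash_key(state)
    cached = transposition_table.get(key)
    if cached is not None and cached[0] >= depth:
        return cached[1]
    value = search_fn(state, depth)   # e.g., a depth-limited alpha-beta search
    transposition_table[key] = (depth, value)
    return value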
Perspective on Games: Con and Pro

Con: "Chess is the Drosophila of artificial intelligence. However,
computer chess has developed much as genetics might have if the
geneticists had concentrated their efforts starting in 1910 on
breeding racing Drosophila. We would have some science, but
mainly we would have very fast fruit flies."
John McCarthy

Pro: "Saying Deep Blue doesn't really think about chess is like
saying an airplane doesn't really fly because it doesn't flap
its wings."
Drew McDermott
Other Types of Games
• Multi-player games, with or without alliances
• Games with randomness in the successor
function (e.g., rolling dice)
– Expectiminimax algorithm
• Games with partially observable states
(e.g., card games)
– Search of belief state spaces

See R&N, pp. 175-180
