05games PDF

COMP 424 - Artificial Intelligence Lecture 5: Game Playing
Instructors:
Joelle Pineau (jpineau@cs.mcgill.ca) Sylvie Ong (song@cs.mcgill.ca)
Class web page: www.cs.mcgill.ca/~jpineau/comp424
Search approaches for different domains

Standard search
  deterministic actions, fully observable state of the world
  BFS, DFS, best-first search, A* search.

Searching under uncertainty
  non-deterministic actions, fully observable state of the world
    AND-OR search.
  deterministic actions, un-/partially-observable state of the world
    search in belief space

All of the above are 1-agent problems. Today: adversarial search, i.e. games with 2 players.
Overview
Why games?
Minimax search
Evaluation functions
Alpha-beta pruning
State-of-the-art game playing programs
Game playing
One of the oldest, most well-studied domains in AI!
Why?
People like them! And are good at playing them.
Many games are hard:
  State spaces are very large and complicated. E.g. chess has a branching factor of about 35 and games go to about 50 moves per player, so roughly $35^{100}$ nodes in the search tree!
  Sometimes there is stochasticity and imperfect information.
  Real-time constraints (e.g. fixed amount of time between moves).
Clear, clean description of the environment.
Easy performance indicator.
Human or computer - who is better?

Checkers:
1994: Chinook (U. of Alberta) beat world champion Marion Tinsley, ending a 40-year reign.
Othello:
1997: Logistello (NEC research) beat the human world champion. Today: world champions refuse to play the AI computer program (because it's too good).
Chess:
1997: Deep Blue (IBM) beat world champion Garry Kasparov.
Backgammon:
TD-Gammon (IBM) is world champion amongst humans and computers.
Go:
Human champions refuse to play the top AI player (because it's too weak).
Bridge:
Still out of reach for AI players because of coordination issue.
Types of games
Perfect vs Imperfect information
  Perfect: See the exact state of the game.
    E.g. chess, backgammon, checkers, go, othello
  Imperfect: Information is hidden.
    E.g. scrabble, bridge, most card games
Deterministic vs Stochastic
  Deterministic: Change in state is fully determined by player move.
    E.g. chess
  Stochastic: Change in state is partially determined by chance.
    E.g. backgammon
Game playing as search

Consider a 2-player zero-sum game with deterministic actions and perfect information. Can we formulate it as a search problem?
State:
Operators:
Goal:
Cost:
We want to find a strategy (i.e. a way of picking moves) that wins the game.
Adversarial search definition

S0: The initial state of the game.
Player(s): Which player has the next move in a state.
Actions(s): Set of legal moves for that state.
Results(s,a): Transition model, defining the result of a move.
Terminal-Test(s): True if the game is over, False otherwise.
Utility(s,p): Final numerical value for a game that ends in terminal state s for player p.
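The six components above can be written down directly as functions. The sketch below does this for a tiny Nim-style game (one pile of stones, each player removes 1 to 3 stones, whoever takes the last stone wins). The game, the `State` tuple, and the "MAX"/"MIN" labels are assumptions chosen for brevity; they are not part of the lecture.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    stones: int    # stones left in the pile
    to_move: str   # "MAX" or "MIN"


def initial_state(stones: int = 5) -> State:
    """S0: the initial state of the game (MAX moves first)."""
    return State(stones, "MAX")


def player(s: State) -> str:
    """Player(s): which player has the next move."""
    return s.to_move


def actions(s: State) -> List[int]:
    """Actions(s): legal moves are taking 1, 2 or 3 stones, if available."""
    return [k for k in (1, 2, 3) if k <= s.stones]


def result(s: State, a: int) -> State:
    """Results(s,a): remove a stones and pass the turn to the other player."""
    other = "MIN" if s.to_move == "MAX" else "MAX"
    return State(s.stones - a, other)


def terminal_test(s: State) -> bool:
    """Terminal-Test(s): the game is over when the pile is empty."""
    return s.stones == 0


def utility(s: State, p: str) -> int:
    """Utility(s,p): +1 if p took the last stone, -1 otherwise (zero-sum)."""
    # In a terminal state, the player who is NOT to move made the last move.
    winner = "MIN" if s.to_move == "MAX" else "MAX"
    return 1 if p == winner else -1


s0 = initial_state(5)
end = result(result(s0, 3), 2)   # MAX takes 3, then MIN takes the last 2
print(actions(s0), terminal_test(end), utility(end, "MAX"))
```

Because the game is zero-sum, `utility(s, "MAX")` is always the negative of `utility(s, "MIN")`, which is what lets a single number describe the outcome for both players.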
Game search challenge

Not quite the same as simple searching. There is an opponent! The opponent is malicious!
The opponent is trying to make things good for itself, and bad for us.
We have to simulate the opponent's decisions.
Key idea:
Define a max player (who wants to maximize its utility) and a min player (who wants to minimize it).
Example: Game Tree for Tic-Tac-Toe
Minimax search
Expand the complete search tree, until terminal states have been reached and their utilities computed.
Go back up from the leaves towards the current state of the game. In detail:
At each min node: back up the worst value (for the max player) among the children, i.e. the minimum.
At each max node: back up the best value among the children, i.e. the maximum.
Minimax evaluation at each node

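$$
\text{Minimax}(s) =
\begin{cases}
\text{Utility}(s) & \text{if Terminal-Test}(s) = \text{True} \\
\max_{a \in \text{Actions}(s)} \text{Minimax}(\text{Results}(s,a)) & \text{if Player}(s) = \text{Max-player} \\
\min_{a \in \text{Actions}(s)} \text{Minimax}(\text{Results}(s,a)) & \text{if Player}(s) = \text{Min-player}
\end{cases}
$$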
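The recursive definition translates almost line for line into code. The sketch below assumes the game tree is already given explicitly as nested Python lists, where an integer is a terminal utility (from the max player's point of view) and a list is an internal node; this representation and the leaf values are illustrative assumptions, not something prescribed by the lecture.

```python
from typing import List, Union

# A tree is either a terminal utility (int) or a list of child trees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, is_max: bool) -> int:
    """Return the minimax value of node; is_max says whose turn it is."""
    if isinstance(node, int):          # Terminal-Test(s) = True
        return node                    # Utility(s)
    # Players alternate, so the children are evaluated for the other player.
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# MAX at the root, three MIN nodes below it, nine terminal states.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, is_max=True))   # MIN nodes back up 3, 2, 2; the root takes 3
```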
Properties of Minimax search

Complete?
If the game tree is finite.
Optimal?
Against an optimal opponent (otherwise we don't know).
Time complexity?
$O(b^m)$ (b = branching factor, m = maximum depth of the tree)
Space complexity?
$O(b^m)$ if we store and re-use the search tree between moves; $O(bm)$ if we use DFS.
Why not use Minimax to solve chess? In chess: $b \approx 35$, $m \approx 100$, so an exact solution is impossible!
Coping with resource limitations

Suppose we have 100 seconds to make a move, and we can search $10^4$ nodes per second.
Can only search $10^6$ nodes. (Or even fewer, if we spend time deciding which nodes to search.)
Possible approach:
Use a cutoff test (e.g. based on a depth limit).
Use an evaluation function for the nodes where we cut off the search.
Cutting the search effort

Use evaluation function to evaluate non-terminal nodes.
Helps us make a decision without searching until the end of the game.
Minimax cutoff algorithm: Same as standard Minimax, except stop at some maximum depth m and use the evaluation function on those nodes.
Minimax with cutoff

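$$
\text{Minimax}(s) =
\begin{cases}
\text{Utility}(s) & \text{if Terminal-Test}(s) = \text{True} \\
\max_{a \in \text{Actions}(s)} \text{Minimax}(\text{Results}(s,a)) & \text{if Player}(s) = \text{Max-player} \\
\min_{a \in \text{Actions}(s)} \text{Minimax}(\text{Results}(s,a)) & \text{if Player}(s) = \text{Min-player}
\end{cases}
$$

$$
\text{H-Minimax}(s,d) =
\begin{cases}
\text{EVAL}(s) & \text{if Cutoff-Test}(s,d) = \text{True} \\
\max_{a \in \text{Actions}(s)} \text{H-Minimax}(\text{Results}(s,a), d+1) & \text{if Player}(s) = \text{Max-player} \\
\min_{a \in \text{Actions}(s)} \text{H-Minimax}(\text{Results}(s,a), d+1) & \text{if Player}(s) = \text{Min-player}
\end{cases}
$$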
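A minimal sketch of H-Minimax follows, using a depth limit as the cutoff test. To keep it self-contained it plays an assumed Nim-style game (take 1 to 3 stones, taking the last stone wins), and `evaluate` is a hypothetical heuristic invented for this example; neither comes from the lecture. The only point is the shape of the recursion: exact utilities at terminal states, EVAL at the depth limit, max or min otherwise.

```python
def evaluate(stones: int, max_to_move: bool) -> float:
    """Hypothetical EVAL(s): guess that the player to move is in trouble
    when the pile size is a multiple of 4. Positive is good for MAX."""
    good_for_mover = stones % 4 != 0
    return 0.5 if good_for_mover == max_to_move else -0.5


def h_minimax(stones: int, max_to_move: bool, depth: int, limit: int) -> float:
    """Depth-limited minimax value of a position, from MAX's point of view."""
    if stones == 0:
        # Terminal state: the player who just moved took the last stone.
        return -1.0 if max_to_move else 1.0
    if depth >= limit:                     # Cutoff-Test(s, d)
        return evaluate(stones, max_to_move)
    values = [
        h_minimax(stones - k, not max_to_move, depth + 1, limit)
        for k in (1, 2, 3) if k <= stones
    ]
    return max(values) if max_to_move else min(values)


shallow = h_minimax(10, True, depth=0, limit=2)    # relies on evaluate()
full = h_minimax(10, True, depth=0, limit=100)     # reaches terminal states
print(shallow, full)
```

With `limit=2` the search returns a heuristic estimate; with a limit larger than the game length it degenerates to plain Minimax and returns an exact utility.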
Evaluation functions
An evaluation function v(s) represents the goodness of a board state (e.g. chance of winning from that position).
If the features of the board can be evaluated independently, use a weighted linear function:
$v(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$ (where s is the board state)
This function can be given by the designer or learned from experience.
Example: Chess
[Figure: two chess positions. One: Black to move, White slightly better. The other: White to move, Black winning.]
Linear evaluation function: $v(s) = w_1 f_1(s) + w_2 f_2(s)$
$w_1 = 9$, $w_2 = 3$
$f_1(s)$ = (# white queens) - (# black queens)
$f_2(s)$ = (# white pawns) - (# black pawns)
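The weighted sum above can be sketched in a few lines. The weights and the two features are the ones given on the slide; the dictionary of piece counts and the sample position are hypothetical, introduced only to have something to evaluate.

```python
from typing import Dict, List


def chess_features(counts: Dict[str, int]) -> List[int]:
    """Compute f1 and f2 from a (hypothetical) dictionary of piece counts."""
    f1 = counts["white_queens"] - counts["black_queens"]
    f2 = counts["white_pawns"] - counts["black_pawns"]
    return [f1, f2]


def linear_eval(features: List[int], weights: List[int]) -> int:
    """v(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f for w, f in zip(weights, features))


weights = [9, 3]   # w1, w2 as given above
position = {       # made-up position: White is a queen up but two pawns down
    "white_queens": 1, "black_queens": 0,
    "white_pawns": 5, "black_pawns": 7,
}
v = linear_eval(chess_features(position), weights)
print(v)   # 9*1 + 3*(-2) = 3, i.e. slightly better for White
```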
How precise should the evaluation fn be?

The evaluation function is only approximate, and is usually better if we are close to the end of the game.
The move chosen is the same if we apply a monotonic transformation to the evaluation function.
Only the order of the numbers matters: payoffs in deterministic games act as an ordinal utility function.
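The invariance claim is easy to check numerically. The sketch below assumes a two-ply tree (MAX at the root, MIN nodes below, evaluated leaves at the bottom) with made-up leaf values, and compares the chosen move under the raw values, under two increasing transformations, and under a non-monotonic one.

```python
import math
from typing import Callable, List


def best_move(tree: List[List[float]],
              transform: Callable[[float], float] = lambda v: v) -> int:
    """Index of MAX's best move in a two-ply tree after transforming leaf values."""
    # Each child of the root is a MIN node: it backs up its smallest leaf.
    scores = [min(transform(leaf) for leaf in child) for child in tree]
    return scores.index(max(scores))


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # illustrative leaf evaluations

raw = best_move(tree)
cubed = best_move(tree, lambda v: v ** 3 + 10)   # strictly increasing
expo = best_move(tree, math.exp)                  # strictly increasing
flipped = best_move(tree, lambda v: -v)           # not monotonic increasing
print(raw, cubed, expo, flipped)
```

The first three calls agree because max and min only compare values, and an increasing transformation preserves every comparison. Negating the values reverses the order, so the chosen move changes.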
Minimax cutoff in Chess

How many moves ahead can we search in chess?
$10^6$ nodes with $b = 35$ allows us to search about 4 moves ahead ($35^4 \approx 1.5 \times 10^6$)!
Is this useful?
4 moves ahead ≈ novice player
8 moves ahead ≈ human master, typical PC
12 moves ahead ≈ Deep Blue, Kasparov
Key idea:
Search few lines of play, but search them deeply. Need pruning!
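The "about 4 moves ahead" figure is just a logarithm: a full-width search to depth d visits roughly $b^d$ nodes, so the reachable depth is $\log(\text{budget}) / \log(b)$. The short sketch below redoes that arithmetic with the numbers used earlier in the lecture (100 seconds, $10^4$ nodes per second, $b = 35$); the function names are illustrative.

```python
import math


def reachable_depth(node_budget: int, branching: int) -> float:
    """Depth d such that branching**d equals node_budget (full-width search)."""
    return math.log(node_budget) / math.log(branching)


def nodes_at_depth(branching: int, depth: int) -> int:
    """Approximate number of nodes in a full-width search to the given depth."""
    return branching ** depth


budget = 100 * 10**4   # 100 s at 10^4 nodes/s = 10^6 nodes
depth = reachable_depth(budget, 35)
print(f"budget = {budget} nodes, reachable depth = {depth:.2f}")

# Node counts needed for the three skill levels listed above.
for d in (4, 8, 12):
    print(f"depth {d:2d}: about {nodes_at_depth(35, d):.2e} nodes")
```

Going from depth 4 to depth 8 multiplies the node count by $35^4$, about 1.5 million, which is why a faster machine alone does not close the gap.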
α-β Pruning example
[Figure: step-by-step α-β pruning on a two-ply game tree. The first MIN node has leaves 3, 12, 8 and backs up 3. The second MIN node sees the leaf 2 and its two remaining leaves are pruned (X). The third MIN node has leaves 14, 5, 2 and backs up 2. The MAX root has value 3.]
α-β Pruning
Simple idea: if a path looks worse than what we already have, discard it.
Standard technique for deterministic, perfect-information games.
The algorithm is like Minimax, but keeps track of the best leaf value found so far for our player (α) and the best one for the opponent (β).
If the best move at a node cannot change (regardless of what we would find by searching), then there is no need to search further!
Provably correct (i.e. won't cut off a good branch) given the evaluation function.
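A minimal sketch of α-β pruning on an explicit tree is shown below. The nested-list representation (integers are leaf values, lists are internal nodes) and the `visited` list used to record which leaves get evaluated are assumptions made for illustration. The tree mirrors the example from the slides; the two leaves that are pruned there are not shown in the source, so the values 4 and 6 are placeholders.

```python
import math
from typing import List, Optional, Union

Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, is_max: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)      # record each leaf that is evaluated
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # best value MAX can guarantee so far
            if alpha >= beta:          # MIN above would never allow this node
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)        # best value MIN can guarantee so far
        if alpha >= beta:              # MAX above already has something better
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # 4 and 6 are placeholder values
visited: List[int] = []
print(alphabeta(tree, True, visited=visited), visited)
```

In the second MIN node, the leaf 2 is already below α = 3, so the remaining two leaves are never evaluated. The result is the same value plain Minimax would return.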
A closer look at our example
Properties of α-β pruning
Pruning does not affect the final result! But it can greatly increase efficiency!
Good move ordering is key to the effectiveness of pruning.
With perfect ordering, time complexity is $O(b^{m/2})$.
  Means double the search depth, for the same resources. In chess: this is the difference between a novice and an expert player.
With bad move ordering, time complexity is $O(b^m)$.
  Means nothing was pruned.
The evaluation function can be used to order the nodes.
α-β pruning demonstrates the value of reasoning about which computations are important!
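The effect of move ordering can be seen even on a nine-leaf tree. The sketch below is an illustrative experiment, not a benchmark: it runs a compact α-β search that also counts evaluated leaves, once on a well-ordered tree and once on a badly ordered tree with the same minimax value. The trees and the counting helper are assumptions made for this example.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, List["Tree"]]


def alphabeta_count(node: Tree, is_max: bool,
                    alpha: float = -math.inf,
                    beta: float = math.inf) -> Tuple[float, int]:
    """Return (minimax value, number of leaves evaluated) using alpha-beta."""
    if isinstance(node, int):
        return node, 1
    best = -math.inf if is_max else math.inf
    leaves = 0
    for child in node:
        value, n = alphabeta_count(child, not is_max, alpha, beta)
        leaves += n
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:   # remaining children cannot affect the result
            break
    return best, leaves


# Good ordering: the strongest root move first, lowest leaf first in MIN nodes.
good = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
# Bad ordering: the strongest root move last, lowest leaf last in MIN nodes.
bad = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]

print(alphabeta_count(good, True))   # few leaves evaluated
print(alphabeta_count(bad, True))    # every leaf evaluated
```

Both searches return the same value, but the well-ordered tree needs 5 leaf evaluations while the badly ordered one needs all 9.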
Forward pruning
Another simple idea:
Only explore the n best moves for the current state (according to the evaluation function).
Unlike α-β pruning, this can lead to a sub-optimal solution.
Can be very efficient with a good evaluation function.
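A minimal sketch of forward pruning on an explicit tree follows. The nested-list tree and the `first_leaf` heuristic (a deliberately crude stand-in for an evaluation function) are assumptions made for this example. It is chosen so that the failure mode is visible: with n = 1 the search commits to a move that looks good heuristically but is not the minimax-optimal one.

```python
from typing import Callable, List, Union

Tree = Union[int, List["Tree"]]


def first_leaf(node: Tree) -> int:
    """Crude hypothetical heuristic: score a subtree by its leftmost leaf."""
    while not isinstance(node, int):
        node = node[0]
    return node


def forward_pruned_minimax(node: Tree, is_max: bool, n_best: int,
                           evaluate: Callable[[Tree], int] = first_leaf) -> int:
    """Minimax that only expands the n_best children ranked by evaluate."""
    if isinstance(node, int):
        return node
    # Rank children from the point of view of the player to move:
    # MAX prefers high heuristic scores, MIN prefers low ones.
    ranked = sorted(node, key=evaluate, reverse=is_max)
    values = [forward_pruned_minimax(child, not is_max, n_best, evaluate)
              for child in ranked[:n_best]]
    return max(values) if is_max else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # true minimax value is 3
narrow = forward_pruned_minimax(tree, True, n_best=1)
wide = forward_pruned_minimax(tree, True, n_best=3)
print(narrow, wide)
```

With `n_best=1` the heuristic is misled by the leading 14 and the search returns 2; with `n_best=3` nothing is discarded and the true value 3 is recovered.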
Deep Blue (IBM)

Specialized chess processor, special-purpose memory architecture.
Very sophisticated evaluation function (expert features, tuned weights).
Database of standard openings/closings.
Uses a version of α-β pruning (with undisclosed improvements).
Can search up to 40-deep in some branches.
Can search over 30 billion positions (depth ~ 14) per move.
Chinook (Schaeffer, U. of Alberta)

Best checkers player (since 1990s).
Plain α-β search, performed on standard PCs.
Evaluation function based on expert features of the board.
Opening database, HUGE endgame database (39 trillion positions!)
Only a few moves in the middle of the game are actually searched.
Since 2007, Chinook can play perfectly.
Logistello (Buro, U. of Alberta)

Best Othello player since 1997.
Thinks during the opponent's time.
Smaller search space than chess (~5-15 legal moves).
α-β search with a linear evaluation function.
Evaluation function had to be developed from scratch:
  hand-selected features and weights tuned by learning.
Database of openings, continuously updated.
Why is Go so hard?
Computer Go players are at Master level for the 9x9 board, but at advanced amateur level for the full 19x19 board.
Branching factor > 300, so search is very difficult.
Difficult to write a good evaluation function until the endgame.
Top programs, such as MoGo, avoid alpha-beta search and instead use Monte-Carlo rollouts:
  Sample random moves in the first few iterations.
  Guide the sampling process to prefer moves that have led to wins in previous samples (but don't prune! i.e. still consider all moves).
  Use knowledge-based rules to select particular moves when certain patterns are detected.
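The rollout idea can be sketched on a much smaller game than Go. The code below is an illustration under several assumptions, not a description of how MoGo is implemented: the game is an assumed Nim variant (take 1 to 3 stones, taking the last stone wins), playouts are uniformly random, and the rule used to "prefer moves that have led to wins" is the UCB1 bandit formula, which is one common choice. Every move keeps a non-zero chance of being sampled, so nothing is pruned. The knowledge-based pattern rules are omitted.

```python
import math
import random
from typing import Dict


def random_playout(stones: int, my_turn: bool, rng: random.Random) -> bool:
    """Play uniformly random moves to the end; True if we take the last stone."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return my_turn        # whoever just moved took the last stone
        my_turn = not my_turn


def monte_carlo_move(stones: int, n_samples: int = 20000, seed: int = 0) -> int:
    """Pick a move by sampling playouts, favouring moves that have won so far."""
    rng = random.Random(seed)
    moves = [k for k in (1, 2, 3) if k <= stones]
    wins: Dict[int, int] = {m: 0 for m in moves}
    plays: Dict[int, int] = {m: 0 for m in moves}
    for t in range(1, n_samples + 1):
        untried = [m for m in moves if plays[m] == 0]
        if untried:
            move = untried[0]     # first iterations: try every move once
        else:
            # UCB1: observed win rate plus a bonus for rarely sampled moves.
            move = max(moves, key=lambda m: wins[m] / plays[m]
                       + math.sqrt(2 * math.log(t) / plays[m]))
        remaining = stones - move
        won = True if remaining == 0 else random_playout(remaining, False, rng)
        wins[move] += int(won)
        plays[move] += 1
    # Recommend the most-sampled move, a common and fairly robust choice.
    return max(moves, key=lambda m: plays[m])


print(monte_carlo_move(5))
```

No evaluation function is needed: the only game knowledge is the rules and the outcome of finished games, which is what makes this approach attractive when good evaluation functions are hard to write.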
Summary
Understand how to define game playing as a search problem.
Be able to implement the Minimax algorithm with alpha-beta pruning.
Understand the use of an evaluation function.
Read all of Ch.5 in detail.