ISC 2011
Zero-Sum
One's gain, other's loss
Perfect info
Multi-player game
Simple and involved
Matching Pennies: Head, Tail
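As a small illustration of the zero-sum property in Matching Pennies, the sketch below encodes the usual payoff convention (+1 for a match, -1 otherwise). The names `Coin`, `matcher_payoff`, `opponent_payoff`, and `is_zero_sum` are hypothetical and do not come from the slides.

```cpp
#include <array>

// Choices in Matching Pennies.
enum Coin { Head = 0, Tail = 1 };

// Payoff to the first player (hypothetically called the "matcher"):
// +1 when both coins match, -1 otherwise.
int matcher_payoff(Coin a, Coin b) { return a == b ? +1 : -1; }

// Zero-sum: the second player's payoff is the exact negation.
int opponent_payoff(Coin a, Coin b) { return -matcher_payoff(a, b); }

// True when the two payoffs cancel for every pair of choices.
bool is_zero_sum() {
    const std::array<Coin, 2> coins = {Head, Tail};
    for (Coin a : coins)
        for (Coin b : coins)
            if (matcher_payoff(a, b) + opponent_payoff(a, b) != 0) return false;
    return true;
}
```

Calling `is_zero_sum()` checks all four choice pairs; one player's gain is always the other's loss.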
Motivation
Play as you go
Thousands of players
NVIDIA Tesla
Game cloud
GPU computing
Mobile computer
Commodity
NVIDIA Tegra
Problem
Game
Two-player
Maximize look-ahead
Rapid node expansion
10^25 for 4x4x4 Tic-Tac-Toe
10^120 for Chess
Search
Efficient parallel
Space split
Statistical simulation
Simultaneous matches
Parallelism
Principal Variation Split
Strongly ordered tree
Synchronization bound
Load imbalance
Cut Nodes
PV
Bounds
Equal weight
Static split
[Figure: search tree with Root, nodes A and B, and labels 4, d, g]
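To make the notion of a cut node concrete, here is a minimal sequential alpha-beta sketch in negamax form over an explicit tree. It is not the deck's GPU kernel and it does not implement the Principal Variation Split. The `Node` type, the helpers `leaf` and `inner`, and the `visited` counter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

constexpr int kInf = 1000000;  // larger than any leaf score used here

// Hypothetical explicit game tree. A leaf stores a static score from the
// point of view of the side to move at that leaf.
struct Node {
    int score = 0;
    std::vector<Node> children;
};

Node leaf(int s) { Node n; n.score = s; return n; }
Node inner(std::vector<Node> kids) { Node n; n.children = std::move(kids); return n; }

// Negamax alpha-beta. Returns the value of `n` for the side to move and
// counts every node it touches in `visited`.
int alpha_beta(const Node& n, int alpha, int beta, int& visited) {
    assert(alpha < beta);
    ++visited;
    if (n.children.empty()) return n.score;
    int best = -kInf;
    for (const Node& c : n.children) {
        // The child's value is seen from the opponent's side, so negate it
        // and swap the bounds.
        int v = -alpha_beta(c, -beta, -alpha, visited);
        best = std::max(best, v);
        alpha = std::max(alpha, v);
        if (alpha >= beta) break;  // cut node: remaining siblings are skipped
    }
    return best;
}
```

On a root with children A (leaves 3, 5) and B (leaves 2, 9), the search returns 3 and skips the leaf 9: once B's first leaf shows that B cannot beat A, B becomes a cut node.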
Heuristic Search
Statistical Simulation
Monte Carlo method
Annealing process
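A minimal sketch of Monte Carlo move selection follows: each candidate move is scored by the fraction of uniformly random playouts it wins. The toy stone-taking game (take 1 to 3 stones, taking the last stone wins) is a hypothetical stand-in for the games in the deck, and the annealing step is not modeled. All function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Plays uniformly random moves until the pile is empty.
// Returns true if the player who moves first in this playout takes the last stone.
bool random_playout_wins(int stones, std::mt19937& rng) {
    assert(stones >= 1);
    bool first_to_move = true;
    while (true) {
        std::uniform_int_distribution<int> take(1, std::min(3, stones));
        stones -= take(rng);
        if (stones == 0) return first_to_move;
        first_to_move = !first_to_move;
    }
}

// Scores each legal move by its win rate over `playouts` random games and
// returns the move with the highest rate.
int monte_carlo_move(int stones, int playouts, unsigned seed) {
    assert(stones >= 1 && playouts >= 1);
    std::mt19937 rng(seed);
    int best_move = 1;
    double best_rate = -1.0;
    for (int move = 1; move <= std::min(3, stones); ++move) {
        int wins = 0;
        for (int i = 0; i < playouts; ++i) {
            int rest = stones - move;
            // Taking the last stone wins outright; otherwise the opponent
            // moves next and we win exactly when the opponent loses.
            if (rest == 0 || !random_playout_wins(rest, rng)) ++wins;
        }
        double rate = static_cast<double>(wins) / playouts;
        if (rate > best_rate) {
            best_rate = rate;
            best_move = move;
        }
    }
    return best_move;
}
```

The playouts for different moves are independent of each other, which is what makes this style of search a natural fit for many parallel threads.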
Challenges
Deep recursion, limited stack
Divergent, irregular threads
Dynamic parallelism
Low arithmetic intensity
Implementation
Kernel for each
Alpha-Beta, Monte Carlo
Games
3D Tic-Tac-Toe
Connect-4
Reversi
Go
State Abstraction
Cells
Player
Successors
Move
Manipulate
Update
Query
Winner
Undo
Full
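One way to read the state abstraction above is as a small interface: cells and side to move as data, plus successors, update, undo, winner, and full as operations. The sketch below uses 3x3 Tic-Tac-Toe as a hypothetical stand-in for the deck's larger games; the struct layout and member names are assumptions, not the presenter's code.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct State {
    std::array<int, 9> cells{};  // 0 = empty, 1 or 2 = that player's mark
    int player = 1;              // side to move

    // Legal moves: indices of all empty cells.
    std::vector<int> successors() const {
        std::vector<int> moves;
        for (int i = 0; i < 9; ++i)
            if (cells[i] == 0) moves.push_back(i);
        return moves;
    }

    // Places the mover's mark and passes the turn.
    void update(int cell) {
        assert(cell >= 0 && cell < 9 && cells[cell] == 0);
        cells[cell] = player;
        player = 3 - player;
    }

    // Reverts a previous update on the same cell.
    void undo(int cell) {
        assert(cell >= 0 && cell < 9 && cells[cell] != 0);
        player = 3 - player;
        cells[cell] = 0;
    }

    // Returns 1 or 2 if that player owns a full line, otherwise 0.
    int winner() const {
        static const int lines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                        {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
        for (const auto& l : lines) {
            int m = cells[l[0]];
            if (m != 0 && m == cells[l[1]] && m == cells[l[2]]) return m;
        }
        return 0;
    }

    bool full() const { return successors().empty(); }
};
```

Pairing `update` with `undo` lets a search walk the tree on a single state object instead of copying the board at every node.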
Stack
Recursion depth >1000
Greedy allocation
Hybrid design
Local Memory
Runtime/Compiler: local variables, function parameters
User: successors
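The user-managed part of such a hybrid stack can be sketched as a fixed-capacity stack in a preallocated buffer that replaces recursion with an explicit loop. This is host-side C++ for illustration only; `FixedStack`, its capacity of 4096, and `count_nodes` are hypothetical, and the deck's actual layout in GPU local memory is not shown in the slides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity stack: no dynamic allocation, overflow is reported
// to the caller instead of growing.
template <typename T, std::size_t Capacity>
class FixedStack {
public:
    bool push(const T& v) {
        if (size_ == Capacity) return false;
        data_[size_++] = v;
        return true;
    }
    bool pop(T& out) {
        if (size_ == 0) return false;
        out = data_[--size_];
        return true;
    }
    std::size_t size() const { return size_; }

private:
    std::array<T, Capacity> data_{};
    std::size_t size_ = 0;
};

// Counts the nodes of a uniform tree depth-first without recursion.
// Each stack entry is the remaining depth of a node not yet expanded.
// Returns -1 if the explicit stack overflows.
long count_nodes(int branching, int depth) {
    assert(branching >= 1 && depth >= 0);
    FixedStack<int, 4096> pending;
    if (!pending.push(depth)) return -1;
    long visited = 0;
    int d = 0;
    while (pending.pop(d)) {
        ++visited;
        if (d == 0) continue;  // leaf: nothing to expand
        for (int i = 0; i < branching; ++i)
            if (!pending.push(d - 1)) return -1;
    }
    return visited;
}
```

Because the capacity is fixed up front, the depth and branching the search can handle are bounded by the allocation.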
Producer-Consumer
[Diagram labels: game init, producer, parallel node search, parallel games, rank, scoring, Reduction-Max, trial moves, game over]
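The Reduction-Max step over scored trial moves can be sketched as a pairwise (tree) reduction. The version below runs sequentially on the host. On a GPU, the inner loop of each round would typically be spread across threads with a barrier between rounds. `Scored` and `reduce_max` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Scored {
    int move;      // trial move identifier
    double score;  // score produced for that move
};

// Pairwise max-reduction: each round folds the upper half of the active
// range onto the lower half, so the range shrinks until one entry is left.
Scored reduce_max(std::vector<Scored> v) {
    assert(!v.empty());
    for (std::size_t n = v.size(); n > 1; n = (n + 1) / 2) {
        std::size_t half = (n + 1) / 2;
        // The iterations of this loop are independent of each other.
        for (std::size_t i = 0; i + half < n; ++i)
            if (v[i + half].score > v[i].score) v[i] = v[i + half];
    }
    return v[0];
}
```

For scores 0.1, 0.7, 0.3, 0.9, 0.2 on moves 0 to 4, the reduction returns move 3 after three rounds.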
Shared
Kernel global scope
Check local
Limitations
Stack allocation
Bounds parallelism
Methodology
CUDA Toolkit 3.1, Windows
Single processor
GPU: GTX480
SMs: 15
Warps/SM: 2
Clocks (MHz): 723/1446/1796
L1/Shared (KB): 48/16
L2 (KB): 640
CPU: i7-940
Cores: 8
Clocks (MHz): 2942/(3*1066)
Space Split
[Chart: 4x4x4 Tic-Tac-Toe, Seconds/Move (lower is better), Naive Split vs. Shared; data labels 3906, 3660, 2756, 2550]
Monte Carlo
[Charts: average Seconds/Move for the series labeled 9 and 19; x-axis values 128, 1024, 4096, 8192, 16384]
Simultaneous Matches
[Chart: average Seconds/Move (lower is better); x-axis values 16, 4096, 16384]
Future Work
Data packing optimization
Per-SM transposition table
GPU Performance
Game Go
Summary
Efficient GPU-based
Heuristic search
Statistical Simulation
Economical solution
Mobile-Cloud
Thank You!
Questions?
Info
Base: http://developer.nvidia.com
Toolkit: CUDA Zone
Debugger: Parallel Nsight