ISC 2011

Producer-Consumer Model for Massively Parallel Zero-Sum Games on the GPU


Avi Bleiweiss, NVIDIA Corporation

Zero-Sum
One's gain is others' loss
Perfect information
Multi-player game
Simple and involved

Matching Pennies (payoffs: row player, column player)

        Head    Tail
Head    1,-1    -1,1
Tail    -1,1    1,-1
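
A minimal sketch of the zero-sum property, using the Matching Pennies payoffs from the slide. The names `Payoff`, `kMatchingPennies`, and `isZeroSum` are illustrative assumptions and do not come from the talk.

```cpp
// Payoff pair for one outcome: (row player, column player).
struct Payoff { int row; int col; };

// Matching Pennies payoffs indexed [row choice][column choice],
// with 0 = Head and 1 = Tail, copied from the slide's table.
const Payoff kMatchingPennies[2][2] = {
    { {1, -1}, {-1, 1} },
    { {-1, 1}, {1, -1} },
};

// A two-player game is zero-sum when the payoffs of every outcome add up to zero.
bool isZeroSum(const Payoff (&game)[2][2]) {
    for (const auto& row : game)
        for (const Payoff& cell : row)
            if (cell.row + cell.col != 0) return false;
    return true;
}
```

Calling `isZeroSum(kMatchingPennies)` checks all four outcomes of the table.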

Motivation
Play as you go
Thousands of players

Game cloud
GPU computing
NVIDIA Tesla

Mobile computer
Commodity
NVIDIA Tegra

Problem

Game
Two player
Maximize look ahead
Rapid node expansion
10^25 for 4x4x4 Tic-Tac-Toe
10^120 for Chess

Search
Efficient parallel
Space split
Statistical simulation
Simultaneous matches

Many thousands of threads

Parallelism

Principal Variation Split
Strongly ordered tree
Synchronization bound
Load imbalance

Young Brothers Wait Concept
Parallel at any node
Processor owns node
Up to 1024 processors

Dynamic Tree Splitting
Processors share node
Global job list
Reasonable speedup

No massively parallel solution in shared-memory settings!

Cut Nodes
PV
Bounds
Equal weight
Static split

[Figure: game tree with Root and child nodes A and B]

Heuristic Search

Statistical Simulation
Monte Carlo method
Annealing process

[Equation: simulation move error]

Many thousands of games

for 19x19 Go game
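
A minimal sketch of Monte Carlo move selection: each candidate move is scored by the win rate of many random playouts, and the best-scoring move is kept. The function `monteCarloBest` and its `playout` callback are hypothetical names introduced for illustration. The annealing process mentioned on the slide is not modeled here, and the loop runs sequentially, whereas the talk runs many thousands of games in parallel.

```cpp
#include <cstddef>
#include <random>

// Picks the move index with the highest win rate over random playouts.
// `playout(move, rng)` is an assumed callback: it plays one random game
// that starts with `move` and returns 1 for a win, 0 otherwise.
template <typename Playout>
std::size_t monteCarloBest(std::size_t numMoves, int trialsPerMove,
                           unsigned seed, Playout playout) {
    std::mt19937 rng(seed);
    std::size_t best = 0;
    double bestRate = -1.0;
    for (std::size_t m = 0; m < numMoves; ++m) {
        int wins = 0;
        for (int t = 0; t < trialsPerMove; ++t)
            wins += playout(m, rng);
        const double rate = static_cast<double>(wins) / trialsPerMove;
        if (rate > bestRate) {  // keeps the first move on ties
            bestRate = rate;
            best = m;
        }
    }
    return best;
}
```

More trials per move reduce the noise in each win-rate estimate, which is the reason for simulating many thousands of games.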

Challenges
Deep recursion, limited stack
Divergent, irregular threads
Dynamic parallelism
Low arithmetic intensity

Implementation
Kernel for each
Alpha-Beta, Monte Carlo

Board C++ class
Rules specific

Games (table columns: Heuristic Search, Monte Carlo Simulation)

3D Tic-Tac-Toe

Connect-4

Reversi

Go
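
For reference, a minimal sketch of sequential alpha-beta search in negamax form, the algorithm the Alpha-Beta kernel is named after. It is not the talk's kernel: the explicit `Node` tree and the helpers `leaf`, `inner`, and `searchRoot` are assumptions made to keep the example self-contained.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical explicit game tree. A leaf carries a static score from the
// point of view of the player to move; an inner node carries its children.
struct Node {
    int score = 0;
    std::vector<Node> children;
};

Node leaf(int s) { Node n; n.score = s; return n; }
Node inner(std::vector<Node> kids) { Node n; n.children = std::move(kids); return n; }

// Negamax alpha-beta: value of `n` for the player to move at `n`.
int alphaBeta(const Node& n, int alpha, int beta) {
    if (n.children.empty()) return n.score;
    int best = -std::numeric_limits<int>::max();  // safe to negate
    for (const Node& child : n.children) {
        best = std::max(best, -alphaBeta(child, -beta, -alpha));
        alpha = std::max(alpha, best);
        if (alpha >= beta) break;  // cut: remaining siblings cannot matter
    }
    return best;
}

int searchRoot(const Node& root) {
    const int inf = std::numeric_limits<int>::max();
    return alphaBeta(root, -inf, inf);
}
```

The recursion in `alphaBeta` is what makes deep trees demanding on the stack, and the `break` is what makes per-thread work irregular.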

State Abstraction

Cells
Player

Successors
Move

Manipulate
Update
Undo

Query
Winner
Full
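
A minimal sketch of how the state abstraction could look as a C++ class, using a 3x3 Tic-Tac-Toe board to stay short. The slide lists only the member names (cells, successors, update, undo, winner, full); the signatures, the `Player` enum, and the cell layout below are assumptions.

```cpp
#include <array>
#include <vector>

enum class Player : signed char { None = 0, X = 1, O = -1 };

// Hypothetical 3x3 Tic-Tac-Toe board; the talk's actual class is not shown.
class Board {
public:
    using Move = int;  // cell index 0..8

    // Successors: every empty cell is a legal move.
    std::vector<Move> successors() const {
        std::vector<Move> moves;
        for (int i = 0; i < 9; ++i)
            if (cells_[i] == Player::None) moves.push_back(i);
        return moves;
    }

    // Manipulate: place a mark, or take it back.
    void update(Move m, Player p) { cells_[m] = p; }
    void undo(Move m) { cells_[m] = Player::None; }

    // Query: owner of a completed line, or Player::None.
    Player winner() const {
        static const int lines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                        {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
        for (const auto& l : lines) {
            const Player p = cells_[l[0]];
            if (p != Player::None && p == cells_[l[1]] && p == cells_[l[2]]) return p;
        }
        return Player::None;
    }

    // Query: true when no empty cell is left.
    bool full() const {
        for (Player c : cells_)
            if (c == Player::None) return false;
        return true;
    }

private:
    std::array<Player, 9> cells_{};  // value-initialized to Player::None
};
```

Pairing `update` with `undo` lets a search reuse one board per thread instead of copying the state at every node.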

Stack
Recursion depth >1000
Greedy allocation

Hybrid design
Local Memory

Runtime/Compiler
Local variables
Function parameters

User
Successors
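
One way to read the user-managed part of the hybrid design is a preallocated, fixed-capacity stack that holds pending successors, so that traversal depth does not depend on call-stack depth. The sketch below follows that reading; `FixedStack`, `Frame`, `countNodes`, and the capacity of 1024 are all assumptions, and the example walks a uniform tree instead of a real game.

```cpp
#include <cstddef>

// Fixed-capacity stack whose storage is reserved once, up front.
template <typename T, std::size_t Capacity>
class FixedStack {
public:
    bool push(const T& v) {
        if (size_ == Capacity) return false;  // full: caller decides what to do
        data_[size_++] = v;
        return true;
    }
    T pop() { return data_[--size_]; }  // precondition: !empty()
    bool empty() const { return size_ == 0; }

private:
    T data_[Capacity];
    std::size_t size_ = 0;
};

// Pending node of the hypothetical traversal: how many levels remain below it.
struct Frame { int depthLeft; };

// Counts the nodes of a uniform tree without recursion.
// Returns -1 if the explicit stack overflows.
long countNodes(int branching, int depth) {
    FixedStack<Frame, 1024> stack;
    stack.push(Frame{depth});
    long nodes = 0;
    while (!stack.empty()) {
        const Frame f = stack.pop();
        ++nodes;
        if (f.depthLeft == 0) continue;
        for (int i = 0; i < branching; ++i)
            if (!stack.push(Frame{f.depthLeft - 1})) return -1;
    }
    return nodes;
}
```

For example, `countNodes(3, 4)` visits the 1 + 3 + 9 + 27 + 81 nodes of a depth-4 ternary tree while never holding more than a few frames per level.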

Producer-Consumer

game init
foreach move

Producer (CPU)
Heuristic search: find highest cut nodes
Monte Carlo: random trial moves

Consumer (GPU)
Heuristic search: parallel node search, scoring, reduction-max
Monte Carlo: parallel games, rank trial moves

game over

Thousands of working threads

Game Tree
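
A minimal host-side sketch of the per-move data flow: a producer turns candidate moves into jobs, consumers score the jobs, and a max-reduction selects the move to play. In the talk the producer runs on the CPU and the consumers are GPU threads; here the consumer is an ordinary loop in which each iteration stands in for one thread. `Job`, `produce`, `consume`, `reduceMax`, and `pickMove` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One unit of work handed from the producer to one consumer thread.
struct Job { int move; };

// Producer: emit one job per candidate move.
std::vector<Job> produce(const std::vector<int>& candidateMoves) {
    std::vector<Job> jobs;
    jobs.reserve(candidateMoves.size());
    for (int m : candidateMoves) jobs.push_back(Job{m});
    return jobs;
}

// Consumer: score every job independently.
// Each loop iteration models the work of one GPU thread.
template <typename ScoreFn>
std::vector<int> consume(const std::vector<Job>& jobs, ScoreFn score) {
    std::vector<int> scores(jobs.size());
    for (std::size_t tid = 0; tid < jobs.size(); ++tid)
        scores[tid] = score(jobs[tid]);
    return scores;
}

// Reduction-max: index of the highest score (first one on ties).
std::size_t reduceMax(const std::vector<int>& scores) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i)
        if (scores[i] > scores[best]) best = i;
    return best;
}

// One iteration of the "foreach move" loop. Requires a non-empty candidate list.
template <typename ScoreFn>
int pickMove(const std::vector<int>& candidateMoves, ScoreFn score) {
    const std::vector<Job> jobs = produce(candidateMoves);
    const std::vector<int> scores = consume(jobs, score);
    return jobs[reduceMax(scores)].move;
}
```

Because the jobs are independent, the consumer loop is the part that maps onto thousands of working threads; the scoring callback would be a node search or a batch of random games.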

Shared
Kernel global scope

Check local

Global atomic update
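
A minimal sketch of the check-local, update-global pattern, assuming the shared value is a best score that only ever increases (the slide does not say what is shared). It uses `std::atomic` on the host; in a CUDA kernel the built-in `atomicMax` would play the role of the compare-and-swap loop. `atomicMaxUpdate` and `publishScore` are hypothetical names.

```cpp
#include <atomic>

// Raises `shared` to `candidate` if the candidate is larger.
// Returns true when this call performed the update.
bool atomicMaxUpdate(std::atomic<int>& shared, int candidate) {
    int seen = shared.load();
    while (candidate > seen) {
        // On failure, `seen` is refreshed with the current value and the
        // loop condition is checked again.
        if (shared.compare_exchange_weak(seen, candidate)) return true;
    }
    return false;
}

// Check the thread-local best first; touch the global value only on improvement.
void publishScore(int& localBest, std::atomic<int>& globalBest, int score) {
    if (score <= localBest) return;  // cheap local check, no global traffic
    localBest = score;
    atomicMaxUpdate(globalBest, score);
}
```

The local check filters out most scores, so the comparatively expensive global atomic runs only when a thread improves on its own best.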

Limitations
Stack allocation
Bounds parallelism

Static split constraints


Tic-Tac-Toe   Threads at depth 1   Threads at depth 2
3x3x3         650                  15600
4x4x4         3906                 238266
5x5x5         15252                1860744
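
The thread counts in the table equal products of consecutive integers. One reading, inferred from the numbers and not stated on the slide: on an n x n x n board with one cell already taken, a static split at depth d assigns one thread to each ordered sequence of d + 1 further moves. The sketch below reproduces the table under that assumption; `sequences` and `threadsAtDepth` are hypothetical names.

```cpp
#include <cstdint>

// Number of ordered move sequences of length `plies` on a board
// with `emptyCells` free cells: emptyCells * (emptyCells - 1) * ...
std::uint64_t sequences(std::uint64_t emptyCells, int plies) {
    std::uint64_t count = 1;
    for (int i = 0; i < plies; ++i)
        count *= emptyCells - static_cast<std::uint64_t>(i);
    return count;
}

// Assumed reading of the table: one cell of the n*n*n board is occupied,
// and a split at `depth` expands depth + 1 plies.
std::uint64_t threadsAtDepth(int n, int depth) {
    const std::uint64_t cells = static_cast<std::uint64_t>(n) * n * n;
    return sequences(cells - 1, depth + 1);
}
```

For 4x4x4 this gives 63 * 62 = 3906 threads at depth 1 and 63 * 62 * 61 = 238266 at depth 2, which shows how quickly a deeper static split multiplies the thread count.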

Methodology
CUDA Toolkit 3.1, Windows
Single processor

GPU              GTX480
SMs              15
Warps/SM         2
Clocks (MHz)     723/1446/1796
L1/Shared (KB)   48/16
L2 (KB)          640

CPU              I7-940
Cores            8
Clocks (MHz)     2942/(3*1066)
L1/L2 (KB)       32/8192

Space Split
[Figure: 4x4x4 Tic-Tac-Toe; Seconds/Move (0 to 2.5, lower is better) versus Threads/Move (3906, 3660, 3422, 3192, 2970, 2756, 2550); series: Naive Split, Shared]

Monte Carlo
[Figure: Go Running Time, two panels (one labeled CPU); Average Seconds/Move (lower is better) versus Board Dimension (9, 13, 19); series: 128, 1024, 4096, 8192, 16384; y-axis ranges 0 to 80 and 0 to 0.5]

Simultaneous Matches
[Figure: Multi Game Running Time; Average Seconds/Move (0 to 0.3, lower is better) versus Matches (16, 128, 1024, 4096, 16384); series: 3D Tic-Tac-Toe, Connect-4, Reversi]

Future Work
Data packing optimization
Per-SM transposition table
Predictable node ranks
Dynamic space split

GPU Performance

Metric                   Game             Dimension   Speedup
Shared vs. Naive Split   3D Tic-Tac-Toe   4x4x4       13.37X mean
Monte Carlo vs. CPU      Go               19x19       121.64X @16K

Summary
Efficient GPU-based
Heuristic search

Statistical Simulation

Economical solution
Mobile-Cloud

Thank You!

Questions?

Info
Base
http://developer.nvidia.com

GPU AI for Board Games


Technology Preview

Toolkit
CUDA Zone

Debugger
Parallel Nsight
