News
Local
Video
Images Blogs
Books
Web
• Benefits of structure:
– easy to examine and evolve (add user_language to request)
– language independent
– teams can operate independently
Long distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
Long distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
• Slow machines
– replicate the computation (MapReduce)
• Bad latency
– utilize replicas to improve latency
– improved worldwide placement of data and services
Linux
file system
chunkserver
Linux
various other
system services
file system scheduling
chunkserver system
Linux
Bigtable
tablet server
various other
system services
file system scheduling
chunkserver system
Linux
cpu intensive
job
Bigtable
tablet server
various other
system services
file system scheduling
chunkserver system
Linux
cpu intensive
job
random Bigtable
MapReduce #1 tablet server
various other
system services
file system scheduling
chunkserver system
Linux
random
app #2
cpu intensive random
job app
random Bigtable
MapReduce #1 tablet server
various other
system services
file system scheduling
chunkserver system
Linux
• Tolerating variability:
– use these same extra resources
– make a predictable whole out of unpredictable parts
req 3 req 5
req 6
Server 1 Server 2
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3 req 5
req 6
Server 1 Server 2
req 9
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3 req 5
req 6
req 9
also: server 2
Server 1 Server 2
req 9
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3 req 5
req 6 req 9
also: server 1
req 9
also: server 2
Server 1 Server 2
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
req 9
also: server 2
Server 1 Server 2
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
req 9
also: server 2
Server 1 Server 2
“Server 2: Starting req 9”
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
req 9
also: server 2
Server 1 Server 2
“Server 2: Starting req 9”
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
Server 1 Server 2
“Server 2: Starting req 9”
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
Server 1 Server 2
reply
Client
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3
req 6 req 9
also: server 1
Server 1 Server 2
Client
reply
Similar to Michael Mitzenmacher’s work on “The Power of Two
Choices”, except send to both, rather than just picking “best” one
req 3 req 5
Server 1 Server 2
Client
req 3 req 5
Server 1 Server 2
req 9
Client
req 3 req 5
req 9
also: server 2
Server 1 Server 2
req 9
Client
req 3 req 5
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
Client
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
Client
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
“Server 2: Starting req 9”
“Server 1: Starting req 9”
Client
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
“Server 2: Starting req 9”
“Server 1: Starting req 9”
Client
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
reply
Client
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
Client
reply
req 9 req 9
also: server 2 also: server 1
Server 1 Server 2
Client
reply
Input Image
(or video)
Input Image
(or video)
Input Image
(or video)
Input Image
(or video)
Input Image
(or video)
}
Some scalar, nonlinear function
of local image patch
Input Image
(or video)
Input Image
(or video)
Input Image
(or video)
Multiple “maps”
Input Image
(or video)
Layer 1
Input Image
(or video)
Reconstruction
layer
Layer 1
Input Image
(or video)
Input Image
(or video)
Layer 1
Input Image
(or video)
Layer 2
Layer 1
Input Image
(or video)
Layer 1
Input Image
(or video)
Layer 2
Layer 1
Input Image
(or video)
Traditional
ML tools
Layer 2
Layer 1
Input Image
(or video)
Partition assignment
in vertical silos.
Partition 1 Partition 2 Partition 3 Layer 2
Layer 1
Partition 1 Partition 2 Partition 3
Layer 0
Partition assignment
in vertical silos.
Partition 1 Partition 2 Partition 3 Layer 2
Minimal network traffic:
The most densely connected
areas are on the same partition
Layer 1
Partition 1 Partition 2 Partition 3
Layer 0
Partition assignment
in vertical silos.
Partition 1 Partition 2 Partition 3 Layer 2
Minimal network traffic:
The most densely connected
areas are on the same partition
Layer 1
Partition 1 Partition 2 Partition 3
Layer 0
Model
Training Data
Model
Making a single model bigger and faster
is the right first step.
Training Data
How can we add another dimension of
parallelism, and have multiple model
instances train on data in parallel?
Parameter Server
Model
Data
Parameter Server
Model
Data
Parameter Server
∆p
Model
Data
Parameter Server p’ = p + ∆p
Model
Data
Parameter Server
p’
Model
Data
Parameter Server
∆p’
Model
Data
Model
Data
Parameter Server p’ = p + ∆p
∆p p’
Model
Workers
Data
Shards
• For example:
• Use lower precision arithmetic
• Send 1 or 2 bits instead of 32 bits across network
• Drop results from slow partitions
8000-label Softmax
8000-label Softmax
8000-label Softmax
Encode Decode
Details in our ICML paper [Le et al. 2012]
Image
Friday, September 14, 2012
Purely Unsupervised Feature Learning in Images
Pool
Top level neurons seem to discover
high-level concepts. For example, one
Encode Decode neuron is a decent face detector:
Pool Non-faces
Faces
Encode Decode
Frequency
Pool
Encode Decode
Feature value
Image
Friday, September 14, 2012
Purely Unsupervised Feature Learning in Images
Most face-selective neuron
Top 48 stimuli from the test set
ImageNet:
• 16 million images
• ~21,000 categories
• Recurring academic competitions
Encode Decode
Pool
Encode Decode
Image
Friday, September 14, 2012
Semi-supervised Feature Learning in Images
Example top stimuli after fine tuning on ImageNet:
Neuron 1
Neuron 2
Neuron 3
Neuron 4
Neuron 5
Neuron 6
Neuron 7
Neuron 8
Neuron 9
Neuron 5
Neuron 10
Neuron 11
Neuron 12
Neuron 13
Neuron 5
dolphin
porpoise
dolphin
porpoise
SeaWorld
dolphin
porpoise
Obama
SeaWorld
dolphin
porpoise
Paris
Obama
SeaWorld
dolphin
porpoise
Hidden Layers?
Hidden Layers?
E E E E
the cat sat on the
Friday, September 14, 2012
Neural Language Models
• 7 Billion word Google News training set
• 1 Million word vocabulary
• 8 word history, 50 dimensional embedding
• Three hidden layers each w/200 nodes Perplexity Scores
• 50-100 asynchronous model workers
Traditional 5-gram XXX
NLM +15%
5-gram + NLM -33%
E E E E
the cat sat on the
Friday, September 14, 2012
Deep Learning Applications
Further reading:
• Ghemawat, Gobioff, & Leung. Google File System, SOSP 2003.
• Barroso, Dean, & Hölzle. Web Search for a Planet:The Google Cluster Architecture, IEEE Micro, 2003.
• Dean & Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.
• Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, & Gruber. Bigtable: A Distributed Storage System for
Structured Data, OSDI 2006.
• Brants, Popat, Xu, Och, & Dean. Large Language Models in Machine Translation, EMNLP 2007.
• Le, Ranzato, Monga, Devin, Chen, Corrado, Dean, & Ng. Building High-Level Features Using Large Scale Unsupervised
Learning, ICML 2012.
• Dean et al. , Large Scale Distributed Deep Networks, to appear NIPS 2012.
• Corbett, Dean, ... Ghemawat, et al. Spanner: Google’s Globally-Distributed Database, to appear in OSDI 2012