Christian Szegedy
Wei Liu, UNC
Yangqing Jia
Pierre Sermanet
Scott Reed, University of Michigan
Dragomir Anguelov
Dumitru Erhan, Google
Vincent Vanhoucke, Google
Andrew Rabinovich, Google
Revolutionizing computer vision since 1989
Well... 2012
GoogLeNet
(Architecture diagram. Legend: Convolution, Pooling, Softmax, Other)
Zeiler-Fergus Architecture (1 tower)
Vanishing gradient?
Exploding gradient?
Tricky weight initialization?
Justified Questions
ReLU
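One reading of this slide is that rectified linear units ease the vanishing-gradient concern raised above. The sketch below compares the local derivative of a sigmoid with that of a ReLU when the same derivative is chained through many layers. The depth of 20 and the helper names are hypothetical, and the weights are ignored to keep the arithmetic visible.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU; by convention 0 at x == 0.
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** depth


DEPTH = 20  # hypothetical depth, chosen only for illustration

# Best case for the sigmoid: every unit sits at its steepest point.
sigmoid_best_case = chained_grad(sigmoid_grad, 0.0, DEPTH)
# Any active ReLU unit passes the gradient through unchanged.
relu_active = chained_grad(relu_grad, 1.0, DEPTH)

print(f"sigmoid, best case over {DEPTH} layers: {sigmoid_best_case:.3e}")
print(f"ReLU, active path over {DEPTH} layers: {relu_active}")
```

Even in the sigmoid's best case the chained derivative shrinks geometrically, while an active ReLU path keeps it at 1. Inactive ReLU units still block the gradient entirely, so this is only part of the picture.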
Theoretical breakthroughs
Arora, S., Bhaskara, A., Ge, R., & Ma, T. Provable bounds for learning some deep representations. ICML 2014
Even non-convex ones!
Hebbian Principle
(Diagram, built up over three slides: Input, then Layer 1 on top of it, then Layer 2 on top of Layer 1)
Naive idea
Previous layer -> 1x1 convolutions, 3x3 convolutions, 5x5 convolutions (in parallel) -> Filter concatenation
Inception module
(Diagram blocks: Previous layer; 1x1 convolutions (four blocks); 3x3 convolutions; 5x5 convolutions; Filter concatenation)
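To see why the extra 1x1 convolutions matter, the weight counts of the two designs can be compared. This is a rough sketch, not the authors' code: the channel counts are those reported for the first Inception module in the published GoogLeNet paper, not stated on these slides, and the fourth 1x1 block is assumed to follow a pooling branch as in that paper. Biases are ignored.

```python
def conv_weights(in_ch, out_ch, k):
    """Weights in a k x k convolution mapping in_ch to out_ch channels (no bias)."""
    return in_ch * out_ch * k * k


# Assumed channel counts (first Inception module in the published paper).
IN_CH = 192
N1 = 64              # 1x1 branch
R3, N3 = 96, 128     # 1x1 reduction, then 3x3
R5, N5 = 16, 32      # 1x1 reduction, then 5x5
POOL_PROJ = 32       # 1x1 after the (assumed) pooling branch

# Naive idea: every filter size is applied directly to the previous layer.
naive = (
    conv_weights(IN_CH, N1, 1)
    + conv_weights(IN_CH, N3, 3)
    + conv_weights(IN_CH, N5, 5)
)

# Inception module: 1x1 convolutions shrink the input before 3x3 and 5x5.
reduced = (
    conv_weights(IN_CH, N1, 1)
    + conv_weights(IN_CH, R3, 1) + conv_weights(R3, N3, 3)
    + conv_weights(IN_CH, R5, 1) + conv_weights(R5, N5, 5)
    + conv_weights(IN_CH, POOL_PROJ, 1)
)

# Filter concatenation stacks branch outputs along the channel axis.
out_width = N1 + N3 + N5 + POOL_PROJ

print("naive weights:  ", naive)
print("reduced weights:", reduced)
print("output width:   ", out_width)
```

Under these assumptions the reduced module uses well under half the weights of the naive one, even though it includes one extra branch, and its output width comes to 256 filters.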
Inception
9 Inception modules
Network in a network in a network...
(Legend: Convolution, Pooling, Softmax, Other)
Filters per module: 256, 480, 480, 512, 512, 512, 832, 832, 1024
Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules.
Can remove fully connected layers on top completely
Number of parameters is reduced to 5 million
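One common way to drop large fully connected layers is to average each channel of the last feature map and feed the result to a single linear classifier. The sketch below illustrates that idea and the parameter saving it brings. The 1024 channels come from the slide; the 7x7 spatial size, the 4096-unit hidden layer, and the 1000 classes are assumptions made for illustration.

```python
def global_average_pool(feature_map):
    """Average each channel of a channels x height x width nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


CHANNELS = 1024      # width of the top inception module (from the slide)
HEIGHT = WIDTH = 7   # assumed spatial size of the last feature map
HIDDEN = 4096        # assumed size of a conventional fully connected layer
CLASSES = 1000       # assumed number of output classes

# A conventional head flattens the whole feature map into one big layer.
fc_head = dense_params(CHANNELS * HEIGHT * WIDTH, HIDDEN)
# With average pooling only a small linear classifier remains.
pooled_head = dense_params(CHANNELS, CLASSES)

# Tiny worked example: two channels of size 2x2.
tiny = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
pooled = global_average_pool(tiny)

print("first FC layer of a conventional head:", fc_head)
print("classifier after average pooling:     ", pooled_head)
print("pooled tiny example:", pooled)
```

Under these assumptions the first fully connected layer alone would hold about 205 million parameters, against roughly one million for the pooled classifier, which is consistent in spirit with the small total quoted on the slide.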
Number of Crops | Computational Cost | Top-5 Error | Compared to Base
1 (center crop) | 1x | 10.07% |
10* | 10x | 9.15% | 0.92%
144 (Our approach) | 144x | 7.89% | 2.18%
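The figure of 144 crops can be reproduced by enumeration. The recipe below follows the description in the published GoogLeNet paper (four scales, three squares per scale, six crops per square, each also mirrored); it is not spelled out on these slides, and the labels are only illustrative. The last lines check the "Compared to Base" column against the top-5 errors in the table.

```python
from itertools import product

# Crop recipe as described in the published paper (an assumption here).
SCALES = (256, 288, 320, 352)            # shorter image side after resizing
SQUARES = ("left", "center", "right")    # squares taken from each resized image
CROPS = (
    "top-left", "top-right", "bottom-left", "bottom-right",
    "center", "full-square-resized",
)
MIRRORS = (False, True)                  # original and horizontally flipped

crop_specs = list(product(SCALES, SQUARES, CROPS, MIRRORS))
print("crops per image:", len(crop_specs))

# Improvement over the single center crop, from the table's top-5 errors.
BASE_ERROR = 10.07
gains = {
    n_crops: round(BASE_ERROR - error, 2)
    for n_crops, error in ((10, 9.15), (144, 7.89))
}
print("improvement in percentage points:", gains)
```

The computational cost grows linearly with the number of crops, while the error drops by only about two percentage points, which is the trade-off the table summarizes.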
6.54%
(Table residue, partially recovered: error rate | external data)
16.4% | no
15.3% | ImageNet 22k
Clari... 201... 11.7% | no
Detection
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524.
Improved proposal generation:
Increase size of super-pixels by 2X
coverage: 92% -> 90%
number of proposals: 2000/image -> 1000/image
Add multibox* proposals
coverage: 90% -> 93%
number of proposals: 1000/image -> 1200/image
Improves mAP by about 1% for single model.
*Erhan, D., Szegedy, C., Toshev, A., & Anguelov, D. Scalable Object Detection using Deep Neural Networks. CVPR 2014
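The coverage figures above can be read as the fraction of ground-truth objects matched by at least one proposal. Below is a minimal sketch of that measurement. The 0.5 intersection-over-union threshold, the box format (x1, y1, x2, y2), and the toy boxes are assumptions, since the slides do not define coverage precisely.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coverage(ground_truth, proposals, threshold=0.5):
    """Fraction of ground-truth boxes matched by some proposal (assumed metric)."""
    if not ground_truth:
        return 0.0
    covered = sum(
        1
        for gt in ground_truth
        if any(iou(gt, p) >= threshold for p in proposals)
    )
    return covered / len(ground_truth)


# Toy data: the first object is matched, the second is missed.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(1, 1, 10, 10), (50, 50, 60, 60)]

print("coverage:", coverage(ground_truth, proposals))
```

Fewer proposals per image reduce the cost of running the classifier on each box, so the aim is to keep coverage high while the proposal count falls, which is the trade-off the numbers above describe.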
Team | mAP | external data | contextual model | bounding-box regression
Trimps-Soushen | 31.6% | ILSVRC12 Classification | no |
Berkeley Vision | 34.5% | ILSVRC12 Classification | no | yes
UvA-Euvision | 35.4% | ILSVRC12 Classification | |
Team | Year | Place | mAP | external data | ensemble | contextual model | approach
UvA-Euvision | 2013 | 1st | 22.6% | none | | yes | Fisher vectors
Deep Insi... | 201... | 3rd | 40.... | ILSVRC12 Cla... | 3 mod... | yes | ConvNe...
table lamp
lamp shade
printer
projector
desktop computer
laptop
hair drier
binocular
ATM machine
seat belt
Acknowledgments
We would like to thank: Chuck Rosenberg, Hartwig Adam, Alex Toshev, Tom Duerig, Ning Ye, Rajat Monga, Jon Shlens, Alex Krizhevsky, Sudheendra Vijayanarasimhan, Jeff Dean, Ilya Sutskever, Andrea Frome
and check out our poster!