Slides Aula20

Pattern Classification in Images
Scene Classification and Semantic Segmentation
Jefersson Alex dos Santos
jefersson@dcc.ufmg.br
DCC
DCC029/868 - Digital Image Processing
Supervised Classification
Lecture Outline
1 Supervised Classification
2 Deep Vs Hand-Crafted Features
Strategies to Exploit ConvNets
Experimental Analysis
3 Semantic Segmentation
Context Window-Based Approach
Fully Convolutional Neural Network
Pattern Classifier
Typical Steps
Building: Multimedia Dataset → Feature Extraction → Pattern Representation → Classifier Training
Example
Artificial Neural Networks

Support Vector Machines
Nearest Neighbors
...
How to use the classifier?
Using: Multimedia Object → Feature Extraction → Pattern Representation → Classifier → Predicted Class
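The building and using steps can be sketched as follows. This is only an illustration under stated assumptions: the gray-level histogram as feature extractor, the nearest-centroid rule as classifier, and all function names are hypothetical choices, not the ones used in the lecture.

```python
from collections import defaultdict

def extract_features(image, bins=4):
    """Toy feature extractor (assumption): normalized gray-level histogram, values 0-255."""
    hist = [0] * bins
    pixels = [p for row in image for p in row]
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def train_classifier(dataset):
    """Building: compute one mean feature vector (centroid) per class."""
    grouped = defaultdict(list)
    for image, label in dataset:
        grouped[label].append(extract_features(image))
    return {label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()}

def predict(model, image):
    """Using: extract features, then return the class with the closest centroid."""
    feats = extract_features(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical two-class dataset of 2 x 2 gray-level images.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
model = train_classifier([(dark, "dark"), (bright, "bright")])
print(predict(model, [[15, 25], [35, 45]]))
```

Any of the classifiers listed in the example (neural networks, SVMs, nearest neighbors) could take the place of the centroid rule; only `train_classifier` and `predict` would change.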
K-Nearest Neighbor
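A minimal sketch of k-nearest-neighbor classification, assuming Euclidean distance and majority voting. The toy points, the labels, and the name `knn_predict` are illustrative, not taken from the lecture.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2D feature vectors with two classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
print(knn_predict(train, (0.15, 0.15)))
print(knn_predict(train, (0.95, 1.0)))
```

With two classes, an odd k avoids tied votes.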
Decision Trees
What is the best classifier?
Deep Vs Hand-Crafted Features
Small Training Data
Challenges:
1 Deep learning needs a large amount of data to train
2 Many applications (e.g. remote sensing) typically have only a small amount of annotated data
Research Questions:
1 Is it possible to transfer features from everyday pictures to the remote sensing domain?
2 Are transferred features more effective than fully trained ones?
3 How can deep learning be better exploited for remote sensing data?
Reference
K. Nogueira, O. A. B. Penatti and J. A. dos Santos. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61, 539-556, 2017.
1 - Fully Training
Training from scratch: a randomly initialized ConvNet is fully trained on the target dataset
Target Dataset → conv → ... → conv → conv → fully → fully → classification
2 - Fine-Tuning
Original Dataset → randomly initialized ConvNet: conv → ... → conv → conv → fully → fully → classification
Target Dataset → fine-tuning ConvNet: conv → ... → conv → conv → fully → fully → classification
Transfer of trained weights from the ConvNet trained on the original dataset to the fine-tuning ConvNet
Different number of classes ⇒ no weight transfer (classification layer)
Figure annotation: possible layers to freeze
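The weight transfer and layer freezing can be sketched without any deep learning library. Everything here is a toy assumption: layers are flat lists of numbers, the layer sizes and class counts (1000 and 21) are arbitrary, and `init_network`, `transfer_weights`, and `sgd_step` are hypothetical helper names.

```python
import random

def init_network(num_classes, seed):
    """Randomly initialized toy network: layer name -> flat list of weights."""
    rng = random.Random(seed)
    sizes = {"conv1": 8, "conv2": 8, "conv3": 8, "fully1": 16, "fully2": 16,
             "classification": 16 * num_classes}
    return {name: [rng.uniform(-1, 1) for _ in range(n)] for name, n in sizes.items()}

def transfer_weights(source, target):
    """Copy every layer whose size matches; mismatched layers keep their random init."""
    transferred = []
    for name, weights in source.items():
        if name in target and len(target[name]) == len(weights):
            target[name] = list(weights)
            transferred.append(name)
    return transferred

def sgd_step(network, grads, frozen, lr=0.01):
    """One gradient step that leaves the frozen layers untouched."""
    for name, grad in grads.items():
        if name not in frozen:
            network[name] = [w - lr * g for w, g in zip(network[name], grad)]

# `original` stands in for a ConvNet already trained on the original dataset.
original = init_network(num_classes=1000, seed=0)
target = init_network(num_classes=21, seed=1)
copied = transfer_weights(original, target)
print(copied)  # the classification layer is absent: its size differs
```

During fine-tuning, `sgd_step(target, grads, frozen={"conv1", "conv2"})` would update only the layers that are not frozen.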
3 - Feature Extractor
Pre-trained ConvNet: conv → ... → conv → conv → fully → fully → classification
Deep feature vector (taken from the pre-trained ConvNet) → classification (SVM)
Datasets
Examples
UCMerced Land-use Dataset: (a) Agricultural, (b) Dense Residential
RS19 Dataset: (c) Beach, (d) Football Field
Brazilian Coffee Scenes Dataset: (e) Coffee, (f) Non-coffee
Some Experiments
A - Generalization Power Evaluation
B - Comparison of ConvNets Strategies
C - Comparison with Baselines
A - Generalization Power Evaluation
[Figure: average accuracy (%) per feature representation, UCMerced Land-use Dataset]
[Figure: average accuracy (%) per feature representation, RS19 Dataset]
[Figure: average accuracy (%) per feature representation, Brazilian Coffee Scenes Dataset]

Conclusions
It is possible to exploit feature representations learned on computer vision datasets in the remote sensing scenario
Deep features generalize better to aerial datasets than to agricultural ones
Agricultural images are composed of finer and more homogeneous textures/colors
B - Comparison of ConvNets Strategies
[Figures: results on the UCMerced Land-use, RS19 and Brazilian Coffee Scenes datasets]
Medium Residential misclassified as Dense Residential
Commercial misclassified as Park

Conclusions
Feature representations learned on everyday image datasets can be adjusted to the remote sensing domain
Full training was not a good strategy, possibly because of the small amount of labeled data available
Fine-tuning is usually the best strategy
Replacing the last softmax layer with an SVM was a better solution
C - Comparison with Baselines
[Figures: results on the UCMerced Land-use, RS19 and Brazilian Coffee Scenes datasets]
Semantic Segmentation
Semantic Segmentation
(Aka: Pixelwise classification)
Convolutional Neural Networks (CNNs) are currently the state of the art in several tasks
Bounding box detection
Keypoint detection
Image classification
The natural next step for CNNs would be semantic segmentation
Semantic segmentation: assigning a category-level label to each pixel in an image
Fundamental task for total image understanding
Difficult because global information resolves “what” while local information resolves “where”

Context Window-Based Approach
Each pixel to be classified is described by a context window around it
Reference
K. Nogueira, M. Dalla Mura, J. Chanussot, W. R. Schwartz, J. A. dos Santos. Learning to semantically segment high-resolution remote sensing images. In: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, Cancun.
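A sketch of how a context window could be cut around a pixel. The mirrored border handling, the window size, and the name `context_window` are assumptions made for illustration; the cited paper may handle borders differently.

```python
def context_window(image, row, col, size):
    """Return the size x size window centered on (row, col); borders are mirrored."""
    half = size // 2
    height, width = len(image), len(image[0])

    def reflect(i, n):
        # Mirror an out-of-range index back into [0, n - 1] (assumes half < n).
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    return [[image[reflect(r, height)][reflect(c, width)]
             for c in range(col - half, col + half + 1)]
            for r in range(row - half, row + half + 1)]

# Hypothetical 4 x 4 single-band image with values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
window = context_window(image, 0, 0, 3)
print(window)

# One window per pixel: each would be fed to the ConvNet to classify its center pixel.
windows = [context_window(image, r, c, 3) for r in range(4) for c in range(4)]
```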
Final Segmentation Process
Original Image → Context Windows → ConvNet → Probability Map
Datasets
Agriculture Dataset: (r) Image, (s) Ground-Truth
Some Results
Agriculture Dataset
Legend: True Positive, True Negative, False Positive, False Negative
Relevance Maps
Fully-Convolutional Networks
Deep learning can learn global and local semantic information
Main contributions of the paper:
Fully Convolutional Neural Network (FCN)
Based on fine-tuning networks
Reference
J. Long, E. Shelhamer, T. Darrell. Fully convolutional networks for semantic segmentation. CVPR, p. 3431-3440, Boston, USA, 2015
Network with only convolutional layers (convolution plus pooling)
Output has spatial correspondence with the input
Trained end-to-end for semantic segmentation
No pre- or post-processing at all

Convolutionization
A fully connected layer is not a special layer
It is equivalent to a normal convolutional layer whose kernel covers its entire input
Thus, every CNN can be transformed into an FCN
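The equivalence can be checked on a tiny example. In this sketch (hypothetical weights and inputs, single channel, single output unit), the weights of a fully connected layer are reshaped into a 2 × 2 kernel. On a 2 × 2 input both layers give the same value, and on a larger input the convolution yields a spatial map of scores.

```python
def fully_connected(flat_input, weights, bias):
    """Single-output fully connected layer: dot product plus bias."""
    return sum(x * w for x, w in zip(flat_input, weights)) + bias

def conv2d_valid(image, kernel, bias):
    """Plain 'valid' 2D cross-correlation with one kernel, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(out_w)]
            for r in range(out_h)]

fc_weights = [1, -2, 3, 4]
bias = 1
kernel = [fc_weights[0:2], fc_weights[2:4]]  # same weights reshaped to 2 x 2

small = [[1, 2], [3, 4]]
fc_out = fully_connected([p for row in small for p in row], fc_weights, bias)
conv_out = conv2d_valid(small, kernel, bias)
print(fc_out, conv_out)   # 23 [[23]]

# On a larger input the same kernel slides and produces one score per position.
large = [[1, 2, 0], [3, 4, 1], [0, 1, 2]]
score_map = conv2d_valid(large, kernel, bias)
print(score_map)          # [[23, 19], [0, 14]]
```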


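Upsampling
Output has lower spatial resolution than the input image
Upsampling is needed to put output and input pixels in correspondence
Deconvolution (transposed convolution) layers can be trained within a neural network
They reverse the downsampling of a strided convolution; they are not an exact inverse of the convolution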
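A one-dimensional sketch of what a deconvolution (transposed convolution) computes. The fixed kernel [0.5, 1.0, 0.5] is an assumption chosen so that the result is linear interpolation; in a network the kernel weights would be learned. The name `transposed_conv1d` is hypothetical.

```python
def transposed_conv1d(signal, kernel, stride):
    """Each input value adds a scaled copy of the kernel to the output, `stride` apart."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += value * w
    return out

coarse = [1.0, 3.0, 2.0]
upsampled = transposed_conv1d(coarse, [0.5, 1.0, 0.5], stride=2)
print(upsampled)  # [0.5, 1.0, 2.0, 3.0, 2.5, 2.0, 1.0]
```

The coarse values reappear at positions 1, 3 and 5, and the positions between them hold their averages.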

Skips
Combine the final prediction layer with lower layers that have finer strides
The combination makes local predictions that respect global structure
Overview:
Extract features from intermediate layers
Add a 1 × 1 convolution to perform class prediction
Fuse the outputs
Deconvolution
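The fusion step can be sketched for a single class as follows. Nearest-neighbor upsampling stands in for the learned deconvolution, fusion is assumed to be an elementwise sum, and the score values and function names are hypothetical.

```python
def upsample2x_nearest(score_map):
    """Nearest-neighbor 2x upsampling (a stand-in for a learned deconvolution)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(coarse_scores, fine_scores):
    """Upsample the coarse scores 2x and add the finer-stride scores elementwise."""
    up = upsample2x_nearest(coarse_scores)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, fine_scores)]

coarse = [[1, 2]]              # class scores from the final prediction layer
fine = [[0, 1, 0, -1],         # class scores from a 1 x 1 convolution on a lower layer
        [1, 0, -1, 0]]
fused = fuse(coarse, fine)
print(fused)  # [[1, 2, 2, 1], [2, 1, 1, 2]]
```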

Experiments
Metrics
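A sketch of how segmentation metrics can be computed from a confusion matrix, assuming the four measures commonly reported for semantic segmentation and defined in the cited FCN paper: pixel accuracy, mean accuracy, mean IU (intersection over union) and frequency weighted IU. The function name and the example matrix are hypothetical, and every class is assumed to occur at least once.

```python
def segmentation_metrics(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    t = [sum(confusion[i]) for i in range(n)]                          # pixels per true class
    pred = [sum(confusion[j][i] for j in range(n)) for i in range(n)]  # pixels per predicted class
    diag = [confusion[i][i] for i in range(n)]
    iu = [diag[i] / (t[i] + pred[i] - diag[i]) for i in range(n)]
    return {
        "pixel_accuracy": sum(diag) / total,
        "mean_accuracy": sum(diag[i] / t[i] for i in range(n)) / n,
        "mean_iu": sum(iu) / n,
        "frequency_weighted_iu": sum(t[i] * iu[i] for i in range(n)) / total,
    }

# Hypothetical two-class example with 8 pixels, 6 of them correct.
metrics = segmentation_metrics([[3, 1], [1, 3]])
print(metrics)
```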
Results
Pascal VOC 2011, 2012
NYUDv2
SIFT Flow Dataset
Pascal VOC
Conclusion
FCN achieved state-of-the-art results for semantic segmentation while simplifying and speeding up learning and inference
