Outline
1. Unsupervised Segmentation
2. Classification
   2.1. LDA
   2.2. SIMCA
   2.3. PLS-DA
Classification concept:
Criterion examples: Nordic vs. Italian; Beer vs. Wine
Introduction
[Figure: training set X (I x J), variables in columns and samples in rows, with samples labelled by class (A, B, C); a model Class = f(X) is built from the known samples]
[Figure: the model Class = f(X) is then applied to an unknown sample (1 x J) to predict its class]
Introduction
Similarity:
The mathematical transposition of the concept of analogy. Analogy is used at every moment of our lives for pattern recognition, i.e. to recognize, to distinguish, to classify.
Distances:
The starting point for evaluating similarity: close samples are considered similar, far samples are considered dissimilar.
Introduction
Classification methods use the class information (supervised): they separate classes, and their goal is to find models able to correctly assign each sample to its proper class.
Classification
Pure classification:
- Linear: LDA, QDA
- Non-linear: K-NN
Class-modelling:
- Linear: PLS-DA, SIMCA (and PLS-DA variations)
- Non-linear: ANN, SVM
Segmentation:
- Supervised: classification (LDA) and class-modelling (SIMCA, PLS-DA)
- Unsupervised: clustering
Distances, similarity and clustering
Distances are the starting points for evaluating similarity:
- close samples -> similar samples
- far samples -> dissimilar samples
[Figure: distances of samples from a centroid in d dimensions]
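Closeness is typically measured with the Euclidean distance in d dimensions; a minimal sketch (the points and centroid below are hypothetical):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two points in d dimensions
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

centroid = [0.0, 0.0, 0.0]       # hypothetical 3-dimensional centroid
close_sample = [0.1, 0.0, 0.1]   # close -> considered similar
far_sample = [2.0, 2.0, 2.0]     # far -> considered dissimilar

d_close = euclidean(close_sample, centroid)
d_far = euclidean(far_sample, centroid)
```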
Distances, similarity and clustering
2. Partitional methods
The K-means (KM) algorithm assigns each pixel x_mn of the image to the k-th cluster, whose center is nearest, by minimizing the sum of the squared distances of each pixel to its corresponding center.
Advantages:
- Simplicity
Drawbacks:
- Risk of converging to a local minimum in the iterations
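The assignment/update iteration described above can be sketched in pure Python (the pixel coordinates and the number of clusters below are hypothetical; spectra would be handled the same way, point by point):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: assign each point to the nearest center,
    then recompute centers as cluster means (risk: local minimum)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        # update step: each center becomes the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers

# two well-separated hypothetical pixel groups
pts = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
       (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
labels, centers = kmeans(pts, 2)
```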
Silhouette index
Calculated for each pixel x_mn, it measures how similar a point is to the points in the same cluster compared with the points in other clusters. Drawback: extremely slow on full images, since it requires all pairwise distances.
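The silhouette value follows the standard definition s = (b - a) / max(a, b), where a is the mean distance to points of the same cluster and b the mean distance to the nearest other cluster; a minimal sketch on a few hypothetical points (the O(n^2) pairwise distances are what makes it slow on full images):

```python
import math

def silhouette(points, labels):
    """Silhouette value per point: s = (b - a) / max(a, b)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        # a: mean distance to the other points of the same cluster
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        a = sum(same) / len(same) if same else 0.0
        # b: mean distance to the points of the nearest other cluster
        b = min(sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                / sum(1 for l in labels if l == c)
                for c in clusters if c != labels[i])
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

pts = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1)]  # hypothetical points
s = silhouette(pts, [0, 0, 1, 1])
```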
K-means
- Dataset sample_demo.mat
[Figure: silhouette plots for 2, 3 and 4 clusters; silhouette value on the x-axis (0 to 1), one panel per clustering]
K-means
[Figure: cluster maps of the image and corresponding centroid spectra (intensity vs. wavelength, 1200-2000); panels for 5, 6 and 7 clusters]
K-means
Centroids
[Figure: centroid spectra (intensity vs. wavelength, 1200-2000) for 2 to 7 clusters]
Pure spectra
[Figure: pure spectra, intensity approx. 0.05-0.4 over wavelength 1200-2000]
Fuzzy Clustering
Advantages:
- Each pixel is assigned a belonging degree to every cluster
Drawbacks:
- Risk of converging to a local minimum in the iterations (as in KM)
Partition entropy
[Figure: comparison of K-means and fuzzy clustering (FCM) results]
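A minimal sketch of the fuzzy c-means belonging degrees and of the partition entropy index, assuming the standard FCM membership update with fuzzifier m = 2 and fixed cluster centers (the centers and pixel values below are hypothetical):

```python
import math

def memberships(point, centers, m=2.0):
    """FCM belonging degrees of one pixel to each center:
    u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), with fuzzifier m."""
    d = [math.dist(point, c) for c in centers]
    # a pixel sitting exactly on a center belongs fully to it
    if any(di == 0.0 for di in d):
        return [1.0 if di == 0.0 else 0.0 for di in d]
    return [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                      for j in range(len(centers)))
            for i in range(len(centers))]

def partition_entropy(U):
    """PE = -(1/N) * sum over pixels and clusters of u * ln(u).
    Lower values indicate a crisper (less fuzzy) partition."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

centers = [(0.0, 0.0), (5.0, 5.0)]               # hypothetical centers
pixels = [(0.1, 0.0), (4.9, 5.0), (2.5, 2.5)]    # hypothetical pixels
U = [memberships(p, centers) for p in pixels]
pe = partition_entropy(U)
```

The third pixel lies halfway between the centers, so it gets belonging degrees close to 0.5 for both clusters, which is what raises the partition entropy.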
Fuzzy Clustering
- Dataset Sample_demo.mat
- Dataset brunel.mat
Linear Discriminant Analysis (LDA)
Discriminant Analysis:
Separates samples into classes by finding directions which:
- maximize the variance between classes
- minimize the variance within classes
[Figure: example where both PC1 and LD1 separate the classes (PC1 GOOD, LD1 GOOD)]
Linear Discriminant Analysis (LDA)
[Figure: counter-example where PC1 fails to separate the classes while LD1 succeeds (PC1 BAD, LD1 GOOD)]
Linear Discriminant Analysis
LDA assumes that all classes share the same covariance matrix: S_g = S (the pooled within-class covariance).
Linear Discriminant Analysis
Drawbacks:
1) The number of samples must be higher than the number of variables. This is not a real problem with images, where pixels are abundant.
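For two classes, the LDA direction can be sketched as w = Sw^-1 (m_A - m_B), where Sw is the pooled within-class scatter and m_A, m_B the class means; a minimal pure-Python illustration on hypothetical 2-D data:

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, m):
    # within-class scatter: sum of (x - m)(x - m)^T for 2-D data
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

# hypothetical 2-D training data for two classes
class_a = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
class_b = [(3.0, 4.0), (3.2, 3.8), (2.8, 4.2), (3.1, 4.1)]

ma, mb = mean(class_a), mean(class_b)
sa, sb = scatter(class_a, ma), scatter(class_b, mb)
sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]  # pooled Sw

# LDA direction for two classes: w = Sw^-1 (ma - mb), 2x2 inverse by hand
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
dm = [ma[0] - mb[0], ma[1] - mb[1]]
w = [(sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
     (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det]

# projecting the samples on w should separate the two classes
proj_a = [x * w[0] + y * w[1] for x, y in class_a]
proj_b = [x * w[0] + y * w[1] for x, y in class_b]
```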
SIMCA
[Figure: a separate PCA model (PC1-PC2) is built for each class; an unknown sample (?) is then projected onto each class model and evaluated through its residuals (Q) and Hotelling's T2]
SIMCA
A sample i is compared with class g through the reduced distance
d_ig = sqrt( (Q_ig / Q_0.95,g)^2 + (T2_ig / T2_0.95,g)^2 )
where:
Q_ig and T2_ig are the Q and Hotelling's T2 of sample i calculated in the PCA model of class g;
Q_0.95,g and T2_0.95,g are the 95% confidence limits of class g.
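The reduced distance can be computed directly from a sample's Q and T2 statistics and the class limits; a minimal sketch (the statistics, limits and the sqrt(2) acceptance threshold below are illustrative assumptions, since acceptance rules vary between implementations):

```python
import math

def simca_distance(q, t2, q_lim, t2_lim):
    """Reduced SIMCA distance of a sample from class g:
    d = sqrt((Q/Q_0.95,g)^2 + (T2/T2_0.95,g)^2)."""
    return math.sqrt((q / q_lim) ** 2 + (t2 / t2_lim) ** 2)

# hypothetical statistics of one sample against one class model
d_inside = simca_distance(q=0.3, t2=1.0, q_lim=1.2, t2_lim=4.0)   # well inside
d_outside = simca_distance(q=4.0, t2=9.0, q_lim=1.2, t2_lim=4.0)  # far outside

# one common rule: accept the sample in class g when d <= sqrt(2)
accepted_inside = d_inside <= math.sqrt(2)
accepted_outside = d_outside <= math.sqrt(2)
```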
SIMCA
[Figure: Q (residuals) vs. Hotelling's T2 plane; the acceptance region of class g is bounded by the limits T2_0.95,g and Q_0.95,g]
SIMCA
Drawbacks:
- Unfolding: the image hypercube must be unfolded into a two-dimensional matrix before modelling.
PLS-DA
A PLS-2 model is built between the data matrix X and a dummy matrix D, whose entries are 1 for samples belonging to the corresponding class and 0 otherwise.
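Building the dummy matrix D, and assigning an unknown sample to the class with the highest predicted response, can be sketched as follows (the class names and predicted values are hypothetical):

```python
def dummy_matrix(labels, classes):
    """Dummy matrix D for PLS-2: one column per class,
    1 if the sample belongs to that class, 0 otherwise."""
    return [[1 if lab == c else 0 for c in classes] for lab in labels]

def assign(predicted_row, classes):
    # a usual PLS-DA decision: pick the class with the highest response
    return classes[max(range(len(classes)), key=lambda j: predicted_row[j])]

classes = ["A", "B", "C"]
D = dummy_matrix(["A", "B", "B", "C"], classes)

# hypothetical predicted responses for one unknown sample
assigned = assign([0.1, 0.8, 0.2], classes)
```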
Cross-validation is used to select the number of latent variables (LVs), etc.
Assessing the models
Confusion matrix:

                   Predicted positive     Predicted negative
  Actual positive  TP (true positive)     FN (false negative)
  Actual negative  FP (false positive)    TN (true negative)
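From TP, FP, FN and TN, the usual figures of merit can be computed; a minimal sketch (the counts below are hypothetical):

```python
def figures_of_merit(tp, fp, fn, tn):
    """Standard figures of merit from the confusion matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# hypothetical counts for one class
sens, spec, acc = figures_of_merit(tp=40, fp=5, fn=10, tn=45)
```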
Assessing the models
[Figure: example application - classification of ALMONDS and PLASTICS]