Outline
1. Unsupervised Segmentation
2. Classification
   2.1. LDA
   2.2. SIMCA
   2.3. PLS-DA
Classification concept:
Criterion examples: Nordic vs. Italian; Beer vs. Wine
Introduction
[Figure: training set X (I x J), variables in columns and samples in rows, with samples labelled by class (A, B, C); a model Class = f(X) is built from the known samples]
[Figure: the model Class = f(X) is then applied to an unknown sample (1 x J) to predict its class]
Introduction
Similarity:
The mathematical transposition of the concept of analogy. Analogy is used at every moment of our lives for pattern recognition, i.e. to recognize, to distinguish, to classify.
Distances:
The starting point for evaluating similarity: close samples are considered similar, far samples are considered dissimilar.
Introduction
Classification methods use the class information (supervised): they separate classes, and their goal is to find models able to correctly assign each sample to its proper class.
Classification
Pure classification:
- Linear: LDA, QDA
- Non-linear: K-NN
Class-modelling:
- Linear: PLS-DA, SIMCA (and PLS-DA variations)
- Non-linear: ANN, SVM
Segmentation:
- Supervised: classification (LDA) and class-modelling (SIMCA, PLS-DA)
- Unsupervised: clustering
Distances, similarity and clustering
Distances are the starting points for evaluating similarity:
- close samples -> similar samples
- far samples -> dissimilar samples
[Figure: distances of samples from a centroid in d dimensions]
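Closeness is typically measured with the Euclidean distance in d dimensions; a minimal sketch (the points and centroid below are hypothetical):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two points in d dimensions
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

centroid = [0.0, 0.0, 0.0]       # hypothetical 3-dimensional centroid
close_sample = [0.1, 0.0, 0.1]   # close -> considered similar
far_sample = [2.0, 2.0, 2.0]     # far -> considered dissimilar

d_close = euclidean(close_sample, centroid)
d_far = euclidean(far_sample, centroid)
```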
Distances, similarity and clustering
2. Partitional methods
The K-means (KM) algorithm assigns each pixel x_mn of the image to the k-th cluster, whose center is nearest, by minimizing the sum of the squared distances of each pixel to its corresponding center.
Advantages:
- Simplicity
Drawbacks:
- Risk of converging to a local minimum in the iterations
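The assignment/update iteration described above can be sketched in pure Python (the pixel coordinates and the number of clusters below are hypothetical; spectra would be handled the same way, point by point):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: assign each point to the nearest center,
    then recompute centers as cluster means (risk: local minimum)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        # update step: each center becomes the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers

# two well-separated hypothetical pixel groups
pts = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
       (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
labels, centers = kmeans(pts, 2)
```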
Silhouette index
Calculated for each pixel x_mn, it measures how similar a point is to the points in the same cluster compared with the points in other clusters. Drawback: extremely slow on full images, since it requires all pairwise distances.
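The silhouette value follows the standard definition s = (b - a) / max(a, b), where a is the mean distance to points of the same cluster and b the mean distance to the nearest other cluster; a minimal sketch on a few hypothetical points (the O(n^2) pairwise distances are what makes it slow on full images):

```python
import math

def silhouette(points, labels):
    """Silhouette value per point: s = (b - a) / max(a, b)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        # a: mean distance to the other points of the same cluster
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        a = sum(same) / len(same) if same else 0.0
        # b: mean distance to the points of the nearest other cluster
        b = min(sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                / sum(1 for l in labels if l == c)
                for c in clusters if c != labels[i])
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

pts = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1)]  # hypothetical points
s = silhouette(pts, [0, 0, 1, 1])
```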
K-means
- Dataset sample_demo.mat
[Figure: silhouette plots for 2, 3 and 4 clusters; silhouette value on the x-axis (0 to 1), one panel per clustering]
K-means
[Figure: cluster maps of the image and corresponding centroid spectra (intensity vs. wavelength, 1200-2000); panels for 5, 6 and 7 clusters]
K-means
Centroids
[Figure: centroid spectra (intensity vs. wavelength, 1200-2000) for 2 to 7 clusters]
Pure spectra
[Figure: pure spectra, intensity approx. 0.05-0.4 over wavelength 1200-2000]
Fuzzy Clustering
Advantages:
- Each pixel is assigned a belonging degree to every cluster
Drawbacks:
- Risk of converging to a local minimum in the iterations (as in KM)
Partition entropy
[Figure: comparison of K-means and fuzzy clustering (FCM) results]
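A minimal sketch of the fuzzy c-means belonging degrees and of the partition entropy index, assuming the standard FCM membership update with fuzzifier m = 2 and fixed cluster centers (the centers and pixel values below are hypothetical):

```python
import math

def memberships(point, centers, m=2.0):
    """FCM belonging degrees of one pixel to each center:
    u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), with fuzzifier m."""
    d = [math.dist(point, c) for c in centers]
    # a pixel sitting exactly on a center belongs fully to it
    if any(di == 0.0 for di in d):
        return [1.0 if di == 0.0 else 0.0 for di in d]
    return [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                      for j in range(len(centers)))
            for i in range(len(centers))]

def partition_entropy(U):
    """PE = -(1/N) * sum over pixels and clusters of u * ln(u).
    Lower values indicate a crisper (less fuzzy) partition."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

centers = [(0.0, 0.0), (5.0, 5.0)]               # hypothetical centers
pixels = [(0.1, 0.0), (4.9, 5.0), (2.5, 2.5)]    # hypothetical pixels
U = [memberships(p, centers) for p in pixels]
pe = partition_entropy(U)
```

The third pixel lies halfway between the centers, so it gets belonging degrees close to 0.5 for both clusters, which is what raises the partition entropy.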
Fuzzy Clustering
- Dataset Sample_demo.mat
- Dataset brunel.mat
Linear Discriminant Analysis (LDA)
Discriminant Analysis:
Separates samples into classes by finding directions which:
- maximize the variance between classes
- minimize the variance within classes
[Figure: example where both PC1 and LD1 separate the classes (PC1 GOOD, LD1 GOOD)]
Linear Discriminant Analysis (LDA)
[Figure: counter-example where PC1 fails to separate the classes while LD1 succeeds (PC1 BAD, LD1 GOOD)]
Linear Discriminant Analysis
LDA assumes that all classes share the same covariance matrix: S_g = S (the pooled within-class covariance).
Linear Discriminant Analysis
Drawbacks:
1) The number of samples must be higher than the number of variables. This is not a real problem with images, where pixels are abundant.
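For two classes, the LDA direction can be sketched as w = Sw^-1 (m_A - m_B), where Sw is the pooled within-class scatter and m_A, m_B the class means; a minimal pure-Python illustration on hypothetical 2-D data:

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, m):
    # within-class scatter: sum of (x - m)(x - m)^T for 2-D data
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

# hypothetical 2-D training data for two classes
class_a = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
class_b = [(3.0, 4.0), (3.2, 3.8), (2.8, 4.2), (3.1, 4.1)]

ma, mb = mean(class_a), mean(class_b)
sa, sb = scatter(class_a, ma), scatter(class_b, mb)
sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]  # pooled Sw

# LDA direction for two classes: w = Sw^-1 (ma - mb), 2x2 inverse by hand
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
dm = [ma[0] - mb[0], ma[1] - mb[1]]
w = [(sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
     (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det]

# projecting the samples on w should separate the two classes
proj_a = [x * w[0] + y * w[1] for x, y in class_a]
proj_b = [x * w[0] + y * w[1] for x, y in class_b]
```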
SIMCA
[Figure: a separate PCA model (PC1-PC2) is built for each class; an unknown sample (?) is then projected onto each class model and evaluated through its residuals (Q) and Hotelling's T2]
SIMCA
A sample i is compared with class g through the reduced distance
d_ig = sqrt( (Q_ig / Q_0.95,g)^2 + (T2_ig / T2_0.95,g)^2 )
where:
Q_ig and T2_ig are the Q and Hotelling's T2 of sample i calculated in the PCA model of class g;
Q_0.95,g and T2_0.95,g are the 95% confidence limits of class g.
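The reduced distance can be computed directly from a sample's Q and T2 statistics and the class limits; a minimal sketch (the statistics, limits and the sqrt(2) acceptance threshold below are illustrative assumptions, since acceptance rules vary between implementations):

```python
import math

def simca_distance(q, t2, q_lim, t2_lim):
    """Reduced SIMCA distance of a sample from class g:
    d = sqrt((Q/Q_0.95,g)^2 + (T2/T2_0.95,g)^2)."""
    return math.sqrt((q / q_lim) ** 2 + (t2 / t2_lim) ** 2)

# hypothetical statistics of one sample against one class model
d_inside = simca_distance(q=0.3, t2=1.0, q_lim=1.2, t2_lim=4.0)   # well inside
d_outside = simca_distance(q=4.0, t2=9.0, q_lim=1.2, t2_lim=4.0)  # far outside

# one common rule: accept the sample in class g when d <= sqrt(2)
accepted_inside = d_inside <= math.sqrt(2)
accepted_outside = d_outside <= math.sqrt(2)
```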
SIMCA
[Figure: Q (residuals) vs. Hotelling's T2 plane; the acceptance region of class g is bounded by the limits T2_0.95,g and Q_0.95,g]
SIMCA
Drawbacks:
- Unfolding: the image hypercube must be unfolded into a two-dimensional matrix before modelling.
PLS-DA
A PLS-2 model is built between the data matrix X and a dummy matrix D, whose entries are 1 for samples belonging to the corresponding class and 0 otherwise.
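Building the dummy matrix D, and assigning an unknown sample to the class with the highest predicted response, can be sketched as follows (the class names and predicted values are hypothetical):

```python
def dummy_matrix(labels, classes):
    """Dummy matrix D for PLS-2: one column per class,
    1 if the sample belongs to that class, 0 otherwise."""
    return [[1 if lab == c else 0 for c in classes] for lab in labels]

def assign(predicted_row, classes):
    # a usual PLS-DA decision: pick the class with the highest response
    return classes[max(range(len(classes)), key=lambda j: predicted_row[j])]

classes = ["A", "B", "C"]
D = dummy_matrix(["A", "B", "B", "C"], classes)

# hypothetical predicted responses for one unknown sample
assigned = assign([0.1, 0.8, 0.2], classes)
```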
Cross-validation is used to select the number of latent variables (LVs), etc.
Assessing the models
Confusion matrix:

                   Predicted positive     Predicted negative
  Actual positive  TP (true positive)     FN (false negative)
  Actual negative  FP (false positive)    TN (true negative)
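From TP, FP, FN and TN, the usual figures of merit can be computed; a minimal sketch (the counts below are hypothetical):

```python
def figures_of_merit(tp, fp, fn, tn):
    """Standard figures of merit from the confusion matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# hypothetical counts for one class
sens, spec, acc = figures_of_merit(tp=40, fp=5, fn=10, tn=45)
```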
Assessing the models
[Figure: example application - classification of ALMONDS and PLASTICS]