[Figure: scatter plot of an example dataset with classes apple, banana and pear]
Introduction
The aim of this set of exercises is to assist the reader in getting acquainted with PRTools,
a Matlab toolbox for pattern recognition. It is a prerequisite to have a general knowledge of
pattern recognition, to have read the introductory part of the PRTools manual, and to have
access to this manual during the study of the exercises. Moreover, the reader needs to have
some experience with Matlab and should regularly study the help texts provided with the
PRTools commands (e.g. help gendatc).
The exercises should give some insight into the toolbox. They are not meant to explain in
detail how the tools are constructed and, thereby, they do not reach the level that enables the
student to add new tools to PRTools, using its specific classes dataset and mapping.
It is left to the responsibility of the reader to study the exercises using various datasets. They
can either be generated by one of the routines in the toolbox or be loaded from a special
dataset directory. Section 13 explains this further with examples of both artificial and
real-world data. First the Matlab commands are given, next scatter plots of some of the sets
are shown. Note that not all the arguments in the commands presented are compulsory. It is
necessary to refer to these pages regularly in order to find suitable problems for the exercises.
In order to build pattern recognition systems for real world (raw) datasets, e.g. images as
they are grabbed by a camera, preprocessing and the measurement of features is necessary.
The growing measurement toolbox MeasTools is designed for that. Here it is unavoidable
that students write their own low level routines as at this moment the collection of feature
measuring tools is insufficient. As no MeasTools manual is available yet, students should
read the online documentation and the additional material which may be supplied during a
course.
Don't forget to study the exercises presented in the manual and the examples available under
PRTools (e.g. prex_cleval)!
**************************
The exercises assume that the data collections prdatasets, prdatafiles and Coursedata
are available. The last directory contains also some experimental commands not available in
the standard PRTools distribution.
Version 4.1 of PRTools contains some new facilities that may confuse the user. The
prprogress command controls the reporting of long-running commands. It may irritate
the user and may even sometimes crash (especially in the Java interface). It may be switched
off by prprogress off.
Some commands may generate many warnings, especially when no class priors are set in a
dataset. One solution is to switch off the PRTools warning mechanism by prwarning(0).
A better way is to set class priors. E.g. a = setprior(a,getprior(a)) sets the class priors
according to the class frequencies if they are not yet defined.
Contents
1 Introduction
2 Classifiers 10
9 One-Class Classifiers 36
10 Classifier combining 39
11 Boosting 42
13 Summary of the methods for data generation and available data sets 47
Introduction
Example 1. Datasets
PRTools deals entirely with sets of objects represented by vectors in a feature space. The
central data structure is the so-called dataset. It consists of a matrix of size m × k: m row
vectors representing the objects, each given by k features. Attached to this matrix is a set of
m labels (strings or numbers), one for each object, and a set of k feature names (also strings
or numbers), one for each feature. Moreover, a set of prior probabilities, one for each class, is
stored. Objects with the same label belong to the same class. In most help files in PRTools,
a dataset is denoted by A. Almost all routines can handle multi-class datasets. Some useful
routines to handle datasets are:
dataset
gendat
genlab
seldat
setdat
getdata
getlab
getfeat
renumlab
Sets of objects may be given externally or may be generated by one of the data generation
routines in PRTools (see section 13). Their labels may be given externally or may be the
results of a classification or a cluster analysis. A dataset containing 10 objects with 5 random
measurements can be generated by:
>> data = rand(10,5);
>> a = dataset(data)
10 by 5 dataset with 0 classes: [ ]
In this example no labels are supplied, therefore no classes are detected. Labels can be added
to the dataset by:
>> labs = [1 1 1 1 1 2 2 2 2 2]'; % labs should be a column vector
>> a = dataset(a,labs)
10 by 5 dataset with 2 classes: [5 5]
Note that the labels have to be supplied as a column vector. A simple way to assign labels to
a dataset is offered by the routine genlab in combination with the Matlab char command:
>> labs = genlab([4 2 4],char('apple','pear','banana'))
>> a = dataset(a,labs)
10 by 5 dataset with 3 classes: [4 4 2]
Note that the order of the classes has changed. Use the routines getlab and getfeat to
retrieve the object labels and the feature labels of a. The fields of a dataset can be made
visible by struct(a).
In the on-line information on datasets (help datasets, also printed in the PRTools manual)
the meaning of these fields is explained. Each field may be changed by a set-command, e.g.
>> b = setdata(a,rand(10,5));
Field values can be retrieved by a similar get-command, e.g.
>> classnames = getlablist(a)
In nlab an index is stored for each object into the list of class names lablist. Note that this
list is alphabetically ordered. The size of a dataset can be found by both size and getsize:
>> [m,k] = size(a);
>> [m,k,c] = getsize(a);
The number of objects is returned in m, the number of features in k and the number of classes
in c. The class prior probabilities are stored in prior. By default it is set to the class
frequencies if the field is empty. Data in a dataset can also be retrieved by double(a) or,
more simply, by +a.
1.1 Have a look at the help information of seldat. Notice that it has many input parameters.
In most cases you can ignore input parameters of functions that are of no interest to you. The
default values are often good enough. Use the routine to extract the banana class from a and
check this by inspecting the result of +a.
Datasets can be manipulated in many ways, comparable to Matlab matrices. So [a1; a2]
combines two datasets, provided that they have the same number of features. The feature set
may be extended by [a1 a2] if a1 and a2 have the same number of objects.
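For readers more at home in Python, the shape rules behind these two concatenations can be sketched with plain NumPy arrays standing in for the data parts of two hypothetical datasets (this is not PRTools code):

```python
import numpy as np

# Hypothetical stand-ins for the data parts of two datasets:
# a1: 4 objects x 3 features, a2: 2 objects x 3 features
a1 = np.arange(12.0).reshape(4, 3)
a2 = np.ones((2, 3))

# [a1; a2] in Matlab: stack objects; the feature counts must match
combined = np.vstack([a1, a2])     # 6 objects x 3 features

# [a1 a2] in Matlab: extend the feature set; the object counts must match
a3 = np.zeros((4, 2))
extended = np.hstack([a1, a3])     # 4 objects x 5 features
```

In PRTools the same rules apply, with labels and feature names carried along automatically.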
1.2 Generate 3 new objects of the classes apple and pear and add them to the dataset
a. Check if the class sizes change accordingly.
1.3 Add a new, 6th feature to the whole dataset a.
Another way to inspect a dataset is to make a scatterplot of the objects in the dataset. For
this the function scatterd is supplied. This plots each object in a dataset in a 2D graph,
using a coloured marker when class labels are supplied. When more than two features are
present in the dataset, only the first two are used. For obtaining a scatterplot of two other
features they have to be explicitly extracted first, e.g. a1 = a(:,[2 5]);. With an extra
option legend one can add a legend to the figure, showing which markers indicate which
classes.
1.4 Use scatterd to make a scatterplot of the features 2 and 5 of dataset a. Try it also
using the legend option.
1.5 Next, use scatterdui to make a scatterplot of a and use its buttons to select features.
(Note that legend is not a valid option here.)
1.6 It is also possible to create 3D scatterplots. Make a 3-dimensional scatterplot by
scatterd(a,3) and try to rotate it by the mouse after pressing the right toolbar button.
1.7 Use one of the procedures described on page 42 and following to create an artificial
dataset of 100 objects. Make a scatterplot. Repeat this a few times.
Exercise 1. Scatterplot
Load the 4-dimensional Iris dataset by a = iris and make scatterplots of all feature combinations using the gridded option of scatterd. Try also all feature combinations using
scatterdui.
Plot in a separate figure the one-dimensional feature densities by plotf. Identify visually
the best combination of two features. Create a new dataset b that contains just these two
features. Create a new figure by the figure command and plot a scatterplot of b.
Exercise 2. Mahalanobis distance (optional)
Use the distmaha command to compute the Mahalanobis distances between all pairs of classes
in the iris dataset. Repeat this for the best two features just selected. Can you find a way
to test whether this is really the best feature pair according to the Mahalanobis distance?
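The computation behind distmaha can also be sketched outside Matlab. The NumPy fragment below computes the squared Mahalanobis distance between the means of two hypothetical classes using a pooled covariance estimate; distmaha's exact pooling may differ, so treat this as an illustration of the idea only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical 2-D classes (stand-ins for two iris classes)
x1 = rng.normal([0, 0], 1.0, size=(100, 2))
x2 = rng.normal([3, 0], 1.0, size=(100, 2))

def mahalanobis2(x1, x2):
    """Squared Mahalanobis distance between class means, pooled covariance."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c = (np.cov(x1.T) + np.cov(x2.T)) / 2    # pooled covariance estimate
    d = m1 - m2
    return float(d @ np.linalg.solve(c, d))

d2 = mahalanobis2(x1, x2)
```

Note that the measure is symmetric in the two classes, and that well-separated classes yield large values.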
Exercise 3. Generate your own dataset (optional)
Generate a dataset that consists of two 2-D uniformly distributed classes of objects using the
rand command. Transform the sets such that for the [xmin xmax; ymin ymax] intervals
the following holds: [0 2; -1 1] for class 1 and [1 3; 1.5 3.5] for class 2. Generate
50 objects for each class. An easy way is to do this for x and y coordinates separately and
combine them afterwards. Label the features by 'area' and 'perimeter'.
Check the result by scatterd and by retrieving object labels and feature labels.
Exercise 4. Enlarge an existing dataset (optional)
Generate a dataset using gendatb containing 10 objects per class. Enlarge this dataset to
100 objects per class by generating more data using the gendatk and gendatp commands.
Compare the scatterplots with a scatterplot of 100 objects per class directly generated by
gendatb. Explain the difference.
Example 2. Density estimation
gaussm   Normal distribution
parzenm  Parzen density estimation
knnm     K-nearest neighbour density estimation
They are programmed as a mapping. Details of mappings are discussed later. The following
two steps are always essential for a mapping: the estimation is built, or trained, using a
training set, e.g. by:
>> a = gauss(100)
Gaussian Data, 100 by 1 dataset with 1 classes: [100]
This is a 1-dimensional, normally distributed dataset of 100 points with mean 0.
>> w = gaussm(a)
Mixture of Gaussians, 1 to 1 trained mapping --> normal map
The trained mapping w now contains all information needed for computing densities of given
points, e.g.
>> b = [-2:0.1:2];
Now we will measure for the points defined by b the density according to w (which is a density
estimator based on the dataset a):
>> d = map(b,w)
41 by 1 dataset with 0 classes: [41]
The result may be listed on the screen by [+b +d] (coordinates and densities) or plotted by:
>> plot(+b,+d)
2.1 Plot the densities estimated by parzenm and knnm in separate figures. These routines
need sensible parameters. Try a few values for the smoothing parameter and the number of
nearest neighbours.
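The Parzen estimate used by parzenm can be sketched in a few lines of NumPy (a Gaussian-kernel version, not the PRTools implementation; the smoothing parameter h and the data below are illustrative):

```python
import numpy as np

def parzen_density(x_train, x_eval, h):
    """Gaussian-kernel Parzen estimate of a 1-D density."""
    diffs = (x_eval[:, None] - x_train[None, :]) / h
    k = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / h      # average kernel mass per evaluation point

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 100)               # training set, as in gauss(100)
b = np.arange(-2.0, 2.0 + 1e-9, 0.1)         # evaluation grid, as in the example
d = parzen_density(a, b, h=0.5)
```

A smaller h gives a spikier estimate, a larger h a smoother one; this is the trade-off explored in exercise 2.1.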
Example 3. Create a dataset from a set of images
Load an image dataset, e.g. kimia. Use the struct command to inspect its featsize field.
As this dataset consists of object images (each object in the dataset is an image) the image
sizes have to be known and are stored in this field. Use the show command to visualize this
image dataset.
The immoments command generates out of a dataset with object images a set of moments as
features. Compute the Hu moments and study the scatterplot by scatterdui.
Exercise 5. Compute image features
Some PRTools commands operate on images stored in datasets, see help prtools. Commands like datfilt and dataim may be used to transform object images. Think of a way
to compute the area and the contour length of the blobs in the kimia dataset. Display the
scatterplot.
Exercise 6. Density plots (optional)
Generate a 2-dimensional 2-class dataset by gendatb of 50 points per class. Estimate the
densities by each of the methods from Example 2.
Make a 2D scatterplot by scatterd in each of three figures. Different from the 1-dimensional
example above, a ready-made density plotting routine plotm can be used for drawing iso-density
lines in the scatterplot. Plot them in the three figures using plotm(w). Try also
3-d plots by plotm(w,3). Note that plotm always needs a scatterplot first, to find the domain
where the density has to be computed.
Exercise 7. Nearest Neighbor Classification (optional)
Write your own function for the nearest neighbour error estimation: e = nne(d) in which
the incoming parameter d is a labeled distance matrix obtained by d = distm(b,a), where
a and b are labeled datasets. The objects of datasets a and b should be represented in the
same feature space. The resulting d is again a dataset. The objects of d are represented
by the distances between b and a. Labels of d can be retrieved by object_lab = getlab(d),
features by feat_lab = getfeat(d).
By the definition of the nearest neighbour rule, the label of each object in the test set has to
be compared with the label of its nearest neighbour in the training set. In this exercise a (b)
plays the role of the training (test) set. The number of differences between two label sets can
be counted by n = nlabcmp(object_lab,feat_lab).
The nne routine thereby has the following steps:
1. Create a vector L with as many elements as d has objects. L(i) = j, where j is the
index of the nearest neighbour of row object i. This index of the closest object can
be found by [dd,j] = min(d(i,:));
2. Use nlabcmp to count the differences between the true labels of the objects corresponding to the rows, given by object_lab, and the labels of the nearest neighbours,
feat_lab(L,:).
3. Normalise and return the error.
4. If the training set a and the test set b are identical (e.g. d = distm(a,a)), nne should
return 0 because each object is its own nearest neighbour. Modify your routine in such
a way that it returns the leave-one-out error if it is called by e = nne(d,loo).
The leave-one-out error is the error made on a set of objects if for each object under
consideration the object itself is excluded from the set at the moment it is evaluated.
In this case not the smallest d(i,j) in row i has to be found (which should be on the
diagonal), but the next smallest one.
Inspect some 2D datasets by scatterd and estimate the nearest neighbour error by nne.
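The four steps above can be sketched language-neutrally; here is a minimal NumPy version of the same logic (not the PRTools nne you are asked to write, and with plain arrays instead of labeled datasets):

```python
import numpy as np

def nne(d, object_lab, feat_lab, loo=False):
    """Nearest-neighbour error from a distance matrix d (test x train).

    object_lab: labels of the row objects (test set)
    feat_lab:   labels of the column objects (training set)
    loo=True:   leave-one-out, i.e. ignore the zero self-distances
    """
    d = d.astype(float).copy()
    if loo:
        np.fill_diagonal(d, np.inf)   # skip the object itself
    L = d.argmin(axis=1)              # step 1: index of the nearest neighbour
    errors = np.sum(object_lab != feat_lab[L])   # step 2: like nlabcmp
    return errors / len(object_lab)   # step 3: normalise

# Sanity checks: 4 well-separated objects, two classes
x = np.array([[0.0], [0.1], [5.0], [5.1]])
lab = np.array([1, 1, 2, 2])
d = np.abs(x - x.T)                   # distance matrix of the set to itself

# A case where leave-one-out errs: a lone object of class 1
x2_ = np.array([[0.0], [0.1], [5.0]])
lab2 = np.array([1, 2, 2])
d2 = np.abs(x2_ - x2_.T)
```

For d = distm(a,a) the plain version returns 0 (step 4), while the leave-one-out version measures a genuine error.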
Running Exercise 1. NIST Digits
Several datasets of handwritten digits are available. The command nist32 loads binary
images of size 32x32 as a dataset of 1024 features. Features can be extracted in several ways,
e.g. immoments computes by default the coordinates of the mean.
Load a dataset A of four digits, e.g. 0,3,5 and 8. Create a subset B with 25 objects per class.
Use the show command to visualize this dataset.
Compute out of B a new dataset C with just two features, e.g. two moments. Make a
scatterplot of C.
Classifiers
>> w = pca(a,2)
This defines the mapping (trained by a) for finding the first 2 principal components.
The fields of a mapping can be shown by struct(w). More information on mappings can be
found in the PRTools manual or by help mappings. The mapping w may be
applied to a or to any other 10-dimensional dataset by:
>> b = map(a,w)
Gaussian Data, 100 by 2 dataset with 1 class: [100]
Instead of the routine map also the * operator may be used for applying mappings to datasets:
>> b = a*w
Gaussian Data, 100 by 2 dataset with 1 class: [100]
Note that the sizes of the variables a (100 × 10) and w (10 × 2) are such that the inner
dimensionalities cancel in the computation of b, like in all Matlab matrix operations.
The * operator may also be used for training: a*pca is equivalent to pca(a) and
a*pca([],2) is equivalent to pca(a,2). As a result, an untrained mapping can be stored
in a variable: w = pca([],2). Such mappings may, thereby, also be passed as an argument in
a function call. The advantages of this possibility will be shown later.
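Underneath, applying a trained linear mapping is just the matrix product suggested by the cancelling dimensionalities. A NumPy sketch of a PCA mapping trained on a hypothetical 100 × 10 dataset (not PRTools code; PRTools stores the same information inside the mapping object):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=(100, 10))       # 100 objects, 10 features

# "Training" the mapping: find the first 2 principal components via the SVD
a0 = a - a.mean(axis=0)
_, _, vt = np.linalg.svd(a0, full_matrices=False)
w = vt[:2].T                          # 10 x 2 projection matrix

# Applying the mapping: (100 x 10)(10 x 2) -> 100 x 2, as with b = a*w
b = a0 @ w
```

The projected components come out uncorrelated and ordered by variance, as expected for PCA.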
fisherc  Fisher classifier
qdc      Quadratic classifier assuming normal densities
udc      Quadratic classifier assuming normal uncorrelated densities
ldc      Linear classifier assuming normal densities with equal covariance matrices
nmc      Nearest mean classifier
parzenc  Parzen density based classifier
knnc     k-nearest neighbour classifier
treec    Decision tree
svc      Support vector classifier
lmnc     Neural network classifier trained by the Levenberg-Marquardt rule
4.1 Generate a dataset a by gendath and compute the Fisher classifier by w = fisherc(a).
Make a scatter plot of a and plot the classifier by plotc(w). Classify the training set by
d = map(a,w) or d = a*w. Show the result on the screen by +d.
4.2
What is displayed is the value of the sigmoid function of the distances to the
classifier. This function maps the distances to the classifier from the (-inf, +inf) interval onto the (0,1) interval. The latter can be interpreted as posterior probabilities.
The original distances can be retrieved by +invsigm(d). This may be visualised by
plot(+invsigm(d(:,1)),+d(:,1),'*'), which shows the shape of the sigmoid function (distances along the horizontal axis, sigmoid values along the vertical axis).
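The sigmoid and its inverse used here are simple enough to sketch directly; the following NumPy fragment (not PRTools code, illustrative distances) shows how classifier distances map onto (0,1) and back:

```python
import numpy as np

def sigm(d):
    """Map classifier distances from (-inf, +inf) onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-d))

def invsigm(p):
    """Recover the original distances from the sigmoid outputs."""
    return np.log(p / (1.0 - p))

d = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # hypothetical distances
p = sigm(d)                                  # confidences in (0, 1)
```

A distance of 0 (an object exactly on the decision boundary) maps onto 0.5, and invsigm undoes the transformation exactly.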
4.3 During training, distance based classifiers are appropriately scaled such that the posterior
probabilities are optimal for the training set in the maximum likelihood sense. In multi-class
problems a normalisation is needed to ensure that the posterior probabilities sum to one.
This is enabled by classc. So classc(map(a,w)), or a*w*classc, maps the dataset a on
the trained classifier w and normalises the resulting posterior probabilities. If we include
training as well, this can be written as a one-liner: p = a*(a*fisherc)*classc. (Try
to understand this expression: between the brackets the classifier is trained; the result
is applied to the same dataset.) Note that because the sigmoid-based normalisation is a
monotonic transformation, it does not alter the class membership of data samples in the
maximum a posteriori probability (MAP) sense.
This may be visualized by computing classifier distances, sigmoids and normalized posterior
probability estimates for a multi-class problem as follows. Load the 80x dataset by a = x80.
Compute the Fisher classifier by w = a*fisherc, classify the training set by d = a*w, and
compute p = d*classc. Display the various output values by +[d p]. Note that the object
confidences over the first 3 columns don't sum to one and that they are normalised in the last
3 columns to proper posterior probability estimates.
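The classc normalisation amounts to dividing each object's confidences by their row sum. A NumPy sketch with hypothetical sigmoid outputs for 2 objects and 3 classes (not the PRTools implementation):

```python
import numpy as np

def classc(d):
    """Normalise per-object confidences so that each row sums to one."""
    return d / d.sum(axis=1, keepdims=True)

# Hypothetical sigmoid outputs: 2 objects x 3 classes
d = np.array([[0.9, 0.4, 0.1],
              [0.2, 0.6, 0.7]])
p = classc(d)
```

Since each row is scaled by a positive constant, the class with the maximum output is unchanged, which is why the normalisation does not alter MAP class assignments.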
4.4
Density based classifiers like qdc find, after training (w = qdc(a), or w = a*qdc),
density estimators for all classes in the training set. Estimates for objects in some dataset b
can be found by d = b*w. Again, posterior probability estimates are found after normalisation
by classc: p = d*classc. Have a look at +[d p] to see the estimates for the class density
and the related posterior probabilities.
Example 5. Classifiers and discriminant plots.
This example illustrates how to plot decision boundaries in 2D scatter plots by plotc.
5.1 Generate a dataset, make a scatter plot, train and plot some classifiers by
>> a = gendath([20 20]);
>> scatterd(a)
>> w1 = ldc(a);
>> w2 = nmc(a);
>> w3 = qdc(a);
>> plotc({w1,w2,w3})
Plot in a new scatter plot of a a series of classifiers computed by the k-NN rule (knnc) for
various values of k between 1 and 10. Look at the influence of the neighbourhood size on the
classification boundary. Check the boundary for k=1.
5.2
Plots like these are influenced by the grid size used for computing the classifier outputs in the
scatter plot. By default it is 30 × 30 (gridsize is 30). The grid size value can be retrieved
and set by gridsize. Study its influence by setting the gridsize to 100 (or even larger) and
repeating the above commands. Use a new figure each time, so results can be compared. Note
the influence on the computation time.
Exercise 8. Normal densities based classifiers.
Take the features 2 and 3 of the Iris dataset. Make a scatter plot and plot in it the normal
densities, see also example 2 and/or exercise 6. Compute the quadratic classifier based on
normal densities (qdc) and plot it on top of this. Repeat this for the uncorrelated (udc)
and the linear classifiers (ldc) based on normal distributions, but plot them on top of the
corresponding density estimation plots.
Exercise 9. Linear classifiers (optional)
Use the same dataset for comparing some linear classifiers: the linear normal distribution
based classifier (ldc) , nearest mean (nmc), Fisher (fisherc) and the support vector classifier
(svc). Plot them on top of each other, in different colours, in the same scatter plot. Don't
plot density estimates now.
Exercise 10. Non-linear classifiers (optional)
Generate a dataset by gendath and compare in the scatter plots the quadratic normal densities
based classifier (qdc) with the Parzen classifier (parzenc) and the 1-nearest neighbour rule
(knnc([],1)). Try also a decision tree (treec).
prprogress on
a = flowers
b = a*im_resize(a,[64,64,3])
x = gendat(b,0.05);
show(x)
Note that only administration is stored until real work has to be done by the show command.
After feature extraction and conversion to a dataset classifiers can be trained and tested:
>> c = b*im_gray*im_moments([],'hu')
>> [x,y] = gendat(c,0.05)
>> y = gendat(y,0.1)
>> w = dataset(x)*nmc
>> e = testc(dataset(y),w)
Also here the work starts with the dataset conversion. A number of classifiers and mappings
may operate directly (without conversion) on datasets, but this appears not to be foolproof
yet. The classification result in this example is bad, as the features are bad. Look in the help
file of PRTools for other mappings and feature extractors for images. You may define your
own image processing operations on datafiles by filtim.
Running Exercise 2. NIST Digit classification
Load a dataset of 50 NIST digits for each of the classes 3 and 5.
Compute 2 features.
Make a scatterplot.
Compute and plot some classifiers, e.g. nmc and ldc.
Classify the dataset.
Use the routine labcmp to find the erroneously classified objects.
Display these digits using the show command. Try to understand why they are incorrectly
classified, given the features.
In PRTools three neural network classifiers are implemented based on an old version of
Matlab's Neural Network Toolbox:
bpxnc a feed-forward network (multi-layer perceptron), trained by a modified backpropagation algorithm with variable learning parameter.
lmnc a feed-forward network, trained by the Levenberg-Marquardt rule.
rbnc a radial basis network. This network always has one hidden layer, which is extended
with more neurons as long as necessary.
These classifiers have built-in choices for target values, step sizes, momentum terms, etcetera.
No weight decay facilities are available. Training stops when there is no improvement on the
training set, no improvement on a validation set error (if supplied), or when a given maximum
number of epochs is reached.
In addition the following neural network classifiers are available:
rnnc feed-forward network (multi-layer perceptron) with a random input layer and a
trained output layer. It has an architecture similar to bpxnc and rbnc, but is much
faster.
perlc single layer perceptron with linear output and adjustable step sizes and target
values.
Example 10. The neural network as a classifier
The following lines demonstrate the use of the neural network as a classifier:
>> a = gendats; scatterd(a)
>> w = lmnc(a,3,1); h = plotc(w);
>> for i=1:50,
w = lmnc(a,3,1,w);delete(h);h=plotc(w);disp(a*w*testc); drawnow;
end
Repeat these lines if you expect a further improvement. Repeat the experiment for 5 and 10
hidden units. Try also the use of the back-propagation rule (bpxnc).
Exercise 14. A neural network classification experiment
Compare the performance of networks trained by the Levenberg-Marquardt rule (lmnc) with
different numbers of hidden units: 3, 5 and 10 for a three class digit problem (2, 3 and 5). Use
the NIST16 dataset (a = nist16) . Reduce the dimensionality of the feature space by pca to
a space that contains 90% of the original variance. Use training sets of 5, 10, 20, 50 and 100
objects per class and a large test set. Plot the errors on the training set and the test set as a
function of the training size. Which networks are overtrained? What can be changed in this
network to avoid overtraining?
Exercise 15. Overtraining (optional)
Study the errors on training and test set as a function of training time (number of epochs) for
a network with one hidden layer of 10 neurons. Use as classification problem gendatc with
25 training objects per class. Do this for lmnc as well as for bpxnc.
Exercise 16. Number of hidden units (optional)
Study the influence of the number of hidden units on the test error for the same problem and
the same classifiers as in the overtraining exercise (Exercise 15).
Exercise 17. Network outputs and posterior probabilities (optional)
Network output values are normalised, like for all classifiers, by a*w*classc. Compare these
outcomes for test sets with the posterior probabilities found for the normal density based
classifier qdc and with the true posterior probabilities found for a qdc classifier based on a
very large training set. This comparison might be based on scatter plots. Use data based on
normal distributions. Train the network with various numbers of steps and try a small and a
large number of hidden units.
A simple example of the generation and use of a test set is the following:
11.1 Load the mfeat_kar dataset, consisting of 64 Karhunen-Loeve coefficients measured
for 10 × 200 handwritten digits (0 to 9). A training set of 50 objects per class (i.e. a fraction of
0.25 of 200) can be generated by:
>> a = mfeat_kar
MFEAT KL Features, 2000 by 64 dataset with 10 classes: [200 ... 200]
>> [trainset,testset] = gendat(a,0.25)
MFEAT KL Features, 500 by 64 dataset with 10 classes: [50 ... 50]
MFEAT KL Features, 1500 by 64 dataset with 10 classes: [150 ... 150]
50 × 10 = 500 objects are stored in trainset, the remaining 1500 objects are stored in testset.
Train the linear normal densities based classifier and test it:
>> w = ldc(trainset);
>> testset*w*testc
Compare the result with training and testing by all data:
>> a*ldc(a)*testc
which is probably better for two reasons. Firstly, it uses more objects for training, so a better
classifier is obtained. Secondly, it uses the same objects for testing as for training, by
which the test result is positively biased. Because of that, the use of separate sets for training
and testing is to be preferred.
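This optimistic bias is easy to demonstrate with the 1-NN rule, for which the resubstitution error is exactly zero (every object is its own nearest neighbour). A NumPy sketch with hypothetical overlapping Gaussian classes (not PRTools code):

```python
import numpy as np

rng = np.random.default_rng(3)

def make_set(n):
    """Two overlapping 2-D Gaussian classes, n objects each (gendath-like stand-in)."""
    x = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(1, 1, (n, 2))])
    y = np.repeat([0, 1], n)
    return x, y

def nn_error(xtr, ytr, xte, yte):
    """1-NN error: fraction of test objects whose nearest training object has another label."""
    d = ((xte[:, None, :] - xtr[None, :, :]) ** 2).sum(-1)
    return float(np.mean(ytr[d.argmin(axis=1)] != yte))

xtr, ytr = make_set(20)
xte, yte = make_set(500)
resub = nn_error(xtr, ytr, xtr, ytr)     # testing on the training set itself
holdout = nn_error(xtr, ytr, xte, yte)   # testing on an independent set
```

The resubstitution error is 0 regardless of how much the classes overlap, while the hold-out error reflects the true difficulty of the problem.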
Example 12. Classifier performance
In this exercise we will investigate the difference in behaviour of the error on the training and
the test set. Generate a large test set and study the variations in the classification error based
on repeatedly generated training sets:
>> t = gendath([500 500]);
>> a = gendath([20 20]); t*ldc(a)*testc
Repeat this last line e.g. 30 times. What causes the variations in error?
true_lab = getlab(test_set);
w = fisherc(train_set);
est_lab = test_set*w*labeld;
confmat(true_lab,est_lab)
[a,b] = gendat(sonar,0.5)
w1 = ldc(a);
w2 = nmc(a);
w3 = parzenc(a);
w4 = svc(a);
e = roc(b,{w1 w2 w3 w4});
plotr(e)
This plot shows how the error shifts from one class to the other class for a changing threshold.
Try to understand what these plots indicate for the selection of a classifier.
The user may interactively change the dendrogram threshold and thereby study the related
grouping of examples.
Exercise 25. Differences in single- and complete- linkage clusterings
Compare the single- and complete-linkage dendrograms, constructed on the r15 dataset using
the squared Euclidean distance measure. Which method is suited better for this problem and
why? Compare the absolute values of thresholds in both situations - why can we observe an
order of magnitude difference?
Exercise 26. Maximum lifetime criterion (optional)
Each clustering solution in a dendrogram survives over a set of thresholds. The dendrogram
may be cut by selecting the most stable solution, i.e. the clustering with the maximum lifetime.
For a given dendrogram, find the threshold corresponding to the maximum lifetime. Use
the den_getthrs function to retrieve the list of all thresholds. Show the scatter plot of the
respective clustering (the labeling specific to a particular threshold may be obtained by the
den_getlab function).
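The maximum lifetime idea reduces to finding the widest gap between successive merge thresholds. A small Python sketch (the threshold values are hypothetical, standing in for what den_getthrs might return):

```python
def max_lifetime_threshold(thresholds):
    """Pick a cut level inside the widest gap between successive dendrogram thresholds."""
    t = sorted(thresholds)
    gaps = [(t[i + 1] - t[i], i) for i in range(len(t) - 1)]
    life, i = max(gaps)                 # the most stable (longest-lived) clustering
    return (t[i] + t[i + 1]) / 2.0      # cut in the middle of that gap

# Hypothetical merge levels from a dendrogram
levels = [0.1, 0.2, 0.3, 1.5, 1.6]
cut = max_lifetime_threshold(levels)
```

Here the gap between 0.3 and 1.5 is by far the widest, so the clustering that exists over that whole range is selected.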
Example 20. Clustering by the EM-Algorithm
A more general version of k-means clustering is supplied by emclust, which can be used with
several classification algorithms instead of nmc and which returns a classifier that may be used
to label future datasets in the same way as the obtained clustering.
The following experiment investigates the clustering stability as a function of the sample size.
Take a dataset a and compute for a given choice of the number of clusters k the clustering of
the entire dataset (e.g. using ldc as a classifier) by:
>> [lab,v] = emclust(a,ldc([],1e-6,1e-6),k);
Here v is a mapping that by d = a*v classifies the dataset according to the final clustering
(lab = d*labeld). Note that for small datasets or large values of k some clusters might
become too small (check with classsizes(d)) for the use of ldc. Instead, nmc may be used. The dataset
a can now be given the cluster labels lab by:
>> a = dataset(a,lab)
This dataset will be used for studying the clustering stability in the following experiments.
The clustering of a subset a1 of n samples per cluster of a:
>> a1 = gendat(a,repmat(n,1,k))
can now be found from
>> [lab1,v1] = emclust(a1,ldc([],1e-6,1e-6));
As the clustering is initialized by the labels of a1, the difference e in labeling between a and
the one defined by v1 can be measured by a*v1*testc, or in a single line:
>> [lab1,v1]=emclust(gendat(a,n),ldc([],1e-6,1e-6)); e=a*v1*testc
Average this over 10 experiments and repeat for various values of n. Plot e as a function of
n.
Exercise 27. Semi-supervised learning
We will study the usefulness of unlabelled data in a wrapper approach.
Various self-learning methods are implemented through emc. Investigate how the usefulness of
unlabelled data depends on training samples size and ratio of labelled vs. unlabelled data. Are
there significant performance differences between different choices of cluster model mappings
(e.g. nmc or parzenc)? Are there clear performance differences depending on whether the
data is indeed clustered or not (e.g. gendats vs. gendatb)?
Running Exercise 4. NIST Digit clustering
Load a dataset A of 25 NIST digits for all classes 0-9.
Compute the 7 Hu moments.
Perform a cluster analysis by kmeans with k = 10 neglecting the original labels.
Compare the cluster labels with the original labels using confmat.
Example 21. Dissimilarity based representations
21.1 Dissimilarity based (relational) representations
Any feature based representation a (e.g. a = gendath(100)) can be converted into a
(dis)similarity representation d using the proxm mapping:
>> w = proxm(b,par1,par2);  % define some dissimilarity measure
>> d = a*w;                 % apply it to the data
in which the representation set b is a small set of objects. In d all (dis)similarities between the
objects in a and b are stored (depending on the parameters par1 and par2, see help proxm).
b can be a subset of a. The dataset d can be used similarly to a feature based set. A
dissimilarity based classifier using a representation set of 5 objects per class can be trained
for a training set as:
>> b = gendat(a,5);   % representation set: 5 objects per class
>> w = proxm(b);      % mapping on dissimilarities to b
>> v = a*w*fisherc;   % classifier trained in the dissimilarity space
>> u = w*v;           % total mapping: representation followed by classifier
This dissimilarity based classifier for the dataset a can also be computed in one line:
>> u = a*(proxm(gendat(a,5))*fisherc);
It is like an ordinary classifier in the feature space of a. It can be tested by a*u*testc.
21.2 Embedding of dissimilarity based representations
A symmetric n × n dissimilarity representation d (e.g. d = a*proxm(a,c)) can be embedded into a pseudo-Euclidean space as
>> [v,sig,l] = goldfarbm(d);
v is the mapping, sig = [p q] is the signature of the pseudo-Euclidean space and l are the
corresponding eigenvalues (first p positive ones, then q negative ones). To check whether d is
Euclidean, you can investigate whether all eigenvalues l are nonnegative. They can be plotted
by:
>> plot(l,'*')
The embedded configuration is found as:
>> x = d*v;
The 3D approximate (Euclidean) embedding can then be plotted by
>> scatterd(x,3);
To project to m most significant dimensions, use
>> [v,sig,l] = goldfarbm(d,m);
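The heart of such an embedding can be sketched in NumPy. The fragment below is a classical-scaling style reconstruction (not the goldfarbm code itself): it diagonalises the doubly centred matrix of squared dissimilarities and reads the signature off the eigenvalue signs.

```python
import numpy as np

def embed(D):
    """Embed a symmetric dissimilarity matrix D.
    Returns the configuration X, the signature (p, q) and the
    eigenvalues ordered by decreasing magnitude."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # doubly centred Gram matrix
    l, V = np.linalg.eigh(B)
    order = np.argsort(-np.abs(l))             # most significant first
    l, V = l[order], V[:, order]
    X = V * np.sqrt(np.abs(l))                 # scale eigenvectors
    p = int((l > 1e-9).sum())                  # positive directions
    q = int((l < -1e-9).sum())                 # negative directions
    return X, (p, q), l
```

For a Euclidean distance matrix q = 0 and the configuration reproduces the original distances.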
Exercise 28. Scatter plot with dissimilarity based classifiers
Generate a training set of 50 objects per class for the banana-set (gendatb). Make a scatter
plot of the training set and make the representation set visible as well. Compare the
dissimilarity based classifier using Euclidean distances and a representation set of 5 objects
per class with the svc for a polynomial of degree 3 (svc([],'p',3)). Repeat this for a
dissimilarity based classifier using 10 objects per class.
Example 22. Different dissimilarities
Sometimes objects are not given by features but directly by dissimilarities. Examples are the
distance matrices between 400 images of hand-written digits 3 and 8. They are based on
four different dissimilarity measures: Hausdorff, modified Hausdorff, blurred and Hamming.
Load a dataset d by load hamming38. It can be split into sets for training and testing by
>> [dtr,dte,i] = gendat(d,10); dtr = dtr(:,i); dte = dte(:,i);
The dataset dtr is now a 20 × 20 dissimilarity matrix and dte is a 380 × 20 matrix based on
the same representation set. A simple trick to find the 1-NN error of dte based on the given
distances is
>> (1-dte)*testc
A classifier in the representation space can be trained on dtr and tested by dte as:
>> dte*fisherc(dtr)*testc
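What happens in (1-dte)*testc can be written out explicitly: each test object receives the label of its nearest prototype. A NumPy sketch (illustrative, not the PRTools implementation):

```python
import numpy as np

def nn_error(Dte, proto_lab, test_lab):
    """1-NN classification error computed directly from a distance
    matrix Dte of shape (test objects, prototypes)."""
    pred = proto_lab[Dte.argmin(axis=1)]   # label of the nearest prototype
    return float((pred != test_lab).mean())
```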
Exercise 29. Learning curves for dissimilarity representations
Consider four dissimilarity representations for 400 images of handwritten digits of 3 and
8: hamming38, blur38, haus38 and modhaus38. Which of the dissimilarity measures are
Euclidean and which are not (goldfarbm)? Try to find out which measure is the most
discriminative for learning in dissimilarity spaces. For each distance dataset d, split it
randomly into training and test dissimilarity data (see Example 22), select a random set of
prototypes and train a linear classifier (e.g. fisherc, ldc, loglc). Find the test error.
Repeat this e.g. 20 times and average the classification errors. Which dissimilarity data
allows for the best classifier performance? Do the results depend much on the number of
prototypes chosen?
Exercise 30. Dissimilarity application on spectra
Two datasets with spectral measurements from a plastic sorting application are provided:
spectra_big and spectra_small. spectra_big contains 16 classes and spectra_small two
classes. The spectra are sampled to 120 wavelengths (features). You may visualize spectral
measurements, stored in a dataset by using the plots command.
Three different dissimilarity measures are provided, specific to the spectra data:
dasam: the Spectral Angle Mapper measures the angle between two spectra interpreted as
points in a vector space (robust to scaling).
dkolmogorov: the Kolmogorov dissimilarity measures the maximum difference between the
cumulative distributions (the spectra should be appropriately scaled to be interpreted as such).
dshape: the shape dissimilarity measures the sum of absolute differences (city block distance)
between the smoothed derivatives of the spectra (uses the Savitzky-Golay algorithm).
Compute a dissimilarity matrix d for each of the measures described. The nearest-neighbour
error may be estimated by the leave-one-out procedure of the nne routine. In order to evaluate
other types of classifiers, a cross-validation procedure must be carried out. Note that cleval
cannot be used for dissimilarity matrices! Use the crossvald routine instead.
Using the cross-validation approach (crossvald), estimate the performance of the nearest
neighbour classifier with one randomly selected prototype per class. To do that use the
minimum distance classifier mindistc. nne will not work here. Repeat the same for a larger
number of prototypes. Test also the full nearest neighbour (with as many prototypes as
possible) and a Fisher linear discriminant (fisherc), trained in a dissimilarity space. Find
out whether fisherc outperforms the nearest neighbour rule and, if so, how many prototypes
suffice to reach this point.
Running Exercise 5. NIST Digit dissimilarities
Load a dataset A of 200 NIST digits for the classes 1 and 8.
Select by gendat at random a dataset B of one sample per class.
Use hausdm to compute the standard and modified Hausdorff distances between A and B.
Study the scatterplots.
through all possible subsets of two features and that creates for each combination a new dataset
b. Use feateval to evaluate b using the Euclidean distance, the Mahalanobis distance and
the leave-one-out error for the one-nearest neighbour rule.
34.12 Find, for each of the three criteria, the two features that are selected by individual
ranking (use featseli), by forward selection (use featself) and by the above procedure
that finds the best combination of two features. Compute for each set of two features the
leave-one-out error for the one-nearest neighbour rule by testk.
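The exhaustive search over feature pairs can be sketched as follows; the criterion mean_dist is only a stand-in for the feateval criteria mentioned above:

```python
import numpy as np
from itertools import combinations

def best_pair(X, y, crit):
    """Exhaustively evaluate all feature pairs and return the pair
    maximising the criterion crit on the 2-feature dataset."""
    best, best_val = None, -np.inf
    for i, j in combinations(range(X.shape[1]), 2):
        v = crit(X[:, [i, j]], y)
        if v > best_val:
            best, best_val = (i, j), v
    return best, best_val

def mean_dist(X2, y):
    """Toy criterion: distance between the two class means."""
    return float(np.linalg.norm(X2[y == 0].mean(axis=0) -
                                X2[y == 1].mean(axis=0)))
```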
Exercise 35. Feature Selection
Load the glass dataset. Rank the features by the sum of the Mahalanobis distances, using individual selection (featseli), forward selection (featself) and backward selection
(featselb). The selected features can be retrieved from the mapping w by:
>> w = featseli(a,'maha-s');
>> getdata(w)
Compute for each feature ranking an error curve for the Fisher classifier by clevalf.
>> rand('seed',1); e = clevalf(a*w,fisherc,[],[],5)
The random seed is reset to make the results for different feature sequences w comparable.
The command a*w reorders the features in dataset a according to w. In clevalf, the classifier
is trained by a bootstrapped version of the given dataset. The remaining objects are used for
testing. This is repeated 5 times. All results are stored in a structure e that can be visualised
by plotr(e).
Plot the result for the three feature sequences obtained by the three selection methods in a
single figure by plotr. Compare this error plot with a plot of the 'maha-s' criterion value
as a function of the number of selected features (use feateval).
Exercise 36. Feature scaling
Besides classifiers that are hampered by the number of features, some classifiers are sensitive
to the scaling of the individual features. This can be studied by an experiment in which the
data is well scaled and one in which the data is badly scaled.
In relation with sensitivity to badly scaled data, three types of classifiers can be distinguished:
1. classifiers that are scaling independent
2. classifiers that are scaling dependent, but that can compensate badly scaled data by
large training sets.
3. classifiers that are scaling dependent, that cannot compensate badly scaled data by large
training sets.
First, generate a training set of 400 points for two normally distributed classes with common
covariance matrix, as follows:
>> a = gauss(400,[0 0; 2 2],eye(2))
Prepare another dataset b by scaling down the second dimension of dataset a as follows:
>> x = +a; x(:,2) = x(:,2).*0.01; b = setdata(a,x);
Study the scatter plot of a and b (e.g. scatterd(a)) and note the difference when the scatter
plot of b is scaled properly (axis equal).
Which of the following classifiers belong to which type (1,2 or 3)?:
nearest mean (nmc),
1-nearest neighbour (knnc([],1)),
LESS (lessc([],1e6)), and
the Bayes classifier assuming normal distributions (qdc)?
(Note that for LESS, we set the C parameter high to stress satisfaction of the constraints for
correct training object classification). It may help if you plot the decision boundaries in the
scatter plots of a and b and play with the training set size.
Verify your answer by the following experiment:
Generate an independent test set c and compute the learning curves (i.e. an error curve
as function of the size of the training set) for each of the classifiers. Use training sizes of
5,10,20,50,100 and 200 objects per class. Plot the error curves.
Use scalem for scaling the features on their variance. For a fair result, this should be computed
on the training set b and applied to b as well as to the test set c:
>> w = scalem(b,'variance'); b = b*w; c = c*w;
Compute and plot the learning curves for the scaled data as well. Which classifier(s) are
independent of scaling? Which classifier(s) can compensate bad scaling by a large training
set?
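The essential point of the scalem step, estimating the scaling on the training set only and applying the very same transform to the test set, can be sketched as follows (illustrative names, not the scalem code):

```python
import numpy as np

def fit_scaling(Xtr):
    """Scaling parameters (mean, std) estimated on the training set only."""
    mu = Xtr.mean(axis=0)
    sd = Xtr.std(axis=0)
    sd[sd == 0] = 1.0                 # guard against constant features
    return mu, sd

def apply_scaling(X, mu, sd):
    """Apply the stored transform; used for train and test alike."""
    return (X - mu) / sd
```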
Exercise 37. High dimensional data
In this exercise, you will experiment with datasets for which the number of features is substantially higher than the number of training objects. For this type of dataset, most traditional
classifiers are not suitable.
37.13 First, load the colon dataset and estimate the performance of the nearest mean
classifier, by cross-validation. Set the number of repetitions for the cross-validation function
higher (e.g. to 3) to get a more stable performance estimate.
The LESS classifier is a nearest mean classifier with feature scaling. It has an additional
parameter to balance data fit and model complexity.
37.14 Estimate the best C parameter setting for the LESS classifier using cross-validation on
the entire training set. The number of effectively used features can be inspected as follows:
>> w = lessc(a,C);
>> d = getdata(w);
>> d.nr
37.15 Now, estimate the generalisation performance of the LESS classifier with optimised
C parameter. Note that for an unbiased performance estimate, the C parameter should be
optimized in each sample of the crossvalidation separately. Use the functions nfoldsubsets,
nfoldselect, and testc to do the performance estimation through cross-validation. See how
cross-validation can be implemented with these functions in nfoldexample.m.
37.16 In this exercise, you will work again with the colon dataset. First reduce the number
of features to 50 as follows:
>> labs = getnlab(a);
>> m1 = mean(+a(labs==1,:),1);
>> m2 = mean(+a(labs==2,:),1);
>> [dummy,ind] = sort(-abs(m1-m2));
>> a = a(:,ind(1:50));
of support vectors. Make a plot of the error and the number of support vectors as a function
of sigma. How well can the optimal sigma be predicted by the number of support vectors?
Exercise 42. Support Objects
Load a two class digit recognition problem by a = seldat(nist16,[1 2],[],[1:50]). Inspect it by the show command. Project it on a 2D feature space by PCA and study the
scatter plot. Find a support vector classifier using a quadratic polynomial kernel. Visualise
the classifier and the support objects in the scatter plot. Look also at the support objects
themselves by the show command. What happens with the number of support objects for
higher numbers of principal components?
Running Exercise 6. NIST Digit classifier complexity
Load a dataset A of 200 NIST digits for the classes 3 and 5.
Compute the Zernike moments:
Split the data in a training set of 25 objects per class and a test set.
Order the features on their individual performance.
Compute feature curves for the classifiers nmc, ldc and qdc.
One-Class Classifiers
Use help to get an idea of what these routines do. Notice that all the classifiers have the same
structure: the first parameter is the dataset and the second parameter is the error on the
target class. The next parameters set the complexity of the classifier (if it can be influenced
by the user; for instance the k in the k-means data description) or influence the optimization
of the method (for instance, the maximum number of iterations in the Mixture of Gaussians).
Before these routines can be used on a data set, the class labels in the datasets should
be changed to target and (possibly) outlier. This can be done using the routines
target_class and oc_set. Outliers can, of course, only be specified if they are available.
Exercise 43. Fraction target reject
Take a two-class dataset (e.g. gendatb, gendath) and convert it to a one-class dataset using
target_class. Use the one-class classifiers given above to find a description of the data.
Make a scatterplot of the data and plot the classifiers. Firstly, experiment with different
values for the fraction of target data to be rejected. What is the influence of this parameter
on the shape of the decision boundary?
Secondly, vary the other parameters of the incsvdd, kmeans_dd, parzen_dd and mog_dd.
These parameters characterise the complexity of the classifiers. How does that influence the
decision boundary?
Exercise 44. ROC curve
Generate a new one-class dataset a using oc_set (so that the dataset contains both target
and outlier objects), and split it in a train and test set. Train a classifier w on the training
set, and plot the decision boundary in the scatterplot.
Make a new figure, and plot the ROC curve there using:
>> h = plotroc(w,a);
There should be a fat dot somewhere on the ROC curve. This is the current operating point.
By moving the mouse and clicking on another spot, the operating point of the classifier can
be changed. The updated classifier can be retrieved by w2=getrocw(h).
Change the operating point of the classifier, and plot the resulting classifier again in the
scatterplot. Did you expect this new position of the decision boundary?
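The ROC curve itself is easy to compute from one-class classifier outputs. A NumPy sketch, assuming a score where higher means more target-like (illustrative, not the plotroc code):

```python
import numpy as np

def roc_points(scores_target, scores_outlier):
    """For every threshold on the score, return the fraction of rejected
    targets (error of the first kind) and of accepted outliers (second kind)."""
    thresholds = np.sort(np.concatenate([scores_target, scores_outlier]))
    pts = []
    for t in thresholds:
        fn = float((scores_target < t).mean())    # targets rejected
        fp = float((scores_outlier >= t).mean())  # outliers accepted
        pts.append((fn, fp))
    return pts
```

Moving the operating point of the classifier corresponds to picking a different threshold t.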
Exercise 45. Handwritten digits dataset
Load the NIST16 dataset (a = nist16). Choose one of the digits as the target class and all
others as the outlier class using oc_set. Build a training set containing a fraction of the target
class and a test set containing both the remainder of the target class and the entire outlier
class. Compute the error of the first and second kind (dd_error) for some of the one-class
classifiers. Why do some classifiers crash, and why do other classifiers work?
Plot receiver-operator characteristic curves (dd_roc) for those classifiers in one plot. Which
of the classifiers performs best?
Compute for the classifiers the Area under the ROC curve (dd_auc). Does this error confirm
your own preference?
Example 26. Outlier robustness
In this example and the next exercise we will investigate the influence of the presence of an
outlier class on the decision boundary. In this example data is classified using support vector
data description (incsvdd).
Run the routine sin_out(4,3). It creates target data from a sinusoid distribution, places an
outlier at position (x,y) (here (x,y) = (4,3)) and calculates a data description.
Investigate the influence of the outlier on the shape of the decision boundary by changing its
position.
Exercise 46. Outlier robustness
Investigate the influence of an outlier class on a decision boundary for other one-class classifiers.
Convert a two-class dataset (e.g. gendath) to a one-class dataset by changing all labels to
target (e.g. using target_class(+a) or oc_set(+a)). Find a decision boundary for just
the target class.
Manually add outliers to your dataset. Compare the decision boundaries.
Exercise 47. Outliers in handwritten digits dataset
Load the Concordia dataset using the routine concor_data. Convert the entire data set to
a target class (this time the target class consists of all digits) and split it into a train and test
set.
Train a one-class classifier w on the train set. Check the performance of the classifier on the
test set z and visualise those digits classified as outliers:
>> zt = target_class(z);          % label all test objects as target
>> labzt = zt*w*labeld;           % classify them with the one-class classifier
>> [It,Io] = find_target(labzt);  % indices of target and outlier decisions
>> show(zt(Io,:))                 % show the digits classified as outlier
10 Classifier combining

maxc: maximum selection
minc: minimum selection
medianc: median selection
meanc: mean combiner
prodc: product combiner
votec: voting combiner
If the so-called base classifiers (w1, w2, . . .) do not produce posterior probabilities but, for
instance, distances, these combining rules operate similarly. Some examples:
28.1
Generate a small dataset, e.g. a = gendatb; and train three classifiers, e.g.
w1 = nmc(a)*classc, w2 = fisherc(a)*classc, w3 = qdc(a)*classc. Create a combined
classifier v = [w1,w2,w3]*meanc. Generate a test set b and compare the performances
of w1, w2 and w3 individually with that of v. Inspect the architecture of the combined
classifier by parsc(v).
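For given matrices of posterior estimates, the fixed rules amount to a one-line reduction per rule. A NumPy sketch (illustrative; PRTools' meanc, prodc, maxc and votec follow the same pattern):

```python
import numpy as np

def combine(posteriors, rule="mean"):
    """Combine a list of posterior matrices (objects x classes), one per
    base classifier, and return the combined class labels."""
    P = np.stack(posteriors)            # (classifiers, objects, classes)
    if rule == "mean":
        S = P.mean(axis=0)
    elif rule == "prod":
        S = P.prod(axis=0)
    elif rule == "max":
        S = P.max(axis=0)
    elif rule == "vote":
        votes = P.argmax(axis=2)        # one label per base classifier
        S = np.stack([(votes == c).mean(axis=0)
                      for c in range(P.shape[2])], axis=1)
    else:
        raise ValueError(rule)
    return S.argmax(axis=1)
```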
28.2
Load three of the mfeat datasets and generate training and test sets, e.g.:
>> a = gendatb(50);
>> w1 = nmc(a)*classc; w2 = fisherc(a)*classc; w3 = qdc(a)*classc;
>> a_out = [a*w1 a*w2 a*w3];
>> v1 = [w1 w2 w3]*fisherc(a_out);
The routine baggingc can also be used to combine a set of classifiers based on bootstrapping,
using the posterior probability estimates. Combining rules like voting, min, max, mean and
product can be used. Compare the performance of a simple classifier like nmc with its bagged
version for a dataset generated by gendatd. Study the scatter and classifier plots.
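Bagging itself is a short loop: train on bootstrap samples, combine by voting. A NumPy sketch of a bagged nearest mean classifier (illustrative, not the baggingc code; class labels are assumed to be 0, 1, ...):

```python
import numpy as np

def bag_nmc(X, y, Xte, n_bags=10, seed=0):
    """Bagged nearest mean classifier: one nmc per bootstrap sample,
    predictions combined by majority vote."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X), len(X))       # bootstrap sample
        Xb, yb = X[idx], y[idx]
        means = np.stack([Xb[yb == c].mean(axis=0) if np.any(yb == c)
                          else X[y == c].mean(axis=0) for c in classes])
        d = ((Xte[:, None, :] - means[None]) ** 2).sum(axis=2)
        preds.append(classes[d.argmin(axis=1)])
    votes = np.stack(preds)                          # (bags, test objects)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```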
Running Exercise 7. NIST Digit classifier combining
Load a dataset A of 500 NIST digits for the classes 3 and 5.
Compute the Hu moments:
Split the data in a training set of 100 objects per class and a test set.
Generate at random 10 subdatasets of 25 objects per class from the training set and compute
the nmc for each of them.
Combine the 10 classifiers by various combining rules.
Compare the final classifiers with an nmc computed for the total training set by their performances on the test set.
11 Boosting
12
Note that the mean colours are very similar. Try to improve the result by using more clusters.
Exercise 59. Texture segmentation
A dataset a in the MAT file texturet contains a 256x256 image with 7 features
(bands): 6 were computed by some texture detector; the last one represents the original gray-level values. The data can be visualised by show(a,7). Segment the image by
[lab,w] = emclust(a,nmc,5). The resulting label vector lab may be reshaped into a label
image and visualised by imagesc(reshape(lab,a.objsize)). Alternatively, we may use the
trained mapping w, re-apply it to the original dataset a and obtain the labels by classim:
imagesc(classim(a*w)).
Investigate the use of alternative models (classifiers) in emclust such as the mixture of Gaussians (using qdc) or non-parametric approach by the nearest neighbour rule knnc([],1). How
do the segmentation results differ and why? The segmentation speed may be significantly increased if the clustering is performed only on a small subset of pixels.
Exercise 60. Improving spatial connectivity
The routine spatm extends, for image feature datasets, the feature space with the spatial
domain by performing a Parzen classifier in the spatial domain. The two results, the feature
space classifier and the spatial Parzen classifier, may then be combined. Let us demonstrate
the use of spatm on the segmentation of the multi-band image emim31:
>> a = emim31;
>> trainset = gendat(a,500); % get a small subset
>> [lab,w] = emclust(trainset,nmc,3);
By applying the trained mapping w to the complete dataset a, we obtain a dataset with cluster
memberships:
>> b=a*w
16384 by 3 dataset with 1 class: [16384]
Let us now for each pixel decide on a cluster label and visualise the label image:
>> imagesc(classim(b));
This clustering was entirely based on per-pixel features and, therefore, neglects spatial connectivity. By using the spatm mapping, three additional features will be added to the dataset
b, each corresponding to one of three clusters:
>> c=spatm(b,2) % spatial mapping using smoothing sigma=2.0
16384 by 6 dataset with 1 class: [16384]
Let us visualise the resulting dataset c by show(c,3). The upper row renders three cluster
membership confidences estimated by the classifier w. The features in the lower row were
added by the spatm mapping. Notice that each of them is a spatially smoothed binary image
corresponding to one of the clusters. By applying a product combiner prodc, we obtain
an output dataset with three cluster memberships based on spectral-spatial relations. This
dataset defines a new set of labels:
>> out=c*prodc
16384 by 3 dataset with 1 class: [16384]
>> figure; imagesc(classim(out))
Investigate the use of other classifiers than nmc and the influence of different smoothing on
the segmentation result.
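The combination of a spatially smoothed membership image with the original memberships by a product rule can be sketched directly (NumPy; a simple box filter stands in for the Parzen/Gaussian smoothing of spatm, names illustrative):

```python
import numpy as np

def box_smooth(img, r=1):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding."""
    p = np.pad(img, r, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += p[r + dy: r + dy + img.shape[0],
                     r + dx: r + dx + img.shape[1]]
    return out / (2 * r + 1) ** 2

def spatial_product(memberships):
    """Combine per-pixel cluster memberships (H x W x k) with their
    smoothed versions by a product rule and relabel every pixel."""
    smoothed = np.stack([box_smooth(memberships[:, :, j])
                         for j in range(memberships.shape[2])], axis=2)
    return (memberships * smoothed).argmax(axis=2)
```

An isolated pixel whose memberships disagree with its neighbourhood is pulled back to the label of its surroundings.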
Exercise 61. Iterative spatial-spectral classifier (optional)
The previous exercise describes a single correction of spectral clustering by means of the
spatial mapping spatm. The process of combining the spatial and spectral domains may be
iterated: the labels obtained by the combination may be used to train separate spectral and
spatial classifiers again. Let us implement a simple iterative segmentation and visualise the
image labelings derived in each step:
>> trainset = gendat(a,500);
>> [lab,w]=emclust(trainset,nmc,3); % initial set of labels
>> for i=1:10, out=spatm(a*w,2)*prodc; imagesc(classim(out)); pause; ...
a=setlabels(a,out*labeld); w=nmc(a); end
Plot the number of label differences between iterations. How many iterations are needed to
stabilise the algorithm for different spectral models and spatial smoothing parameters?
13
a = rand(m,k).*(ones(m,1)*s) + ones(m,1)*u     uniform distribution
a = randn(m,k).*(ones(m,1)*s) + ones(m,1)*u    normal distribution with diagonal covariance matrix (s.*s)
lab = genlab(n,lablist)
a = dataset(a,lab)
a = gauss(m,u,G)
a = gencirc(m,s)
a = gendatc([ma,mb],k,ua)
a = gendatd([ma,mb],k,d1,d2)
a = gendath(ma,mb)
a = gendatm(m)
a = gendats([ma,mb],k,d)
a = gendatl([ma,mb],v)
a = gendatk(a,m,n,v)
a = gendatp(a,m,v,G)
[a,b] = gendat(a,m)
In the table below, a list of datasets is given that can be stored in the variable a provided
prdatasets is added to the path, e.g.:
a = iris;
>> a
Iris plants, 150 by 4 dataset with 3 classes: [50 50 50]
nederland      12 by 12
ringnorm     7400 by 20
sonar         208 by 60
soybean1      266 by 35
soybean2      136 by 35
spirals       194 by 2
twonorm      7400 by 20
wine          178 by 13
Routines for loading multi-band image based datasets (objects are pixels, features are image
bands, e.g. colours):

emim31    128 x 128 by 8
lena      480 x 512 by 3
lena256   256 x 256 by 3
texturel  128 x 640 by 7 with 5 classes: [128 x 128 each]
texturet  256 x 256 by 7 with 5 classes
Routines for loading pixel based datasets (objects are images, features are pixels):

kimia
nist16
faces

Some datafiles (number of objects):

delft_idb       256
delft_images    619
mnist          2000
nist          28000
orl             400
roadsigns       332
highway         100
flowers        1360
Scatter plots of some standard datasets and the commands that generated them:

Highleyman Dataset:  a = gendath([50,50]); scatterd(a);
Spherical Set:       a = gendatc([50,50]); scatterd(a);
Difficult Dataset:   a = gendatd([50,50],2); scatterd(a); axis equal;
Simple Problem:      a = gendats([50,50],2,4); scatterd(a); axis equal;
Banana Set:          a = gendatb([50,50]); scatterd(a);
Spirals:             a = spirals; scatterd(a);
Displays and scatter plots of image based datasets, with the commands that generated them:

a = faces([1:10:40],[1:5]); show(a);
a = nist16(1:20:2000); show(a);
a = faces(1:40,1:10); w = pca(a,2); scatterd(a*w);
a = faces([1:40],[1:10]); w = pca(a); show(w(:,1:8));
a = iris; scatterd(a,'gridded');      % gridded scatter plots over sepal length/width and petal length/width
a = texturet; show([a getlab(a)],4);