

Saturday, January 14, 2017 17:29

Let's suppose we have a binary classification problem and our data is distributed as shown in the figure below.

We suppose that the data, conditioned on its class, is normally distributed. This is a
very naïve assumption if we have not looked at the data.

We model each class with a normal distribution:

p(x \mid C_k) = \mathcal{N}(x; \mu_k, \sigma_k^2) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right)

The parameters \mu_k and \sigma_k^2 can be obtained by ML (maximum-likelihood) estimation.

The log-likelihood function for one class is formulated:

\ell(\mu, \sigma^2) = \sum_{i=1}^{n} \log \mathcal{N}(x_i; \mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2

Setting its derivatives to zero, we get the following ML estimates:

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2
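The ML estimates are just the per-class sample mean and (biased) sample variance. A minimal sketch with made-up 1-D data (the sample values are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy 1-D data for two classes (hypothetical values for illustration).
x1 = np.array([1.0, 1.5, 2.0, 2.5, 3.0])   # samples from class C1
x2 = np.array([4.0, 4.5, 5.0, 5.5, 6.0])   # samples from class C2

# ML estimates: sample mean and (biased, 1/n) sample variance per class.
mu1, var1 = x1.mean(), x1.var()
mu2, var2 = x2.mean(), x2.var()
print(mu1, var1)   # 2.0 0.5
print(mu2, var2)   # 5.0 0.5
```

Note that `np.var` divides by n by default, matching the ML estimator rather than the unbiased (n-1) estimator.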

Module_A Page 1
Now we have two Gaussian models, each of them representing a class-conditional
likelihood.

The class-conditional likelihood functions can be used to calculate the
probability of a class C_k given a sample x via Bayes' theorem:

p(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{p(x)}

where p(x) = \sum_k p(x \mid C_k)\, P(C_k) is a normalization factor, called the evidence.

Bayes' formula in words can be expressed:

posterior = (likelihood × prior) / evidence

In our two-class classification problem, if we are given a sample x, the probability
of the class label can be formulated as

p(C_1 \mid x) = \frac{p(x \mid C_1)\, P(C_1)}{p(x \mid C_1)\, P(C_1) + p(x \mid C_2)\, P(C_2)}
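The posterior computation can be sketched directly from Bayes' theorem; a minimal example with hypothetical univariate Gaussian parameters:

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, mu1, var1, mu2, var2, p1=0.5, p2=0.5):
    # Bayes' theorem: posterior = likelihood * prior / evidence
    l1 = gaussian_pdf(x, mu1, var1) * p1
    l2 = gaussian_pdf(x, mu2, var2) * p2
    evidence = l1 + l2                     # normalization factor p(x)
    return l1 / evidence

# At the midpoint of two equal-variance, equal-prior Gaussians
# the posterior is exactly 0.5.
print(posterior(3.0, mu1=2.0, var1=1.0, mu2=4.0, var2=1.0))  # 0.5
```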

GDA and logistic regression

Now, let's suppose that, in our model, the two Gaussian distributions share a diagonal
covariance matrix \Sigma.
Writing out the distributions we get:

p(x \mid C_k) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)\right)

Now the probability of class C_1 if we see sample x can be written as:

p(C_1 \mid x) = \frac{1}{1 + \exp\left(-(w^\top x + b)\right)}

with w = \Sigma^{-1}(\mu_1 - \mu_2) and b = -\frac{1}{2}\mu_1^\top \Sigma^{-1} \mu_1 + \frac{1}{2}\mu_2^\top \Sigma^{-1} \mu_2 + \log\frac{P(C_1)}{P(C_2)}.

The logistic regression model:

p(C_1 \mid x) = \frac{1}{1 + \exp\left(-(w^\top x + b)\right)}

Note that, using two multivariate Gaussian distributions with a shared diagonal
covariance matrix, we get a model equivalent to logistic regression in binary
classification problems. The converse is not true: p(C_1 \mid x) being logistic
does not imply that p(x \mid C_k) is multivariate Gaussian. Using a Gaussian
distribution to model p(x \mid C_k) implies a strong assumption that the class-
conditional data is indeed Gaussian. If this assumption is correct, then GDA is
better than logistic regression. On the other hand, logistic regression does not
require our data to follow any parametric form. It makes fewer assumptions, and hence
it is more robust, i.e. it works better with a wide range of data types. It turns
out that if the class-conditional data is modelled by a Poisson distribution, we also
get a logistic regression model.
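The equivalence can be checked numerically in the univariate case: the GDA posterior with shared variance coincides with a sigmoid of a linear function of x. A sketch with hypothetical parameter values:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical shared-variance GDA parameters.
mu1, mu2, var, p1 = 2.0, 4.0, 1.0, 0.5

# Logistic-regression weights implied by the GDA parameters.
w = (mu1 - mu2) / var
b = (mu2**2 - mu1**2) / (2 * var) + math.log(p1 / (1 - p1))

def gda_posterior(x):
    # Unnormalized class scores; shared constants cancel in the ratio.
    l1 = math.exp(-(x - mu1)**2 / (2 * var)) * p1
    l2 = math.exp(-(x - mu2)**2 / (2 * var)) * (1 - p1)
    return l1 / (l1 + l2)

# The GDA posterior and the logistic form agree for any x.
for x in (0.0, 2.5, 5.0):
    assert abs(gda_posterior(x) - sigmoid(w * x + b)) < 1e-12
print(round(gda_posterior(2.5), 4), round(sigmoid(w * 2.5 + b), 4))  # 0.7311 0.7311
```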

Gaussian Discriminant Analysis (cont.)

Now let's suppose we have two univariate class-conditional probability density
functions as shown in figure below.

Then, using some class priors P(C_1) and P(C_2), the
corresponding likelihood functions p(x \mid C_k) P(C_k) can be plotted:

The classification of a new sample x can be done:

Pick C_1 if p(C_1 \mid x) > p(C_2 \mid x); otherwise pick C_2.

The error can be given as:

P(\text{error}) = \int \min\left(p(x \mid C_1) P(C_1),\; p(x \mid C_2) P(C_2)\right) dx
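The minimum (Bayes) error for two overlapping class-conditional densities can be approximated numerically by integrating the smaller of the two weighted likelihoods on a grid. A sketch with hypothetical 1-D Gaussians:

```python
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical equal-prior classes N(2, 1) and N(4, 1).
p1 = p2 = 0.5
mu1, mu2, var = 2.0, 4.0, 1.0

# Riemann sum of min(p1*p(x|C1), p2*p(x|C2)) over a grid covering both modes.
dx = 0.001
xs = [i * dx for i in range(-4000, 10000)]
err = sum(min(p1 * gauss(x, mu1, var), p2 * gauss(x, mu2, var)) for x in xs) * dx
print(round(err, 3))
```

For this symmetric case the decision boundary is the midpoint x = 3, and the exact error is Phi(-1), roughly 0.159, which the numeric integral reproduces.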

The following figure shows the likelihood ratio.

The value of the likelihood ratio can range between zero and infinity. The decision can
be made on the likelihood ratio as well:

Pick C_1 if \frac{p(x \mid C_1)}{p(x \mid C_2)} > \frac{P(C_2)}{P(C_1)}; otherwise pick C_2.

This leads to the same decision boundaries. The penalty of misclassifying C_1
samples as C_2 and vice versa can be incorporated into the decision threshold \theta.
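The likelihood-ratio rule with an adjustable threshold can be sketched as follows (the Gaussian parameters and threshold value are hypothetical):

```python
import math

def likelihood_ratio(x, mu1, mu2, var):
    # Shared constants cancel, so unnormalized exponentials suffice.
    l1 = math.exp(-(x - mu1)**2 / (2 * var))
    l2 = math.exp(-(x - mu2)**2 / (2 * var))
    return l1 / l2

# Decide C1 when the ratio exceeds the threshold theta = P(C2)/P(C1);
# raising theta penalizes misclassifying C2 samples as C1.
def classify(x, mu1=2.0, mu2=4.0, var=1.0, theta=1.0):
    return "C1" if likelihood_ratio(x, mu1, mu2, var) > theta else "C2"

print(classify(2.5))  # C1
print(classify(3.5))  # C2
```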

Multiclass classification using discriminant functions

Classes can be represented by so-called discriminant functions for classification
purposes. Let g_k(x) be the discriminant function for a class C_k. Then the
classifier can choose class C_k if

g_k(x) > g_j(x) \quad \text{for all } j \neq k

In case of modelling classes by some analytical distribution, the discriminant
functions g_k can be defined as

g_k(x) = p(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{p(x)}

Since the denominator would be the same in all discriminant functions, it can be
omitted:

g_k(x) = p(x \mid C_k)\, P(C_k)

Or, for simpler and faster calculation, its logarithm can be used:

g_k(x) = \log p(x \mid C_k) + \log P(C_k)


This could be further simplified; for Gaussian class-conditionals the terms that are constant across classes drop out:

g_k(x) = -\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2}\log|\Sigma_k| + \log P(C_k)
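A multiclass Gaussian classifier built from such log-discriminants can be sketched as follows (the 2-D means, covariance, and priors are hypothetical):

```python
import numpy as np

# Multiclass Gaussian discriminant: g_k(x) = log p(x|C_k) + log P(C_k).
def log_discriminant(x, mu, cov, prior):
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * np.log(np.linalg.det(cov))
            - 0.5 * d * np.log(2 * np.pi)   # constant; kept for completeness
            + np.log(prior))

# Hypothetical 2-D parameters for three classes sharing one covariance.
mus = [np.array([0., 0.]), np.array([3., 0.]), np.array([0., 3.])]
cov = np.eye(2)
priors = [1/3, 1/3, 1/3]

def classify(x):
    scores = [log_discriminant(x, m, cov, p) for m, p in zip(mus, priors)]
    return int(np.argmax(scores))

print(classify(np.array([2.8, 0.2])))  # 1
```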

For multi-class classification using SVMs, one can define a
discriminant function for each class C_k such that class C_k is marked as the
positive class, while all the other classes are merged and marked as the negative
class. Then classification can be carried out as above. In fact, this one-vs-rest
approach can be applied to any binary classification method, such as logistic regression.

The discriminant function of logistic regression for multi-class classification

problems can be formulated as well (the softmax function):

g_k(x) = \frac{\exp(w_k^\top x)}{\sum_j \exp(w_j^\top x)}

This can be further simplified: since the denominator is the same for all classes,
it suffices to use

g_k(x) = w_k^\top x
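That the shared softmax denominator cancels in the argmax can be verified directly; a minimal sketch with hypothetical weight vectors:

```python
import numpy as np

# Hypothetical weight vectors, one row per class.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([2.0, -1.0])
scores = W @ x                      # linear scores w_k . x
probs = softmax(scores)

# The shared denominator does not change the ranking, so the
# argmax of the raw scores equals the argmax of the softmax outputs.
assert np.argmax(probs) == np.argmax(scores)
print(int(np.argmax(scores)))  # 0
```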
