
Discriminant Analysis

What is Discriminant Analysis?


Discriminant Analysis is a technique for analyzing data when the dependent
variable is categorical in nature and the predictor (independent) variables are
metric in nature

4 major objectives
Finding linear composites of predictor variables that enable the analyst to separate the groups by maximizing between-group variance relative to within-group variance
Establishing procedures for assigning new individuals, whose profiles but not group identities are known, to one of the identified groups
Testing whether significant differences exist between the mean predictor variable profiles of the groups
Determining which variables account for most of the intergroup differences in mean profiles
Types of Discriminant Analysis?
When the dependent variable has 2 groups: Two Group Discriminant Analysis
When the dependent variable has more than 2 groups: Multiple Discriminant Analysis

The key is to develop a Discriminant Function


A linear combination of independent variables developed by discriminant analysis that will
best discriminate between the categories of the dependent variable
Comparison to ANOVA and Regression Analysis
Similarities       ANOVA         Regression    Discriminant Analysis
Number of DV       One           One           One
Number of IV       Multiple      Multiple      Multiple

Dissimilarities    ANOVA         Regression    Discriminant Analysis
Nature of DV       Metric        Metric        Categorical
Nature of IV       Categorical   Metric        Metric
Understanding the Intuition
Group 1: Would Purchase       X1 (Durability)   X2 (Performance)   X3 (Style)
Respondent 1                  8                 9                  6
Respondent 2                  6                 7                  5
Respondent 3                  10                6                  3
Respondent 4                  9                 4                  4
Respondent 5                  4                 8                  2
Group Mean                    7.4               6.8                4.0

Group 2: Would Not Purchase   X1 (Durability)   X2 (Performance)   X3 (Style)
Respondent 6                  5                 4                  7
Respondent 7                  3                 7                  2
Respondent 8                  4                 5                  5
Respondent 9                  2                 4                  3
Respondent 10                 2                 2                  2
Group Mean                    3.2               4.4                3.8

Differences between means     4.2               2.4                0.2
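
The group means and their differences in the table can be reproduced with a few lines of plain Python. This is only a sketch for checking the arithmetic; the variable and function names are illustrative and not part of the original slides.

```python
# Ratings (X1 durability, X2 performance, X3 style) copied from the table above.
would_purchase = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]
would_not_purchase = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]

def column_means(rows):
    """Return the mean of each column of a list of equal-length tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

mean_buy = column_means(would_purchase)
mean_no_buy = column_means(would_not_purchase)

# Difference between the two group means, rounded to one decimal as in the table.
mean_diff = [round(a - b, 1) for a, b in zip(mean_buy, mean_no_buy)]

for name, d in zip(("X1 durability", "X2 performance", "X3 style"), mean_diff):
    print(f"{name}: difference between group means = {d}")
```

A large difference on one variable (here X1) suggests, but does not by itself prove, that the variable discriminates well, because the within-group spread also matters.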
Understanding the Intuition

[Figure: Would Purchase vs. Would Not Purchase]
The Model and the steps
$$D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$
Where D : Discriminant Score
b : Discriminant Weight or Coefficient
X : Predictor or the Independent Variable

We need to estimate the weights b such that the ratio of between-group sum of squares
to within-group sum of squares of the discriminant scores D is maximum
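
For two groups, the weights that maximize this ratio are proportional to the inverse of the pooled within-group scatter matrix times the difference between the group mean vectors (Fisher's linear discriminant). The sketch below applies that to the ten respondents of the purchase example. It is an illustration under stated assumptions: the weights are unscaled, there is no constant term b0, the helper names are hypothetical, and SPSS reports coefficients on a different scale.

```python
# Ratings (X1 durability, X2 performance, X3 style) from the purchase example.
group_1 = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would purchase
group_2 = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not purchase

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows):
    """Sum of outer products of deviations from the group mean."""
    m = mean_vector(rows)
    k = len(m)
    s = [[0.0] * k for _ in range(k)]
    for row in rows:
        dev = [x - mu for x, mu in zip(row, m)]
        for i in range(k):
            for j in range(k):
                s[i][j] += dev[i] * dev[j]
    return s

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Pooled within-group scatter and difference between group mean vectors.
s1, s2 = scatter_matrix(group_1), scatter_matrix(group_2)
within = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
mean_diff = [a - b for a, b in zip(mean_vector(group_1), mean_vector(group_2))]

# Unscaled discriminant weights: solve (within scatter) * w = (mean difference).
weights = solve(within, mean_diff)

def score(row):
    """Discriminant score without a constant term."""
    return sum(w * x for w, x in zip(weights, row))

scores_1 = [score(r) for r in group_1]
scores_2 = [score(r) for r in group_2]
print("weights:", [round(w, 3) for w in weights])
print("centroids:", round(sum(scores_1) / 5, 3), round(sum(scores_2) / 5, 3))
```

Any positive multiple of these weights gives the same ratio of between-group to within-group sum of squares, so only their relative sizes and signs are meaningful here.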
Steps of Discriminant Analysis
Formulate the problem
Estimate the Discriminant Function Coefficients
Determine the significance of the Discriminating Function
Interpret the results
Assess Validity of the Discriminant Analysis
Terminology: Cutting Score
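The critical value of D, also known as the optimal cutting score (Dc), is used as a
benchmark for determining the group into which an object is classified
For equal group sizes
$$D_c = \frac{D_a + D_b}{2}$$
For unequal group sizes
$$D_c = \frac{N_a D_b + N_b D_a}{N_a + N_b}$$
Where Da and Db are the centroids (mean discriminant scores) of groups A and B, and Na and Nb are the group sizes
Find which one is lower, Da or Db
Calculate Di for the case to be classified
If Di < Dc : classify into the group having the lower centroid
If Di > Dc : classify into the group having the higher centroid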
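
The cutting-score rule can be sketched as a small function. The sketch assumes the common weighted form Dc = (Na·Db + Nb·Da) / (Na + Nb), which reduces to the midpoint of the two centroids when the groups are the same size. The centroid values and group sizes in the usage lines are made up for illustration.

```python
def cutting_score(centroid_a, centroid_b, n_a, n_b):
    """Weighted cutting score; equals the midpoint when n_a == n_b."""
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)

def classify(d_i, centroid_a, centroid_b, n_a, n_b):
    """Assign a discriminant score to group 'A' or 'B' using the cutting score.

    A score exactly equal to the cutting score goes to the higher-centroid
    group here; that tie-break is an arbitrary choice for this sketch.
    """
    d_c = cutting_score(centroid_a, centroid_b, n_a, n_b)
    low, high = ("A", "B") if centroid_a < centroid_b else ("B", "A")
    return low if d_i < d_c else high

# Hypothetical centroids: group A at -1.0, group B at 1.5.
print(cutting_score(-1.0, 1.5, 25, 25))   # equal group sizes
print(cutting_score(-1.0, 1.5, 30, 20))   # unequal group sizes
print(classify(0.4, -1.0, 1.5, 25, 25))
print(classify(0.4, -1.0, 1.5, 30, 20))
```

With unequal sizes the cutting score moves toward the centroid of the smaller group, so the same score of 0.4 can land in different groups depending on the group sizes.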
Terminology: Hit Ratio and Chance Criterion
Hit Ratio :
The ratio of correctly classified cases to the total number of cases

Chance Criterion
$$C = p^2 + (1 - p)^2$$
Where p is the proportion of individuals in group I and (1 - p) is the proportion of individuals in group II

For multiple groups we define C as follows:

$$C = \frac{1}{\text{number of groups}}$$

Used to test how accurate the discriminant function is.


Logic says it should classify more accurately than chance
Some researchers believe that classification accuracy should be at least 1.25 x C
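
The comparison between hit ratio and chance criterion can be sketched as follows. The 44 correct classifications out of 50 come from the worked example later in these slides; the equal split between the two groups (p = 0.5) is an assumption made only for this illustration.

```python
def hit_ratio(correct, total):
    """Share of cases that the discriminant function classified correctly."""
    return correct / total

def proportional_chance(p):
    """Two-group chance criterion C = p^2 + (1 - p)^2."""
    return p ** 2 + (1 - p) ** 2

observed = hit_ratio(44, 50)          # 44 of 50 cases on the main diagonal
chance = proportional_chance(0.5)     # assumed: both groups equally large
threshold = 1.25 * chance             # rule of thumb quoted above

print(f"hit ratio = {observed:.2f}, C = {chance:.2f}, 1.25 x C = {threshold:.3f}")
print("better than the 1.25 x C rule of thumb:", observed >= threshold)
```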
Two Group Discriminant Analysis
All at once Approach
A Working Example : graduate.sav
Open the file : graduate.sav
The file gives details of 50 respondents enrolled in a particular PG course, each recorded as Completed or Not Completed

We are going to run Discriminant Analysis with the following variables

Course Completed = f { Overall College GPA (gpa),
Major Area GPA (areagpa),
GRE Score on Area (grearea),
GRE Score on Quantitative (grequant),
GRE Score on Verbal (greverbal) }
Steps in SPSS (Analyze -> Classify -> Discriminant)
Click on the Statistics Button
SPSS is really screwed up when it comes to proper analysis
You may want to check these also
Click Continue when Done
Click on the Classify Button
Click Continue when Done
Click on the Save Button
Click Continue when Done
And Click OK When Done

Interpreting the Results : How Good it is?
Find the section titled Canonical Discriminant Functions
Eigenvalue: associated with the Discriminant Function
Square of the Canonical Correlation: tells us that 57.91% of variation in the DV is explained by this discriminant function
Interpreting the Results : Is it Significant?
Find the section titled Canonical Discriminant Functions

SPSS uses Chi Square to test the H0 that the Canonical Correlations associated with the functions are equal to 0 [No Discriminating Ability]
H0 should be rejected for a good discriminant function
As the function is significant, it is a good discriminator

SPSS uses Wilks' Lambda (Λ) to identify discriminating power
The closer the value is to zero (0), the better the discrimination
$$\Lambda = 1 - r_1^2$$
Where r_1 is the canonical correlation of the discriminant function
Interpreting the Results : Is it Significant?
Find the section titled Box's Test of Equality of Covariance Matrices
H0 (equal covariance matrices across groups) should not be rejected for a good discriminant function


Interpreting the Results : How Good it Actually is?
Scroll to find the Classification Results
This is the infamous Confusion Matrix
Total correct prediction is on the main diagonal (22 + 22 = 44)
Percent Correctly Predicted = 44/50 = 88%
How do I classify new cases?
Look at this table to get an idea of the discriminant function
$$D = 0.930\,\text{gpa} - 0.876\,\text{majorgpa} - 4.972\,\text{grearea} + 0.432\,\text{grequant} + 4.964\,\text{greverbal}$$

Put in the values of new cases. If the D value is > 0, classify as Finished; if it is < 0, classify as Not Finished
Two Group Discriminant Analysis
Stepwise Approach
What is Stepwise Approach?
Based on procedures developed by P. C. Mahalanobis
Steps of Stepwise Approach:
Each predictor is entered sequentially based on its ability to discriminate between groups
An F ratio is calculated for each predictor by conducting a univariate ANOVA
The predictor with the largest F ratio is the first to be selected for inclusion
A second predictor is added based on the highest adjusted F ratio
This continues iteratively until maximum separation is achieved.
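
The first step of the stepwise approach, picking the predictor with the largest univariate F ratio, can be sketched on the ten respondents of the earlier purchase example. The sketch covers only this first step; the adjusted F ratios used for later entries are not computed, and the function name is illustrative.

```python
# Ratings (X1 durability, X2 performance, X3 style) from the purchase example.
group_1 = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would purchase
group_2 = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not purchase
names = ("X1 durability", "X2 performance", "X3 style")

def f_ratio(values_by_group):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    all_values = [v for g in values_by_group for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in values_by_group:
        g_mean = sum(g) / len(g)
        ss_between += len(g) * (g_mean - grand_mean) ** 2
        ss_within += sum((v - g_mean) ** 2 for v in g)
    df_between = len(values_by_group) - 1
    df_within = len(all_values) - len(values_by_group)
    return (ss_between / df_between) / (ss_within / df_within)

# One F ratio per predictor, pairing each column of group 1 with the same column of group 2.
f_values = [f_ratio([col_1, col_2])
            for col_1, col_2 in zip(zip(*group_1), zip(*group_2))]

first_entered = names[f_values.index(max(f_values))]
for name, f in zip(names, f_values):
    print(f"{name}: F = {f:.3f}")
print("first predictor entered:", first_entered)
```

On this small data set durability has by far the largest F ratio, which matches the ordering of the mean differences in the table.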
Multiple Group Discriminant Analysis
How to conduct?
Similar procedure
We will use discrim.sav to illustrate the procedure
Interpreting the Results : How Good it is?
Find the section titled Canonical Discriminant Functions
Eigenvalue: associated with the Discriminant Function
Square of the Canonical Correlation: tells us that F1 explains 51.98% of variance in D and F2 explains 24.30% of variance in D
Interpreting the Results : Is it Significant?
Find the section titled Canonical Discriminant Functions

SPSS uses Chi Square to test the H0 that the Canonical Correlations associated with the functions are equal to 0 [No Discriminating Ability]
H0 should be rejected for a good discriminant function
As both functions are significant, both are good discriminators

SPSS uses Wilks' Lambda (Λ) to identify discriminating power
The closer the value is to zero (0), the better the discrimination
$$\Lambda = (1 - r_1^2)(1 - r_2^2)$$
Where r_1 and r_2 are the canonical correlations of F1 and F2
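
Wilks' Lambda for all functions taken together is the product of (1 - squared canonical correlation) over the functions, so it can be recomputed from the percentages quoted in these slides. A minimal sketch, with an illustrative function name:

```python
def wilks_lambda(squared_canonical_correlations):
    """Product of (1 - r^2) over all discriminant functions."""
    lam = 1.0
    for r2 in squared_canonical_correlations:
        lam *= 1.0 - r2
    return lam

# Two-group example: one function with squared canonical correlation 0.5791.
two_group = wilks_lambda([0.5791])

# Multiple-group example: F1 with 0.5198 and F2 with 0.2430.
multi_group = wilks_lambda([0.5198, 0.2430])

print(f"two-group Lambda = {two_group:.4f}")
print(f"multiple-group Lambda = {multi_group:.4f}")
```

Both values are well below 1, which is the direction that indicates discriminating power; whether they are significant is what the Chi Square test reports.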
Interpreting the Results : Is it Significant?
Find the section titled Box's Test of Equality of Covariance Matrices
H0 (equal covariance matrices across groups) should not be rejected for a good discriminant function


Interpreting the Results : How Good it Actually is?
Scroll to find the Classification Results
Interpreting the Results : Structure Matrix
Provides some additional inputs
* indicates that Social Rating and Conservative Rating are associated with F1 and Outdoor Interest with F2
Things to remember
Checklist of Discriminant Analysis
Tabachnick (1989) provides the following checklist:
Unequal Group Size and Missing Data
Pay particular attention to patterns of missing values.

Unequal group sizes are okay.


However, unequal group size can cause subtle changes during the classification phase

Multivariate Normality
Discriminant analysis does not make strong normality assumptions

Outliers:
Can cause severe problems that discriminant analysis will not overcome

Linearity of Independent Variable


Thank You
