

This section presents a few procedures in SPSS that can be used to detect outliers. Three
of the procedures available in SPSS are explained: (i) the standardized score method; (ii)
the Mahalanobis Distance method; and (iii) the DfBeta Influence Statistics method.

Example:
Let us try to detect outliers in the variable “Tangible” in the “spss workshop.sav” file.

1. SPSS Procedure – Using Standardized Score:

1.1 Choose Analyze → Descriptive Statistics → Descriptives …

1.2 The “Descriptives” dialog box will appear…

Step 1
Select the variable(s) and put them into the “Variable(s)” box at the right-hand side by
clicking the arrow button.

Step 2
Click the “Save standardized values as variables” option.

Step 3
Click OK..

© HishamMB 2008 1
Downloadable at http://www.hishammb.net/workshopfeb2008/outlierinspss.pdf
1.3 The result will appear in Output View.

You are advised to save this output report for future reference.

1.4 Switch back to your data file, and you’ll see new variables in your data set…

1.5 Now we can sort the standardized variable to detect the outlier(s) easily. In our case,
an outlier is any case with a standardized score of ± 3.0 and beyond…

Move the cursor onto the header of the standardized variable → right-click → click
“Sort Ascending”…

1.6 Look at all the values of the standardized variable; none of them is beyond ± 3.0.
Therefore, no outlier is detected in this variable.
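
The same check can be sketched outside SPSS. The Python snippet below is only an illustration under stated assumptions, not part of the SPSS procedure: the scores are made up to stand in for the “Tangible” variable, the standardized score is taken as (value - mean) / sample standard deviation, and the names standardized_scores and flag_outliers are hypothetical.

```python
from statistics import mean, stdev


def standardized_scores(values):
    """Return the z-score of every value: (value - mean) / sample SD."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (divisor n - 1)
    return [(value - centre) / spread for value in values]


def flag_outliers(values, cutoff=3.0):
    """Return (case index, z-score) pairs at or beyond +/- cutoff."""
    scores = standardized_scores(values)
    return [(i, z) for i, z in enumerate(scores) if abs(z) >= cutoff]


# Hypothetical scores (not the workshop data); the last case is deliberately extreme.
tangible = [4.0, 4.2, 3.8, 4.1, 3.9, 4.0, 4.3, 3.7, 4.1, 4.0,
            3.9, 4.2, 4.0, 3.8, 4.1, 4.0, 3.9, 4.2, 4.1, 9.5]

for index, z in flag_outliers(tangible):
    print(f"case {index}: z = {z:.2f}")
```

An empty result corresponds to the situation in step 1.6, where no value is at or beyond ± 3.0.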

2. SPSS Procedure – Using Mahalanobis Distance Statistics:

2.1 Choose Analyze → Regression → Linear …

2.2 The “Linear Regression” dialog box will appear…

Step 1
Select the Dependent Variable (DV) and put it into the “Dependent” box by clicking the
arrow button beside the box.

Step 2
Select all the Independent Variables and put them into the “Independent(s)” box by
clicking the arrow button beside the box.

Step 3
Click Save.. (OK is clicked only later, in step 2.6.)

2.3 The “Linear Regression: Save” dialog box will appear:

Step 1
Click Mahalanobis.

Step 2
Click Continue..
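
To give a feel for the kind of value this option saves, here is a minimal Python sketch. It assumes the Mahalanobis distance (D²) of a case is its squared distance from the centroid of the independent variables, weighted by the inverse of their sample covariance matrix. The data and the function names are hypothetical, and the sketch is not claimed to reproduce SPSS's internal computation.

```python
from statistics import mean


def covariance_matrix(rows):
    """Return the column means and the sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    centre = [mean(row[j] for row in rows) for j in range(p)]
    cov = [[sum((row[j] - centre[j]) * (row[k] - centre[k]) for row in rows) / (n - 1)
            for k in range(p)] for j in range(p)]
    return centre, cov


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        known = sum(aug[i][c] * x[c] for c in range(i + 1, size))
        x[i] = (aug[i][size] - known) / aug[i][i]
    return x


def mahalanobis_d2(rows):
    """Return D^2 of every case relative to the centroid of all cases."""
    centre, cov = covariance_matrix(rows)
    distances = []
    for row in rows:
        diff = [value - m for value, m in zip(row, centre)]
        weighted = solve(cov, diff)  # equals inverse(cov) * diff
        distances.append(sum(d * w for d, w in zip(diff, weighted)))
    return distances


# Hypothetical cases with two independent variables; the last case is unusual.
cases = [(2.0, 3.0), (3.0, 3.5), (4.0, 4.5), (5.0, 5.0), (6.0, 6.5), (3.0, 9.0)]
d2 = mahalanobis_d2(cases)
print([round(value, 3) for value in d2])
```

Solving the linear system for each case avoids forming the inverse covariance matrix explicitly, which keeps the sketch short.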

2.4 The “Linear Regression” dialog box (as in step 2.2) will appear again…

Step 1
Click Statistics..

2.5 The “Linear Regression: Statistics” dialog box will appear:

Step 1
Click Descriptives..

Step 2
Click Continue..

2.6 The “Linear Regression” dialog box (as in steps 2.2 and 2.4) will appear again…

Step 1
Click OK..

2.7 The result will appear in Output View.

Remember that we’re now assessing the outlier cases, so this output is not relevant at
this moment. We’ll revisit this output when we do the regression analysis to draw
conclusions (in the inferential analysis section).

2.8 Switch back to your data file, and you’ll see new variables in your data set… But, to
detect the outlier(s), we need to look at the probability of the Mahalanobis distance (D²)
and the scores themselves.

To get the probability of D², choose Transform → Compute

2.9 The “Compute Variable” dialog box will appear:

Step 1
Assign a name for the Target Variable, e.g. Prob_D2.

Step 2
Choose “CDF & Noncentral CDF” in the “Function group” box.

Step 3
Choose Cdf.Chisq in the “Functions and Special Variables” box by double-clicking the item
or by using the arrow button.

Step 4
Input the Mahalanobis Distance variable (in this case MAH_1) into the CDF.CHISQ function,
put a comma and then 3. Subtract the CDF.CHISQ from 1, so that the expression reads
1 - CDF.CHISQ(MAH_1, 3).

Step 5
Click OK..

2.10 Switch back to your data file, and you’ll see the new variable in your data set… Remember
that cases with a probability of D² < 0.001 should be considered outliers, so the next
task is to find whether or not such cases exist.

Now we can sort the Prob_D2 variable to detect the outlier(s) easily…

Move the cursor onto the header of the variable (Prob_D2) → right-click → click
“Sort Ascending”…

Now, look at the values of the variable (Prob_D2) again; we have 9 outliers detected in our
dataset.
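
The Compute step in 2.9 and the cutoff in 2.10 can be sketched in Python as follows. This is a rough illustration under stated assumptions: the distances are made-up stand-ins for the MAH_1 column, and the 3 typed into CDF.CHISQ is taken to be the degrees of freedom of the chi-square distribution (presumably the number of independent variables). The function name chi_square_upper_tail is hypothetical, and it handles integer degrees of freedom only.

```python
import math


def chi_square_upper_tail(x, df):
    """Return 1 - CDF of a chi-square distribution with integer df, for x >= 0."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)             # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    # Raise df by 2 per pass: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical Mahalanobis distances standing in for the MAH_1 column.
mah_values = [1.2, 3.4, 0.8, 18.5, 7.9, 25.1]
DEGREES_OF_FREEDOM = 3  # the "3" entered after the comma in step 2.9

# Same idea as Prob_D2 = 1 - CDF.CHISQ(MAH_1, 3)
prob_d2 = [chi_square_upper_tail(d2, DEGREES_OF_FREEDOM) for d2 in mah_values]

# Cases with a probability below 0.001 are treated as outliers (step 2.10).
flagged = [i for i, p in enumerate(prob_d2) if p < 0.001]
print(flagged)
```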

3. SPSS Procedure – Using Influence Statistics:

Let us say we want to detect outliers using the DfBeta and Standardized DfBeta statistics…

3.1 Choose Analyze → Regression → Linear …

3.2 The “Linear Regression” dialog box will appear…

Step 1
Select the Dependent Variable (DV) and put it into the “Dependent” box by clicking the
arrow button beside the box.

Step 2
Select all the Independent Variables and put them into the “Independent(s)” box by
clicking the arrow button beside the box.

Step 3
Click Save.. (OK is clicked only later, in step 3.4.)

3.3 The “Linear Regression: Save” dialog box will appear:

Step 1
Click DfBeta(s) and Standardized DfBeta(s)..

Step 2
Click Continue..

3.4 The “Linear Regression” dialog box (as in step 3.2) will appear again…

Click
OK..

3.5 The result will appear in Output View.

Remember that we’re now assessing the outlier cases, so this output is not relevant at
this moment. We’ll revisit this output when we do the regression analysis to draw
conclusions (in the inferential analysis section).

3.6 Switch back to your data file, and you’ll see new variables in your data set… The
new variables are named as follows:

DFB0_1 - “DfBeta for the intercept”

DFBa_1 - “DfBeta for the a-th independent variable” (so DFB1_1 means the DfBeta for
the 1st independent variable, which in our case is Tangible)
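
To illustrate what these saved values represent, here is a minimal Python sketch. It assumes DfBeta is the change in a regression coefficient when one case is left out of the fit (coefficient from all cases minus coefficient without that case). For brevity it uses a single independent variable and made-up data, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def dfbeta(x, y):
    """Return (DfBeta for intercept, DfBeta for slope) for every case."""
    full_intercept, full_slope = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        # Refit the regression with case i removed.
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        intercept, slope = fit_line(rest_x, rest_y)
        changes.append((full_intercept - intercept, full_slope - slope))
    return changes


# Hypothetical data: every case lies on y = 2 + 0.5x except the last one.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_values = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 10.0]

for case, (dfb0, dfb1) in enumerate(dfbeta(x_values, y_values)):
    print(f"case {case}: DFB0 = {dfb0:+.4f}, DFB1 = {dfb1:+.4f}")
```

In this toy data the last case shifts both coefficients the most, which is the kind of influence the DFB variables are meant to expose.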

To assess whether or not the outliers (in the independent variables) influence the
parameter estimates of the study, the following rule is used:

$$\text{DfBeta} > \frac{2}{\sqrt{n}}, \quad \text{where } n = \text{sample size}$$

In our dataset, any case with DfBeta > 0.093 shall be considered an outlier.

$$\text{DfBeta} > \frac{2}{\sqrt{467}} = 0.092549$$

As a result, this procedure has detected 37 cases as outliers... check it out!!
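
The cutoff arithmetic and the flagging step can be sketched in Python as below. This is an illustration only: the DfBeta values are invented rather than taken from the workshop data, the function names are hypothetical, and comparing the absolute value against the cutoff (so that large negative values are also flagged) is an assumption not stated in the rule above.

```python
import math


def dfbeta_cutoff(n):
    """Return the cutoff 2 / sqrt(n) for a sample of size n."""
    return 2.0 / math.sqrt(n)


def flag_influential(dfbeta_values, n):
    """Return the indices of cases whose |DfBeta| exceeds 2 / sqrt(n)."""
    cutoff = dfbeta_cutoff(n)
    # Assumption: the absolute value is compared against the cutoff.
    return [i for i, value in enumerate(dfbeta_values) if abs(value) > cutoff]


SAMPLE_SIZE = 467  # sample size used in the text
print(f"cutoff = {dfbeta_cutoff(SAMPLE_SIZE):.6f}")

# Hypothetical DfBeta values for a handful of cases (not the workshop data).
example_dfbeta = [0.01, -0.12, 0.05, 0.20, -0.03]
print(flag_influential(example_dfbeta, SAMPLE_SIZE))
```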
