
Logistic Regression: SPSS Output

This example uses SPSS 13 and the file "Employee data.sav," which ships with SPSS. The dependent variable is
minority (minority classification), coded 0=no, 1=yes. The independent variables are educ (years of education),
prevexp (months of previous experience), jobcat (1=clerical, 2=custodial, 3=managerial), and gender (m, f).
To obtain this output:
1. File, Open, point to Employee Data.sav.
2. Analyze, Regression, Binary Logistic.
3. In the Logistic Regression dialog box, enter minority as the dependent and the independents as educ,
prevexp, jobcat, and gender.
4. Click on the Categorical button and indicate jobcat as a categorical variable. (Gender will be automatically
treated as categorical since the raw data are in text format, m or f.) Click Continue.
5. Click on the Options button and select Classification Plots, Hosmer-Lemeshow Goodness-of-Fit, Casewise
Listing of Residuals (outside 2 standard deviations), and check Display on Last Step. Click Continue.
6. Click on OK to run the procedure.
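
For readers working outside SPSS, the following is a minimal Python sketch of the same analysis, assuming the data have been exported to a CSV file with the same variable names (the filename employee_data.csv is hypothetical). It uses the statsmodels package rather than SPSS, so the output layout differs.

# A minimal Python analog of the SPSS run above (a sketch, not SPSS itself).
# Assumes Employee data.sav was exported to "employee_data.csv" (hypothetical
# filename) with the same variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("employee_data.csv")
# Recode the dependent to 0/1 if it was exported as text ("no"/"yes").
if df["minority"].dtype == object:
    df["minority"] = (df["minority"].str.lower() == "yes").astype(int)

# C() marks jobcat and gender as categorical, mirroring SPSS's Categorical
# button; note statsmodels omits the FIRST level by default, whereas the SPSS
# indicator coding described below omits the LAST category.
fit = smf.logit("minority ~ educ + prevexp + C(jobcat) + C(gender)", data=df).fit()
print(fit.summary())          # coefficients, standard errors, z, p-values
print(np.exp(fit.params))     # odds ratios, SPSS's Exp(B)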

Comments in blue are by the instructor and are not part of SPSS output.
Logistic Regression

Case Processing Summary

Unweighted Cases(a)                         N   Percent
Selected Cases    Included in Analysis    474     100.0
                  Missing Cases             0        .0
                  Total                   474     100.0
Unselected Cases                            0        .0
Total                                     474     100.0
a If weight is in effect, see classification table for the total number of cases.
The case processing table above shows that missing values are not an issue for these data.
Dependent Variable Encoding

Original Value   Internal Value
No                            0
Yes                           1

The Dependent Variable Encoding table above shows that the dependent variable, minority, is coded 1="yes" (the
category of interest) and 0 for the non-minority category. This is conventional for logistic analysis, which here
focuses on the probability that minority=1.

Above is SPSS's parameterization of the two categorical independent variables. Note that the parameter coefficients
for the last category of each such variable are all 0's, indicating the last category is the omitted (reference) value for
that set of dummy variables. The parameter codings are the X values for the dummy variables. They are multiplied
by the logit (effect) coefficients as part of obtaining the predicted values of the dependent, much as one would
compute an OLS regression estimate.
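
To make that arithmetic concrete, here is a small Python sketch of how the dummy codings and logit coefficients combine into a predicted probability. All coefficient values below are hypothetical placeholders, not the SPSS estimates.

# Sketch: how the parameter codings (dummy X values) combine with the logit
# coefficients. All coefficient values are HYPOTHETICAL placeholders.
import numpy as np

coef = {"const": -1.2, "educ": -0.05, "prevexp": 0.002,
        "jobcat(1)": 0.6, "jobcat(2)": 1.1,  # clerical, custodial; managerial omitted
        "gender(1)": -0.2}                   # one gender dummy; the other omitted

# A hypothetical clerical case: 12 years of education, 60 months of experience.
x = {"const": 1, "educ": 12, "prevexp": 60,
     "jobcat(1)": 1, "jobcat(2)": 0, "gender(1)": 1}

logit = sum(coef[k] * x[k] for k in coef)  # linear predictor, as in OLS
p = 1.0 / (1.0 + np.exp(-logit))           # predicted P(minority = 1)
print(round(logit, 3), round(p, 3))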
Block 0: Beginning Block

The classification table above is a 2 x 2 table which tallies correct and incorrect estimates for the null model with
only the constant. The columns are the two predicted values of the dependent, while the rows are the two observed
(actual) values of the dependent. In a perfect model, all cases will be on the diagonal and the overall percent correct
will be 100%. If the logistic model has homoscedasticity (not a logistic regression assumption), the percent correct
will be approximately the same for both rows. Here it is not: the null model classifies every case as non-minority
and predicts no minority cases at all. While the overall percent correctly predicted seems moderately good at 78.1%, the
researcher must note that blindly estimating the most frequent category (non-minority) for all cases would yield the
same percent correct (78.1%).
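
That baseline is easy to verify. The counts below (370 non-minority, 104 minority) are inferred from the 78.1% figure and the N of 474; a sketch:

# Sketch: the "blind" baseline the text describes -- always predicting the
# most frequent category. Counts inferred from the 78.1% figure above.
n_no, n_yes = 370, 104                       # 370 / 474 = 78.1%
baseline = max(n_no, n_yes) / (n_no + n_yes)
print(f"baseline percent correct: {100 * baseline:.1f}%")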

Above SPSS prints the initial test of the model in which the coefficients of all the independent variables are 0. The
significant result above indicates this null model should be rejected.

Block 1: Method = Enter

The chi-square goodness-of-fit test for the step tests the null hypothesis that the variables added in the step have
coefficients of 0. Here the step is from the constant-only model to the all-independents model. When, as here, the
step adds a variable or variables, the inclusion is justified if the significance of the step is less than 0.05. Had the
step been to drop variables from the equation, the exclusion would have been justified if the significance of the
change was large (e.g., over 0.10).
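
This step test is a likelihood-ratio chi-square. A sketch of the computation (the log-likelihood values below are placeholders, not the SPSS output):

# Sketch: the step chi-square as a likelihood-ratio test between the
# constant-only and all-independents models. LL values are HYPOTHETICAL.
from scipy.stats import chi2

ll_null, ll_full = -249.0, -240.0   # hypothetical log-likelihoods
df_added = 5                        # educ, prevexp, 2 jobcat dummies, gender
lr = 2 * (ll_full - ll_null)        # = -2LL(null) - (-2LL(full))
p = chi2.sf(lr, df_added)
print(f"chi-square = {lr:.2f}, df = {df_added}, p = {p:.4f}")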

The Cox-Snell R2 and Nagelkerke R2 are attempts to provide a logistic analogy to R2 in OLS regression. The
Nagelkerke measure adapts the Cox-Snell measure so that it varies from 0 to 1, as does R2 in OLS.
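
Both measures come directly from the null and full model log-likelihoods. A sketch with placeholder values (n = 474 from the case processing summary):

# Sketch of the two pseudo-R-squared formulas. LL values are HYPOTHETICAL.
import numpy as np

n, ll_null, ll_full = 474, -249.0, -240.0
r2_cs = 1 - np.exp((2 / n) * (ll_null - ll_full))  # Cox-Snell
r2_nk = r2_cs / (1 - np.exp((2 / n) * ll_null))    # Nagelkerke rescales to [0, 1]
print(round(r2_cs, 3), round(r2_nk, 3))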

The Hosmer and Lemeshow Goodness-of-Fit Test divides subjects into deciles based on predicted probabilities, then
computes a chi-square from the observed and expected frequencies. The p-value=0.051 here is computed from the chi-
square distribution with 8 degrees of freedom and indicates that the logistic model is a (barely) good fit. That is, if
the significance of the Hosmer and Lemeshow test is .05 or less, we reject the null hypothesis that there is no
difference between the observed and predicted values of the dependent; if it is greater, as we want, we fail to reject
the null hypothesis, implying that the model's estimates fit the data at an acceptable level. As here, passing this test
does not mean that the model explains much of the variance in the dependent, only that its predictions do not differ
significantly from the observed values.
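
For the curious, here is a minimal Python sketch of the Hosmer-Lemeshow computation. It is a generic implementation of the decile idea described above, not SPSS's exact algorithm, which can differ slightly in how it forms groups.

# Sketch: Hosmer-Lemeshow -- group cases into deciles of predicted
# probability, then chi-square the observed vs. expected counts.
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    d = pd.DataFrame({"y": y, "p": p_hat})
    d["decile"] = pd.qcut(d["p"], groups, duplicates="drop")
    stat = 0.0
    for _, g in d.groupby("decile", observed=True):
        obs1, exp1 = g["y"].sum(), g["p"].sum()       # observed/expected "yes"
        obs0, exp0 = len(g) - obs1, len(g) - exp1     # observed/expected "no"
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    dof = d["decile"].nunique() - 2                   # g - 2 degrees of freedom
    return stat, chi2.sf(stat, dof)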
The classification table above is a 2 x 2 table which tallies correct and incorrect estimates for the full model with the
independents as well as the constant. The columns are the two predicted values of the dependent, while the rows are
the two observed (actual) values of the dependent. In a perfect model, all cases will be on the diagonal and the
overall percent correct will be 100%. If the logistic model has homoscedasticity (not a logistic regression
assumption), the percent correct will be approximately the same for both rows. Here it is not, with the model
predicting all but seven non-minority cases correctly but only one minority case. While the overall percent
correctly predicted seems moderately good at 76.8%, the researcher must note that blindly estimating the most
frequent category (non-minority) for all cases would yield an even higher percent correct (78.1%), as noted above.
This implies minority status cannot be differentiated on the basis of education, job experience, job category, and
gender for these data.
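
The classification table itself is simple to reproduce from predicted probabilities, given the .50 cut value SPSS uses by default. A sketch:

# Sketch: the 2 x 2 classification table from predicted probabilities,
# using the .50 cut value.
import numpy as np

def classification_table(y, p_hat, cut=0.5):
    y = np.asarray(y)
    pred = (np.asarray(p_hat) >= cut).astype(int)
    table = np.array(
        [[np.sum((y == 0) & (pred == 0)), np.sum((y == 0) & (pred == 1))],
         [np.sum((y == 1) & (pred == 0)), np.sum((y == 1) & (pred == 1))]])
    overall = np.trace(table) / len(y)   # percent on the diagonal
    return table, overall                # rows: observed 0/1; cols: predicted 0/1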

The Wald statistic above and the corresponding significance level test the significance of each of the covariate and
dummy independents in the model. The ratio of the logistic coefficient B to its standard error S.E., squared, equals
the Wald statistic. If the significance of the Wald statistic is less than 0.05, then the parameter is significant in the
model. Of the independents, jobcat and gender are significant but educ and prevexp are not.
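
The Wald computation described above, as a short sketch (the B and S.E. values are hypothetical):

# Sketch: Wald = (B / S.E.) squared, referred to a chi-square
# distribution with 1 degree of freedom. Values are HYPOTHETICAL.
from scipy.stats import chi2

b, se = 0.62, 0.28
wald = (b / se) ** 2
p = chi2.sf(wald, 1)
print(f"Wald = {wald:.3f}, p = {p:.4f}")  # significant if p < 0.05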
The "Exp(b)" column is SPSS's label for the odds ratio of the row independent with the dependent (minority).It is
the predicted change in odds for a unit increase in the corresponding independent variable. Odds ratios less than 1
correspond to decreases and odds ratios more than 1.0 correspond to increases in odds. Odds ratios close to 1.0
indicate that unit changes in that independent variable do not affect the dependent variable.
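
Exp(B) is simply the exponentiated logit coefficient. A sketch, reusing the hypothetical B = 0.62 from the Wald example:

# Sketch: Exp(B) as the exponentiated coefficient. B is HYPOTHETICAL.
import numpy as np

b = 0.62
odds_ratio = np.exp(b)                          # SPSS's Exp(B)
print(f"Exp(B) = {odds_ratio:.3f}")             # > 1.0: odds increase
print(f"percent change in odds: {100 * (odds_ratio - 1):.1f}%")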
Step number: 1

[SPSS classplot: "Observed Groups and Predicted Probabilities." The X axis is the predicted probability of
membership for Yes, from 0 to 1 with the cut value at .50; the Y axis is frequency. Symbols: N = No, Y = Yes; each
symbol represents 10 cases. Nearly all columns of symbols, N and Y alike, stack up at predicted probabilities below
the .50 cut.]

The classplot above is an alternative way of assessing correct and incorrect predictions under logistic regression.
The X axis is the predicted probability, from 0.0 to 1.0, of the dependent being classified "1" (minority status). The
Y axis is frequency: the number of cases classified. Inside the plot are columns of observed 1's and 0's, coded here
as Y's (minority status) and N's (not minority), with 10 cases per symbol. Examining this plot reveals such things as
how well the model classifies difficult cases (ones near p = .5). In this case, it also shows that nearly all cases are
classified as being in the N (not minority) group, even when in reality they are in the Y (minority) group.
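
A rough analog of the classplot can be drawn outside SPSS as a stacked histogram of predicted probabilities, split by observed group. A sketch using matplotlib:

# Sketch: a classplot analog -- stacked histogram of predicted probabilities
# by observed class, with the .50 cut value marked.
import numpy as np
import matplotlib.pyplot as plt

def classplot(y, p_hat, cut=0.5):
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    plt.hist([p_hat[y == 0], p_hat[y == 1]], bins=20, range=(0, 1),
             stacked=True, label=["N - No", "Y - Yes"])
    plt.axvline(cut, linestyle="--")  # the cut value
    plt.xlabel("Predicted probability of membership for Yes")
    plt.ylabel("Frequency")
    plt.legend()
    plt.show()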
