© 1998 Kluwer Academic Publishers. Printed in the Netherlands.
Abstract. There has been considerable debate on how important goodness of fit is as a tool in
regression analysis, especially with regard to the controversy on R² in linear regression. This article
reviews some of the arguments of this debate and its relationship to other goodness of fit measures.
It attempts to clarify the distinction between goodness of fit measures and other model evaluation
tools as well as the distinction between model test statistics and descriptive measures used to make
decisions on the agreement between models and data. It also argues that the utility of goodness of fit
measures depends on whether the analysis focuses on explaining the outcome (model orientation) or
explaining the effect(s) of some regressor(s) on the outcome (factor orientation).
In some situations a decisive goodness of fit test statistic exists and is a central tool in the analysis.
In other situations, where the goodness of fit measure is not a test statistic but a descriptive measure,
it can be used as a heuristic device along with other evidence whenever appropriate. The availability
of goodness of fit test statistics depends on whether the variability in the observations is restricted, as
in table analysis, or whether it is unrestricted, as in OLS and logistic regression on individual data.
Hence, G² is a decisive tool for measuring goodness of fit, whereas R² and SEE are heuristic tools.
1. Introduction
In sciences where causal model building holds a central position there is often
a need to evaluate theoretical models empirically. In other contexts, such as in
measurement models, the quality of data needs to be evaluated against a theoretical
model. In both situations, this is done by analysing the agreement between the data
generated by the model (predicted data, fitted or expected values) and the collected
empirical data (observed data/values). The degree of agreement is a measure of the
goodness of fit between data and model.
The usefulness of evaluating the goodness of fit of models is not undisputed.
Opinions also differ as to how best to do this, in other words what specific statistical
measure should be used for the purpose. An intense discussion has been taking
place in recent years about measures of goodness of fit for structural equation
models with latent variables (Bollen and Long, 1993). Another example is the
debate about the usefulness of R² in linear regression analysis which took place
a few years ago among methodologists in the USA (see Achen, 1990; King, 1990;
Lewis-Beck and Skalaban, 1990).
230 CURT HAGQUIST AND MAGNUS STENBECK
In view of the fact that regression analysis is one of the most popular and widely
used methods for analysing survey data it is important that attention be paid to
goodness of fit not only in connection with ordinary/linear regression analysis but
also in relation to logistic regression analysis.
The purpose of this article is to highlight and discuss some key questions
concerning goodness of fit in both linear and logistic regression analysis. In the
article the term logistic regression will be used as a label for logit analysis based
on individual as well as table data.
The following topics are beyond the purpose of the article:
Far from all regression analyses are model oriented. There is often no intention
to explain, nor to predict the outcome. King (1986), for example, maintains that
the purpose of regression analysis is simply to measure the effects of independent
variables on a dependent variable.
The approach in this situation is to focus on a subset of effects including possible
confounders. Such a factor oriented approach does not attempt to explain all of
the variation, but rather seeks answers to the questions “how does x affect y?” and
“how is the effect of x on y modified by z?”. It is well known that such an analysis
only requires the simultaneous inclusion of all effects on the outcome which are
interrelated (the factor orientation).
In factor oriented linear or logistic regression on individual data goodness of fit
is not applicable. The fit of the model must be evaluated only against the specific
part of the variation which is relevant to the subset of effects of interest. An overall
test of the model fit is too general for this purpose and does not answer the right
question. However, another factor oriented approach is to make use of categorical
data in a contingency table. Since the irrelevant variation is thereby removed from the
analysis, an overall test of the model fit will answer the right question. Therefore, in
contingency table analysis the goodness of fit tests almost always apply, regardless
of the approach of the analysis.
In the literature there are several more or less similar measures that are labeled or
described as goodness of fit statistics. In the following discussion of measures we
distinguish between absolute goodness of fit measures and relative fit measures.
We also make a distinction between test statistics and descriptive measures.
The distinctive feature of goodness of fit measures is that they assess the agree-
ment between the model predictions and the observed data. In contrast, relative fit
measures evaluate the difference between two restricted models in some unit of
deviation which is determined by the observations. The goodness of fit measures
always have as their point of departure the saturated model, i.e., a completely
unrestricted model which assigns one parameter to each observation or cell. The
relative fit measures have an arbitrarily or theoretically chosen baseline model as their
point of departure, e.g., the model of independence or some other model putting
constraints on the permissible range of the observations.
The distinctive feature of a test statistic is that it has a known probability distri-
bution under some specified condition. Therefore such a measure computed from
a sample can be evaluated in order to assess the probability that the statistic comes
from that probability distribution. This makes it possible to judge whether it is
likely that the specified condition is at hand. In practice, when the residual unex-
plained variation is judged random one can say that the model cannot be rejected
and hence "fits the data". When using a test statistic, it can be determined whether
it is likely that the unexplained variation comes from a random distribution and a
statistical judgement with respect to the goodness of fit of the model can be made.
The typology in Figure 1 identifies four possible combinations of the above
criteria:
1. Goodness of fit test statistics which are decisive tools for measuring goodness
of fit.
2. Descriptive goodness of fit measures which describe the fit in an absolute sense
but which cannot be used to make tests of model fit.
3. Relative fit test statistics which are used to make tests of pairs of models but
do not measure the goodness of fit.
4. Descriptive relative fit measures which meet neither of the two criteria for
a useful tool.
Table I lists some common measures of fit in linear and logistic regression along
with labels and some procedures associated with them in a common software pack-
age, SPSS. In the following, we will use the above classification as our point of
departure for a discussion of the merits and drawbacks of the measures.
When the model contains two or more independent variables, R² is usually called the
“multiple coefficient of determination” and is then a measure of what proportion
of the total variation in the dependent variable is explained by the entire model, in
other words all the independent variables.
The formula for R² is just as easy to grasp as the substantive meaning of the
measure. The formula is written

  R² = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²
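As a minimal illustration of this definition, R² can be computed directly from observed and fitted values (the data below are hypothetical, not from the article):

```python
# Sketch: computing R^2 from the definition above. The data are
# made up; the variance-decomposition form presumes predictions
# from a least-squares fit that includes an intercept.

def r_squared(y, y_hat):
    """R^2 = sum((y_hat_i - y_bar)^2) / sum((y_i - y_bar)^2)."""
    y_bar = sum(y) / len(y)
    ss_model = sum((f - y_bar) ** 2 for f in y_hat)
    ss_total = sum((v - y_bar) ** 2 for v in y)
    return ss_model / ss_total

y     = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed values
y_hat = [2.5, 3.5, 6.5, 7.5]   # hypothetical fitted values

print(round(r_squared(y, y_hat), 3))  # 0.85
```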
It seems to be more the rule than the exception that books on quantitative methods
describe and refer to R² as a measure of goodness of fit (see, for instance,
Schroeder, Sjoquist and Stephan, 1986; Berry and Feldman, 1985; Lewis-Beck,
1980). The rationale for this is that R² is a summary measure of the agreement
between the observed and the predicted data. Nevertheless, R² has been seen by some
as ‘out of fashion’ in political science (Lewis-Beck and Skalaban, 1990). The
utility of R² as a measure of goodness of fit is highly controversial among methodologists.
Whereas, for example, Lewis-Beck and Skalaban (1990) see R² “. . . as
an invaluable tool in quantitative political science . . . ” (p. 168) and unique for predicting
a dependent variable, Christopher H. Achen (1990) asserts that explained
variance does not really mean anything and that “. . . R² becomes a meaningless
accident of the sample . . . ” (p. 183). For many social scientists, interpretations in
terms of explained variance have “. . . doubtful meaning but great rhetorical value”
(Achen, 1982: 59).
Since R² evaluates the agreement between the model and the observed data it is a
goodness of fit measure. But although the measure always has the same 0–1 range,
there is no way of knowing how much variance must be explained in order for the
fit to be good enough. R² does not have a known distribution when the residual
unexplained variation is random. Hence, it is not possible to test whether all the
systematic variation has been accounted for. Therefore, although R² evaluates the
goodness of fit it is not a decisive goodness of fit test statistic.
Achen (1990) regards R² as unusable “. . . for drawing inferences about causal
strength or substantively meaningful goodness of fit” (p. 180). In Achen’s words, R²
is “. . . a purely descriptive quantity with little substantive content” (Achen, 1990:
173).
What the critics of R² see as its Achilles heel is seen by its supporters as its
strength. In a defence of R², Lewis-Beck and Skalaban (1990) took as their starting
point the need for a statistical measure to read the predictive capability of a model,
in other words, to be a measure of how well the dependent variable can be predicted
from knowledge of the independent variables. According to them, this calls for a
measure which not only measures absolute predictive capability but also relative
predictive capability. As they see it, R² has precisely the properties that absolute
prediction measures lack:
It is clear that the problems surrounding R² are only part of a broader complex of
problems relating to the use of standardised measures. In principle, the difference
between an absolute and a relative measure is the same as the difference between
unstandardised and standardised measures. The difference is that the spread of the
independent variables is taken into account in one case but not in the other.
4.6. R² MAXIMISATION
Warnings have been given about regarding the maximisation of R² as the aim of
regression analysis (Schroeder et al., 1986). Usually R² increases in value for
each independent variable that is added to a model; in any event it does not fall
(Schroeder et al., 1986; Berry and Feldman, 1985). This happens regardless of
whether the variable is relevant in the model or not. This property of R² has been
regarded as undesirable. However, R² shares this property with all other goodness
of fit measures.
The problem is that no good criterion exists for how high R² ought to be before
the fit is good or acceptable. One attempt to resolve this problem is to ‘factor in’
the actual number of independent variables when calculating R². The adjusted R²
obtained in this way can decrease when another variable is added to the model
(Schroeder et al., 1986). But the adjusted R² can no more than the unadjusted be
given an objective interpretation pertaining to the overall fit of the model.
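The adjustment can be sketched as follows, using the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1) and hypothetical values: adding a weak regressor nudges R² up while the adjusted R² falls.

```python
# Sketch of the adjusted R^2, which penalises the number of
# independent variables k for a sample of size n (hypothetical values).

def adjusted_r2(r2, n, k):
    """1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a weak regressor: R^2 creeps up, adjusted R^2 falls.
print(round(adjusted_r2(0.50, 30, 2), 4))  # 0.463
print(round(adjusted_r2(0.51, 30, 3), 4))  # 0.4535
```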
It is important not to be carried away by the chase for high R² values. With
maximisation of R² as a goal it is easy to ‘lose’ independent variables which
are structurally relevant but which do not contribute to raising R² very much.
This risk is especially great when ‘mindless’ procedures such as stepwise regression
are used, i.e. when the choice of variables in the model is guided by a computer
program rather than a theoretical idea.
GOODNESS OF FIT IN REGRESSION ANALYSIS 237
4.7. SEE
Alongside R², the textbooks also mention “SEE” or “se” for the evaluation of
goodness of fit for linear regression models. SEE is usually defined as “the standard
error of estimate of Y” (Lewis-Beck, 1980; Berry and Feldman, 1985), as “the
standard error of the regression” (Achen, 1982: 62) or as “the estimated standard
deviation of the actual Y from the predicted Y” (Lewis-Beck, 1980: 37). SEE
is an expression of the estimated standard deviation of the “disturbances” (Achen,
1982). SEE is a measure of how good the fit of a model is, expressed as an “average
prediction error” (Lewis-Beck, 1980). The intuitive or substantive interpretation of
SEE is that it expresses “. . . how far the average dependent variable departs from its
forecasted value” (Achen, 1982: 62). Numerically the interpretation is also simple:
the goodness of fit of a regression with a lower SEE is better than the goodness of
fit of a regression with a higher SEE. The formula for calculating SEE is
  Se = √[ Σ(Yi − Ŷi)² / (n − 2) ]
(Lewis-Beck, 1980: 37). The lower limit of SEE is zero; it has no fixed upper limit.
Unlike R², SEE is neither standardised nor sample specific. SEE is an estimate
of the average agreement between the predictions and the true population values of
Y . In other words, SEE is not affected by accidental differences in the variances of
the independent variables. It has the same unit of measurement as the dependent
variable. This permits comparisons of models across different samples.
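The formula above can be sketched directly (made-up data; the n − 2 divisor corresponds to the bivariate case, one regressor plus an intercept):

```python
import math

# Sketch: the standard error of estimate (SEE) for a bivariate
# regression, expressed in the units of Y. Residuals are hypothetical.

def see(y, y_hat):
    n = len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(ss_res / (n - 2))   # n - 2: one slope plus intercept

y     = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed values
y_hat = [2.5, 3.5, 6.5, 7.5]   # hypothetical fitted values
print(round(see(y, y_hat), 3))  # 0.707
```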
The main objection to SEE is that it is hard to interpret as an independent
measure, since a single SEE in itself does not indicate whether the fit is good
or bad. Lewis-Beck and Skalaban (1990) are of the opinion that “. . . SEE is not
self-sufficient as a measure of relative predictive capability” (p. 157), unlike R²,
which has exactly the properties that SEE lacks (Lewis-Beck and Skalaban, 1990),
i.e., the common scale across realizations. The difference between R² and SEE
as measures of goodness of fit is illustrated by the fact that the two measures
can lead to different results. A model with a lower SEE, and thus a better fit than
another model in terms of SEE, may have a lower R² and thus a worse fit. This
inconsistency between the measures when evaluating goodness of fit is
due to the fact that the variation in the independent variables is greater within
the sample with the higher R².
4.8. F-TESTS
It is possible to ‘convert’ R² into a test statistic. The ratio of the average explained
to the average residual variance is a statistic which has an F distribution if the
explanatory variables taken together do not contribute to an increase of the fit of a
baseline model by more than what would be expected by chance.
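This conversion is commonly written F = (R²/k) / ((1 − R²)/(n − k − 1)) for k regressors and n observations; a sketch with hypothetical values:

```python
# Sketch: converting R^2 into the F statistic for the test that all
# slope coefficients are jointly zero (k regressors, n observations).

def f_from_r2(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical: R^2 = 0.40, n = 33, k = 2 -> F with (2, 30) df
print(round(f_from_r2(0.40, 33, 2), 2))  # 10.0
```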
The reference data for the F-test is not the observed data but a summary
measure of them. To quote a statement from Hanushek and Jackson (1977) in
connection with R²: “. . . it is simply a comparison of the estimated systematic model
with a very naïve model, namely the mean of the observed values of Y” (p. 58,
italics added).
Hence, the mean of Y can be regarded as a model, namely the model of
independence between all the joint X’es and the outcome Y. In fact, the F-test
can be used to test any pair of nested models. The choice of baseline model is
indeed the problem with this approach. The model of independence is no more
natural than any other baseline. As Hanushek and Jackson (1977) say, it is often
‘naïve’ to expect Y to be independent of all the X’es. The choice of comparison
becomes arbitrary or at least dependent on the situation. In this sense it differs from
a goodness of fit measure which has its natural baseline in the observed data. Albeit
a model test, the F-test is therefore not a goodness of fit test in the absolute sense.
The likelihood ratio statistics play an important role in logistic regression analysis.
The expression may lead to confusion, since it is used for two different applications,
one of which is a goodness of fit test statistic and another which is not. The likelihood
ratio goodness of fit test applies to contingency table analysis. It is referred
to by some as G² (see, for instance, Bishop et al., 1975; Demaris, 1992; Fienberg,
1980; Gilbert, 1993) and by others as L² (see, for instance, Clogg and Shihadeh,
1994; Knoke and Burke, 1980). In the terminology used in GLM a further expression
occurs, “the deviance” (McCullagh and Nelder, 1989; Agresti, 1996). Another
application of the likelihood ratio statistics is a measure closely related to the F-test
in linear regression. It seems that this measure is usually called −2 log(L0/L1),
but it is sometimes referred to as the model chi-squared (Demaris, 1992; SPSS,
1993).
The two measures use the same quantity, a measure of deviation. The maximum
likelihood method chooses parameter estimates over a predefined set of parameters
such that the likelihood of observing the sample data is maximized. This maximum
likelihood is used for assessing the model fit.
5.2. G²
The G² test relates the loglikelihood value of a specific model to the loglikelihood
value for a completely unrestricted model, the saturated model. The saturated
model fits the observed data exactly. Hence, G² measures the agreement between
the observations and the data generated by an unsaturated model. The formula for
G² is

  G² = 2 Σi Σj nij log(nij / m̂ij)

where nij are the observed frequencies and m̂ij are the expected frequencies in
the ith row and the jth column (Agresti, 1990: 48). This statistic is chi square
distributed if the residual variation is random and if the expected number of obser-
vations in each cell of the contingency table is sufficiently large. Hence, under quite
general conditions it is possible to evaluate whether it is probable that the obtained
statistic comes from a chi squared distribution. If this hypothesis cannot be rejected,
the conclusion is that the overall model fit is acceptable. The distributions of the
chi squared statistics are not dependent on the sample but only on the number
of degrees of freedom of the model. Therefore, an objective evaluation of model
fit is possible. The point value of the statistic is partly due to random variability
depending on the specific sample at hand. Therefore, it is not possible to directly
compare chi squared statistics obtained from different samples. Nevertheless, G²
is a goodness of fit test statistic and therefore it plays a central role in the analysis
of table data.
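As an illustration, G² can be computed for a hypothetical 2 × 2 table, with the independence model supplying the expected frequencies from the margins:

```python
import math

# Sketch: G^2 for a 2x2 table against the independence model.
# Expected counts m_ij come from the row and column margins.
# Assumes all observed counts are positive (the table is hypothetical).

def g_squared(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            m_ij = rows[i] * cols[j] / n   # expected frequency under independence
            g2 += 2 * n_ij * math.log(n_ij / m_ij)
    return g2

table = [[30, 10],
         [20, 40]]
print(round(g_squared(table), 2))  # 17.26, well above the 5% chi^2 value 3.84 at df = 1
```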
the sample size in each cell and for each parameter of the saturated model. Hence,
with a sufficient sample size, tests of model fit using the saturated model as the
baseline become possible.
In logistic regression with one or more continuous independent variables the
saturated model cannot fulfill the sample size requirements as the observed mul-
tivariate sample space increases along with the number of observations. In other
words, the size of the saturated model is not fixed but increases with n (Hosmer and
Lemeshow, 1989). The minimum number of observations in order for large sample
properties to apply can only be achieved if the model is restricted. A non-saturated
model is necessary. This rules out the saturated model as the baseline model. In
contrast, the intercept-only (L0) model has one parameter regardless of the sample
size, such that increasing the number of observations increases the statistical power
of the estimated intercept. Therefore, with a sufficient total number of observations
the comparison between L0 and L1 is a valid test with a known distribution, but
the comparison between LS and L1 is not.
A restricted baseline model must be used in order to meet the distributional
requirements. But any restriction on the baseline model rules out the observations
as the reference model. Hence, it is not possible to define a strict goodness of fit
test statistic for logistic regression with continuous independent variables.
To circumvent the difficulties with n-asymptotic distributions, grouping of data
has been suggested as a possible approach (see Hosmer and Lemeshow, 1989;
Agresti, 1996).
Hosmer and Lemeshow discuss two different approaches, both based on estimated
probabilities. They advocate the strategy of grouping the data in a table defined by
percentiles of the estimated probabilities. Simulations indicate that the test statistic
approximates a chi-squared distribution when the model is correctly specified
(Hosmer and Lemeshow, 1989). However, this involves collapsing the continuous
variable such that the model differs from the original specification.
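A rough sketch of the grouping idea (hypothetical data, and only two groups for brevity; real applications typically use ten percentile groups and evaluate the statistic against a chi-squared distribution with g − 2 degrees of freedom):

```python
# Rough sketch of the Hosmer-Lemeshow idea: order observations by
# their fitted probability, split them into g equal-sized groups, and
# compare observed with expected event counts per group. The data and
# g = 2 are hypothetical; any remainder after equal-sized groups is
# ignored in this sketch.

def hosmer_lemeshow(y, p_hat, groups=2):
    pairs = sorted(zip(p_hat, y))            # order by fitted probability
    size = len(pairs) // groups
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size]
        obs = sum(yi for _, yi in chunk)     # observed events in group
        exp = sum(pi for pi, _ in chunk)     # expected events in group
        n_g = len(chunk)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n_g))
    return stat

y     = [0, 0, 1, 0, 1, 1, 1, 1]                      # hypothetical outcomes
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]      # hypothetical fitted probabilities
print(round(hosmer_lemeshow(y, p_hat), 3))  # 1.333
```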
Likelihood ratio tests can be used for other comparisons than with the saturated
model – as already indicated, other nested models can be compared with each other.
The difference between the G² values for different models is chi-squared distributed when
all the added parameters for the added variables are equal to zero (Fienberg, 1980).
If the difference between the G² values of the two models is non-significant, the
more parsimonious model can be chosen, without committing Type II errors.
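A sketch of such a nested comparison with hypothetical deviance values:

```python
# Sketch: comparing two nested models through the difference of their
# G^2 (deviance) values. All numbers are hypothetical; the difference
# is chi^2-distributed with df equal to the number of dropped parameters.

g2_restricted = 12.1   # model without the extra variable, df = 5
g2_full       = 6.8    # model with the extra variable,    df = 4

delta_g2 = g2_restricted - g2_full
delta_df = 5 - 4
critical = 3.84        # chi^2 critical value at the 5% level, df = 1

print(delta_g2 > critical)  # True: the extra variable improves the fit
```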
In GLM parlance, the deviances of two models (deviance0 and deviance1) are
compared (Agresti, 1996). This comparison of two values can also be done using
values of −2 log(L0/L1), in which case it is equivalent to the partial F-test of
linear regression analysis. Like the F-test, −2 log(L0/L1) makes it possible to
test for the effect of individual parameters which are added to nested models, even
when one of the independent variables is continuous. Although it is a model test,
the −2 log(L0/L1) does not test the goodness of fit, i.e. it does not compare
the model with the observed data. Just like the F-test it is a test statistic but not
a goodness of fit measure. However, it can be used in the same way as an F-test
by comparing the likelihood values obtained from two hierarchically related
models. The difference is evaluated against the chi square distribution. The F and
−2 log(L0/L1) statistics are de facto almost identical; in fact the former can
“. . . be derived from the likelihood ratio principle” (Aldrich and Nelson, 1984: 89).
Both the F and the −2 log(L0/L1) tests are used to judge whether the joint
parameters of the model have any effect on the dependent variable. In other words
the −2 log(L0/L1) measures “. . . whether any of the predictors are linearly related
to the log odds of the event of interest” (Demaris, 1992: 47). What is tested is the
hypothesis that all parameters except the intercept are equal to zero. The model
statistic can be evaluated against a chi-squared distribution (Demaris, 1992). This
test is sometimes described as a goodness of fit test (Aldrich and Nelson, 1984).
The different values of −2 log(L0/L1) across two logistic models can be eval-
uated with partial tests. This is analogous to partial tests of differences between F
statistics in linear regression.
5.5. R² “ANALOGS”
The −2 log L is not a standardised goodness of fit measure. There have been some
attempts to rescale it. For instance, McFadden (1974) suggested
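McFadden's rescaling is commonly given as one minus the ratio of the fitted model's log-likelihood to the intercept-only log-likelihood; a sketch with hypothetical values:

```python
# Sketch of McFadden's pseudo-R^2: 1 - (log-likelihood of the fitted
# model / log-likelihood of the intercept-only model). The two
# log-likelihood values below are hypothetical.

def mcfadden_r2(loglik_model, loglik_null):
    return 1 - loglik_model / loglik_null

print(round(mcfadden_r2(-45.2, -60.0), 3))  # 0.247
```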
model failure according to R². This opinion has been sharply criticized by Duncan
(personal communication, 1985) who regards it as an expression of “statisticism”.
The R²-analog is neither a model parameter, nor a test statistic, nor a measure of
effect, nor a goodness of fit measure. Hence, it has no clear substantive meaning. In
contrast, the G² is independent of the sample size and its power to detect systematic
deviation between model and data increases as the sample size grows large.
The conclusion is that in a strict sense goodness of fit can be tested only in
table analysis. This possibility has been called “. . . one of the attractions of cate-
gorical data . . . ” (Clogg and Shihadeh, 1994: 8). For linear regression analysis of
continuous data there are only heuristic tools for the evaluation of goodness of fit.
For logistic regression with continuous independent variables the situation may be
even worse.
For comparisons of fit between different nested models there are decisive tools
and valid tests in more situations, notably for linear regression and logistic regres-
sion of table as well as individual data. It is important to point out that this is only
true for the situation in which one of the models fits the data. The latter can only be
tested in the table analysis. In other contexts it is a matter of theoretical arguments.
However, even if logistic table analysis, in comparison with linear regression
analysis, offers better scope for testing and comparing models, G² does not solve
the problem of wrongly specified models. As a measure of goodness of fit, G² is
excellent provided only that the table on which the analysis is based is correct. The
decision to analyze a specific contingency table rather than another is not objective
but theory-driven. The strength of the theory underlying the table construction
becomes decisive here in the same way as the decision on model specification in the case
of individual analysis. Both decisions are critical and both are untestable.
Continuous regressors are sometimes collapsed into categories in order to make the
tools of table analysis available. We find this practice highly inadvisable unless
supported by strong theoretical motivations.
The lack of goodness of fit tests in individual level regression is not a coinci-
dence. It reflects a basic contradiction between the requirements of goodness of fit
measures and those of statistical tests. A goodness of fit measure is supposed to
compare the observations with predictions derived from a model. The goal is to
assess whether the predictions are sufficiently close to the original data. In our
opinion, the saturated model is therefore the only possible baseline model
for a goodness of fit measure.
A test statistic, on the other hand, is supposed to distinguish between random
and systematic variability. Applied to goodness of fit, the test is used to deter-
mine whether there is some systematic variability left in the data that the tested
model failed to account for. In order for the test to meet the necessary statistical
assumptions the number of observations across the multivariate sample space must
be large enough for the baseline model. But in individual level regression there is
only one observation per combination of x-values. Therefore, the goodness of fit
measure does not have a known distribution under randomness. To achieve large
sample properties for the test one must use an unsaturated model as the baseline
model (such as in the hierarchical F-test and the −2 log(L0/L1) test), or one must
group the data into classes (e.g., percentiles, as in the test proposed by Hosmer and
Lemeshow (1989)). But the resulting tests are not goodness of fit tests in the strict
sense of the word. In other words, both criteria cannot be fulfilled simultaneously.
If one requirement is met, the other must be sacrificed.
Other conclusions with respect to fit measures used in linear and logistic
regression are:
• R², F, −2 log(L0/L1) and G² have scales which are independent of the scale
of Y, whereas SEE and −2 log L do not.
• R², F, −2 log L, −2 log(L0/L1) and G² are sample specific measures,
whereas SEE is an estimated property of the estimates.
Sample specific and standardised measures like R² must always be interpreted
in the light of the variance differences that may occur. Hence, their values cannot
References
Achen, C. H. (1982). Interpreting and Using Regression. Newbury Park: Sage Publications.
Achen, C. H. (1990). What Does “Explained Variance” Explain?: Reply, Political Analysis, 2. Ann
Arbor: The University of Michigan Press, pp. 173–184.
Agresti, A. (1996). An Introduction to Categorical Data Analysis. New York: John Wiley and Sons.
Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley and Sons.
Aldrich, J. H. & Nelson, F. D. (1984). Linear Probability, Logit, and Probit Models. Newbury Park:
Sage Publications.
Berry, W. D. & Feldman, S. (1985). Multiple Regression in Practice. Newbury Park: Sage
Publications.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete Multivariate Analysis. Theory
and Practice. Cambridge: The MIT Press.
Bollen, K. A. & Long, J. S. (1993). Introduction. In: K. A. Bollen & J. S. Long (eds), Testing
Structural Equation Models. Newbury Park: Sage Publications.
Clogg, C. C. & Shihadeh, E. S. (1994). Statistical Models for Ordinal Variables. Thousand Oaks:
Sage Publications.
Demaris, A. (1992). Logit Modeling. Practical Applications. Newbury Park: Sage Publications.
Duncan, O. D. (1985). Personal letter to David Burke.
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. Cambridge: The MIT
Press.
Gilbert, N. (1993). Analyzing Tabular Data. Loglinear and Logistic Models for Social Researchers.
London: UCL Press.
Hagle, T. M. & Mitchell, G. E. (1992). Goodness of fit measures for probit and logit. American
Journal of Political Science 36: 762–784.
Hanushek, E. A. & Jackson, J. E. (1977). Statistical Methods for Social Scientists. Orlando:
Academic Press.
Hosmer, D. W. & Lemeshow, S. (1989). Applied Logistic Regression. New York: John Wiley and
Sons.
King, G. (1986). How not to lie with statistics: Avoiding common mistakes in quantitative political
science. American Journal of Political Science 30: 666–687.
King, G. (1990). Stochastic Variation: A Comment on Lewis-Beck and Skalaban’s “The R-Squared”.
Political Analysis, 2. Ann Arbor: The University of Michigan Press, pp. 185–200.
Knoke, D. & Burke, P. J. (1980). Log-linear models. Newbury Park: Sage Publications.
Lewis-Beck, M. S. (1980). Applied Regression. An Introduction. Newbury Park: Sage Publications.
Lewis-Beck, M. S. & Skalaban, A. (1990). The R-Squared: Some Straight Talk. Political Analysis,
2. Ann Arbor: The University of Michigan Press, pp. 153–171.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models. London: Chapman and Hall.
McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers of
Econometrics. New York: Academic Press, pp. 105–142.
Menard, S. (1995). Applied Logistic Regression Analysis. Thousand Oaks: Sage Publications.
Schroeder, L. D., Sjoquist, D. L., & Stephan, P. E. (1986). Understanding Regression Analysis. An
Introductory Guide. Newbury Park: Sage Publications.
SPSS (1993). SPSS for Windows. Advanced Statistics Release 6.0. Chicago: SPSS.
SPSS (1993). SPSS for Windows. Base System User’s Guide. Release 6.0. Chicago: SPSS.
SPSS (1994). SPSS 6.1 for Windows update. Chicago: SPSS.