Zero-Inflated Negative Binomial Regression | Stata Data Analysis Examples
webuse fish
summarize count child persons camper
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       count |       250       3.296    11.63503          0        149
       child |       250        .684    .8503153          0          3
     persons |       250       2.528     1.11273          1          4
      camper |       250        .588    .4931824          0          1
histogram count, discrete freq
OLS Regression - You could try to analyze these data using OLS regression. However, count data are highly non-normal and are not well estimated by OLS regression.
Zero-Inflated Poisson Regression - Zero-inflated Poisson regression does better when the data are not overdispersed, i.e., when the variance is not much larger than the mean.
Ordinary Count Models - Poisson or negative binomial models might be more appropriate if there are no excess zeros.
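Before fitting, it can help to see why excess zeros break a plain Poisson model. The following sketch (in Python with NumPy, not part of the Stata session, and with made-up parameters rather than the fish data) simulates a zero-inflated count process and compares the observed share of zeros with the share a Poisson model with the same mean would predict:

```python
import numpy as np

# Illustrative sketch only (simulated data, not the fish dataset):
# draw counts from a zero-inflated Poisson process and compare the
# observed share of zeros with the share a plain Poisson model with
# the same mean would predict.
rng = np.random.default_rng(12345)
n = 10_000
pi = 0.4    # probability of a structural zero (e.g. did not fish)
lam = 3.0   # Poisson mean for the at-risk group

structural_zero = rng.random(n) < pi
counts = np.where(structural_zero, 0, rng.poisson(lam, size=n))

observed_zero_share = (counts == 0).mean()
poisson_zero_share = np.exp(-counts.mean())  # Poisson: P(0) = exp(-mean)

print(round(observed_zero_share, 2))  # about pi + (1 - pi)*exp(-lam) = 0.43
print(round(poisson_zero_share, 2))   # about exp(-1.8) = 0.17, far fewer zeros
```

A plain Poisson fit would badly underpredict the zeros, which is the motivation for the two-part zero-inflated model below.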
zinb count child i.camper, inflate(persons) vuong zip

Fitting constant-only model:

Iteration 0:   log likelihood = -519.33992
Iteration 1:   log likelihood = -471.96077
Iteration 2:   log likelihood = -465.38193
Iteration 3:   log likelihood = -464.39882
Iteration 4:   log likelihood = -463.92704
Iteration 5:   log likelihood = -463.79248
Iteration 6:   log likelihood = -463.75773
Iteration 7:   log likelihood = -463.7518
Iteration 8:   log likelihood = -463.75119
Iteration 9:   log likelihood = -463.75118

Fitting full model:

Iteration 0:   log likelihood = -463.75118  (not concave)
Iteration 1:   log likelihood = -440.43162
Iteration 2:   log likelihood = -434.96651
Iteration 3:   log likelihood = -433.49903
Iteration 4:   log likelihood = -432.89949
Iteration 5:   log likelihood = -432.89091
Iteration 6:   log likelihood = -432.89091

Zero-inflated negative binomial regression      Number of obs   =        250
                                                Nonzero obs     =        108
                                                Zero obs        =        142

Inflation model = logit                         LR chi2(2)      =      61.72
Log likelihood  = -432.8909                     Prob > chi2     =     0.0000

------------------------------------------------------------------------------
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
count        |
       child |  -1.515255   .1955912    -7.75   0.000    -1.898606   -1.131903
    1.camper |   .8790514   .2692731     3.26   0.001     .3512857    1.406817
       _cons |   1.371048   .2561131     5.35   0.000     .8690758    1.873021
-------------+----------------------------------------------------------------
inflate      |
     persons |  -1.666563   .6792833    -2.45   0.014    -2.997934   -.3351922
       _cons |   1.603104   .8365065     1.92   0.055     -.036419    3.242626
-------------+----------------------------------------------------------------
    /lnalpha |   .9853533     .17595     5.60   0.000     .6404975    1.330209
-------------+----------------------------------------------------------------
       alpha |   2.678758   .4713275                      1.897425    3.781834
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 1197.43   Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z =     1.70   Pr>z = 0.0444
The output has several components, which are explained below.
It begins with the iteration log, giving the values of the log likelihoods, starting with a model that has no predictors.
The last value in the log is the final value of the log likelihood for the full model and is repeated below.
Next comes the header information. On the right-hand side the number of observations used (250) is given, along with the likelihood-ratio chi-squared.
This compares the full model to a model without the count predictors, giving a difference of two degrees of freedom.
This is followed by the p-value for the chi-squared. The model as a whole is statistically significant.
Below the header, you will find the negative binomial regression coefficients for each of the variables along with standard errors, z-scores, p-values
and 95% confidence intervals for the coefficients.
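Each of these columns can be reproduced from the others; for instance, here is a quick cross-check in Python (not part of the Stata session) of the z statistic and 95% confidence interval for child, using only the reported coefficient and standard error:

```python
from scipy.stats import norm

# Recompute the z statistic and 95% CI for child from the reported
# coefficient and standard error alone.
coef, se = -1.515255, 0.1955912

z = coef / se
p = 2 * norm.sf(abs(z))
crit = norm.ppf(0.975)  # 1.959964..., the critical value Stata uses
ci_lo, ci_hi = coef - crit * se, coef + crit * se

print(round(z, 2))                       # -7.75, as in the output
print(round(ci_lo, 6), round(ci_hi, 6))  # ~ -1.898607  -1.131903
```

The confidence limits agree with the output to the displayed precision (the last digit can differ slightly because Stata carries more decimals internally than it prints).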
Following these are logit coefficients for predicting excess zeros along with their standard errors, z-scores, p-values and confidence intervals.
Additionally, there is an estimate of the natural log of the overdispersion parameter, alpha, along with its untransformed value. If alpha is zero, the model is better estimated using a zero-inflated Poisson regression model.
Below the various coefficients you will find the results of the zip and vuong options.
The zip option tests the zero-inflated negative binomial model against the zero-inflated Poisson model. A significant likelihood-ratio test for alpha = 0 indicates that the zinb model is preferred to the zip model.
The Vuong test compares the zero-inflated negative binomial model with an ordinary negative binomial regression model. A significant z-test indicates that the zero-inflated model is preferred.
The predictors child and camper in the part of the negative binomial model predicting the number of fish caught (count) are both significant.
The predictor persons in the part of the logit model predicting excess zeros is statistically significant.
For these data, the expected change in log(count) for a one-unit increase in child is -1.515255, holding the other variables constant.
A camper (camper = 1) has an expected log(count) 0.879051 higher than that of a non-camper (camper = 0), holding the other variables constant.
The log odds of being an excess zero decrease by 1.67 for every additional person in the group. In other words, the more people in the group, the less likely it is that a zero is due to not having gone fishing. Put plainly, the larger the group the person was in, the more likely it is that the person went fishing.
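Because the count-equation coefficients are on the log(count) scale, exponentiating them gives incidence-rate ratios (what Stata's irr option would display). A quick check in Python using the coefficients above:

```python
import math

# The count-equation coefficients are on the log(count) scale, so
# exponentiating them gives incidence-rate ratios (what Stata's irr
# option would display).
b_child, b_camper = -1.515255, 0.8790514

irr_child = math.exp(b_child)    # multiplicative effect of one more child
irr_camper = math.exp(b_camper)  # campers vs. non-campers

print(round(irr_child, 3))   # 0.22: one more child cuts the expected catch ~78%
print(round(irr_camper, 3))  # 2.409: campers catch about 2.4 times as many fish
```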
At the bottom of the output we can see that the likelihood-ratio test of alpha = 0 is significant, so alpha is significantly different from zero. This suggests that our data are overdispersed and that a zero-inflated negative binomial model is more appropriate than a zero-inflated Poisson model. The Vuong test suggests that the zero-inflated negative binomial model is a significant improvement over a standard negative binomial model.
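Because alpha = 0 lies on the boundary of the parameter space, the statistic is labeled chibar2(01), and its p-value is half the usual chi-squared(1) tail probability. A quick check in Python (using scipy, outside the Stata session):

```python
from scipy.stats import chi2

# alpha = 0 lies on the boundary of the parameter space, so Stata labels
# the statistic chibar2(01): the usual chi-squared(1) tail probability
# is halved.
lr_stat = 1197.43
p_boundary = chi2.sf(lr_stat, df=1) / 2

print(p_boundary < 1e-10)  # True, consistent with Pr>=chibar2 = 0.0000
```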
Now, just to be on the safe side, let's rerun the zinb command with the robust option in order to obtain robust standard errors for the regression coefficients. We cannot include the vuong option when using robust standard errors.
zinb count child i.camper, inflate(persons) robust

Fitting constant-only model:

Iteration 0:   log pseudolikelihood = -519.33992
Iteration 1:   log pseudolikelihood = -471.96077
Iteration 2:   log pseudolikelihood = -465.38193
Iteration 3:   log pseudolikelihood = -464.39882
Iteration 4:   log pseudolikelihood = -463.92704
Iteration 5:   log pseudolikelihood = -463.79248
Iteration 6:   log pseudolikelihood = -463.75773
Iteration 7:   log pseudolikelihood = -463.7518
Iteration 8:   log pseudolikelihood = -463.75119
Iteration 9:   log pseudolikelihood = -463.75118

Fitting full model:

Iteration 0:   log pseudolikelihood = -463.75118  (not concave)
Iteration 1:   log pseudolikelihood = -440.43162
Iteration 2:   log pseudolikelihood = -434.96651
Iteration 3:   log pseudolikelihood = -433.49903
Iteration 4:   log pseudolikelihood = -432.89949
Iteration 5:   log pseudolikelihood = -432.89091
Iteration 6:   log pseudolikelihood = -432.89091

Zero-inflated negative binomial regression      Number of obs   =        250
                                                Nonzero obs     =        108
                                                Zero obs        =        142

Inflation model      = logit                    Wald chi2(2)    =      40.16
Log pseudolikelihood = -432.8909                Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |               Robust
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
count        |
       child |  -1.515255   .2417504    -6.27   0.000    -1.989077   -1.041432
    1.camper |   .8790514    .471303     1.87   0.062    -.0446855    1.802788
       _cons |   1.371048   .3902521     3.51   0.000     .6061682    2.135928
-------------+----------------------------------------------------------------
inflate      |
     persons |  -1.666563   .4314861    -3.86   0.000     -2.51226   -.8208658
       _cons |   1.603104   .6665327     2.41   0.016     .2967236    2.909484
-------------+----------------------------------------------------------------
    /lnalpha |   .9853533   .2157394     4.57   0.000     .5625119    1.408195
-------------+----------------------------------------------------------------
       alpha |   2.678758   .5779135                      1.755075    4.088567
------------------------------------------------------------------------------

Using the robust option has resulted in some change in the model chi-squared, which is now a Wald chi-squared. This statistic is based on log pseudolikelihoods instead of log likelihoods. The model is still statistically significant. The robust standard errors attempt to adjust for heterogeneity in the model.
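Note that the robust option changes only the standard errors, not the point estimates: for 1.camper the coefficient is identical in both runs, but the larger robust standard error shrinks the z statistic and pushes the p-value above 0.05. A quick check in Python (outside the Stata session):

```python
from scipy.stats import norm

# The robust option changes standard errors, not point estimates: for
# 1.camper the coefficient is identical, but the larger robust SE
# shrinks the z statistic and pushes the p-value above 0.05.
coef = 0.8790514
se_model, se_robust = 0.2692731, 0.471303

z_model = coef / se_model
z_robust = coef / se_robust
p_robust = 2 * norm.sf(z_robust)

print(round(z_model, 2), round(z_robust, 2))  # 3.26 1.87, as in the two outputs
print(round(p_robust, 3))                     # 0.062
```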
Now, let's try to understand the model better by using some of the postestimation commands. First, we use the predict command with the pr option to get the predicted probability of being an excess zero, i.e., of not having gone fishing. We then look at the distribution of the predicted probability by the number of persons in the group. We can see that the larger the group, the smaller the probability, meaning the more likely it is that the person went fishing.
predict p, pr
table persons, con(mean p)

----------------------
  persons |    mean(p)
----------+-----------
        1 |   .4841405
        2 |   .1505847
        3 |   .0324023
        4 |   .0062859
----------------------
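Because persons is the only predictor in the inflate equation, the predicted probability is constant within each group size, and the table above can be recomputed directly from the reported inflate coefficients. A quick check in Python (inverse logit of the linear predictor):

```python
import math

# The inflate equation is an ordinary logit with persons as its only
# predictor, so the mean predicted probability of being an excess zero
# for each group size follows directly from the inflate coefficients.
b_persons, b_cons = -1.666563, 1.603104

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

for persons in range(1, 5):
    pr_zero = invlogit(b_cons + b_persons * persons)
    print(persons, round(pr_zero, 7))
# matches the mean(p) column above:
# .4841405, .1505847, .0324023, .0062859
```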
Finally, we will use the margins command to get the predicted number of fish caught, comparing campers with non-campers at each number of children, and marginsplot to visualize the information produced by the margins command.

margins camper, at(child=(0(1)3))
Predictive margins                              Number of obs   =        250
Model VCE    : Robust

Expression   : Predicted number of events, predict()
1._at        : child           =           0
2._at        : child           =           1
3._at        : child           =           2
4._at        : child           =           3
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#camper |
        1 0  |   3.302878   1.294607     2.55   0.011     .7654961    5.840261
        1 1  |   7.955358   2.056003     3.87   0.000     3.925667    11.98505
        2 0  |   .7258149   .3452292     2.10   0.036      .049178    1.402452
        2 1  |   1.748208   .3534415     4.95   0.000     1.055475     2.44094
        3 0  |   .1594994   .1028401     1.55   0.121    -.0420634    .3610623
        3 1  |   .3841725   .1394934     2.75   0.006     .1107704    .6575747
        4 0  |   .0350504   .0297846     1.18   0.239    -.0233263     .093427
        4 1  |   .0844228   .0492046     1.72   0.086    -.0120164     .180862
------------------------------------------------------------------------------

marginsplot, noci scheme(s1mono) legend(position(1) ring(0))
Notice that by default the margins command fixed the predicted probability of being an excess zero at its mean. For instance, here is an alternative way of producing the same predicted counts given camper = 0/1 and child = 0.

sum p
local mean_pr = r(mean)
margins camper, at(child=0) expression(exp(predict(xb))*(1-`mean_pr'))
Predictive margins                              Number of obs   =        250
Model VCE    : Robust

Expression   : exp(predict(xb))*(1-.1615949432440102)
at           : child           =           0
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      camper |
          0  |   3.302879   1.288955     2.56   0.010     .7765726    5.829184
          1  |   7.955358   2.180409     3.65   0.000     3.681835    12.22888
------------------------------------------------------------------------------
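These margins can also be reproduced entirely by hand from the count-equation coefficients and the mean predicted probability shown in the Expression line. A quick check in Python (outside the Stata session):

```python
import math

# Reproduce the two margins by hand: the predicted count at child = 0 is
# exp(xb) from the count equation times (1 - mean predicted probability
# of an excess zero), the .1615949... shown in the Expression line.
b_cons, b_camper = 1.371048, 0.8790514
mean_p = 0.1615949432440102

margin_noncamper = math.exp(b_cons) * (1 - mean_p)
margin_camper = math.exp(b_cons + b_camper) * (1 - mean_p)

print(round(margin_noncamper, 4))  # 3.3029, matching the camper = 0 margin
print(round(margin_camper, 4))     # 7.9554, matching the camper = 1 margin
```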
Things to consider
Here are some issues that you may want to consider in the course of your research analysis.
The question of the overdispersion parameter is in general a tricky one. A large overdispersion parameter could be due to a misspecified model, or it could be due to a real process with overdispersion. Adding an overdispersion parameter to a misspecified model will not necessarily improve it.
The zinb model has two parts, a negative binomial count model and a logit model for predicting excess zeros, so you might want to review these Data Analysis Example pages: Negative Binomial Regression and Logit Regression.
Since zinb has both a count model and a logit model, each of the two models should have good predictors. The two models do not necessarily need
to use the same predictors.
Problems of perfect prediction, separation or partial separation can occur in the logistic part of the zero-inflated model.
Count models often use an exposure variable to indicate the number of times the event could have happened. You can incorporate exposure into your model by using the exposure() option.
It is not recommended that zero-inflated negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.
Pseudo-R-squared values differ from OLS R-squared values; please see FAQ: What are pseudo R-squareds? for a discussion of this issue.
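To illustrate the exposure point above (with hypothetical coefficients, not the fish data): the exposure() option includes ln(exposure) in the linear predictor with its coefficient fixed at 1, which makes the expected count scale proportionally with exposure, i.e., it turns the model into one for rates.

```python
import math

# Hypothetical coefficients, not from the fish data: exposure() includes
# ln(exposure) in the linear predictor with its coefficient fixed at 1,
# so the expected count scales proportionally with exposure.
b_cons, b_x = 0.5, 0.3
x = 1.0
exposure = 10.0

expected_count = math.exp(b_cons + b_x * x + math.log(exposure))
expected_rate = math.exp(b_cons + b_x * x)  # count per unit of exposure

print(math.isclose(expected_count, exposure * expected_rate))  # True
```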
See Also
Stata Online Manual
zinb
Related Stata Commands
zip -- zero-inflated Poisson regression.