
Prof. JM Oakes
Draft: Sunday, March 11, 2007

Analyzing Group Randomized Trial (GRT) data in Stata (v9.2)


Key texts such as David Murray's discuss analysis of GRT data with SAS's Proc Mixed
and related commands (eg, GLIMMIX). At the time of Murray's text, SAS was the only
widely available program able to fit such mixed models (sometimes called variance-component
or multilevel models). In fact, much of Proc Mixed was written with David
Murray and Peter Hannan's thoughts on GRT analysis in mind.
Today every major, and many a minor, stats package fits mixed models. Stata is a good
choice for analyzing GRT data for many reasons, including ease of use, cost, superior
graphics, and much more.
Like SAS Proc Mixed, Stata has the ability to fit mixed-model regressions. In fact, Stata
has two broad approaches: xtmixed and a series of commands such as xtreg, xtlogit,
xtpoisson, etc. The latter commands effectively fit only 2-level models, but that is all
typical GRT designs require.
It is easy to fit GRT-appropriate models in Stata. Effect estimates and standard errors
are correct and (virtually) equivalent to those found in Proc Mixed. What Stata doesn't
enable is proper significance tests of effect estimates. This is because Stata does
not permit sufficient control over degrees of freedom (df) for either t-statistics or
F-statistics. Stata presumes large-sample inference and uses Z-statistic tests; this
assumption is untenable in the typical GRT. But this shortcoming can be overcome with
a little work and some simple programming.
Let's begin by considering posttest-only data from a (fictitious) group randomized trial of
kids nested in schools.

. use postonly_x.dta, clear

Our interest here is in math scores from a standardized achievement test. The treatment
variable, coded experimental = 1 and control = 0, is named condition, or cond for short.
We fit a mixed model using xtmixed, regressing the outcome variable, math, on
cond (treatment or control) with a random effect for school, since subjects are nested in
schools. The var option tells Stata to give us variances instead of standard deviations,
which are the default.
. xtmixed math cond || school: , var

The results look like this:

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log restricted-likelihood = -1638.4222
Iteration 1:   log restricted-likelihood = -1638.4222
Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       311
Group variable: school                          Number of groups   =        20
                                                Obs per group: min =         9
                                                               avg =      15.6
                                                               max =        25
                                                Wald chi2(1)       =      4.50
Log restricted-likelihood = -1638.4222          Prob > chi2        =    0.0339

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cond |   19.17173   9.040171     2.12   0.034     1.453321    36.89014
       _cons |   509.6286   6.381632    79.86   0.000     497.1209    522.1364
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Identity             |
                  var(_cons) |   262.7821   135.4762      95.66709    721.8204
-----------------------------+------------------------------------------------
               var(Residual) |   2150.299   178.1555      1827.997    2529.428
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 14.19  Prob >= chibar2 = 0.0001

What do we make of this? First, the model converged quickly, in two iterations. There are
311 subjects nested in 20 schools. The condition effect is 19.17. This is the estimated
effect of the intervention on subjects; it's the delta. The standard error of this effect is
9.04.
Importantly, this SE is correct since it accounts for the nesting in schools. The ratio of
the 19.17 to the 9.04 is 2.12, which Stata labels a z. This label and the corresponding
p-value (0.034) would be okay if the standard error were calculated on many degrees of
freedom. Such practice reflects the idea of asymptotic inference, which may be viewed as
assuming infinite subjects or df. In practice, upwards of 30-40 groups per arm would
probably suffice (we say more about the required number of groups and/or subjects below).
It follows that what is not correct for these data is the associated p-value, 0.034, and the
confidence interval. Why? Because, again, Stata assumes infinite degrees of freedom;
Stata's z test is valid asymptotically. But we cannot make this assumption since we have
only 20 schools. And as you know, the df for a treatment effect in a GRT is 2(number of
groups per condition - 1), which in this case is 18, a tad less than infinity!
How can we get Stata to give us the correct p-value? Simple. We have to tell Stata to
evaluate our effect estimate and corresponding standard error against a t-distribution with
the correct df. This is easy with Stata's built-in t-distribution function, ttail.

. display 2 * (ttail(18, (_coef[cond] / _se[cond])))

Basically, the ttail function takes the form ttail(df, t-ratio). We type display to
have Stata display the value of this function with the appropriate inputs. We have to
calculate the df manually, but every GRT student knows how to do this. Then we simply
pick off the value of the treatment-effect coefficient, stored internally in Stata as
_coef[cond], as well as the corresponding standard error, _se[cond]. Multiplying the
result by a factor of two gives us a two-tailed p-value instead of the default one-tailed
value. This is always appropriate in GRTs.
The result is
.04809365

In other words, p = 0.048, a value some would say is statistically significant. Note,
though, that this value is greater than the incorrect default p-value of 0.034,
which is based on asymptotic theory.
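As a cross-check outside Stata, the same two-tailed p-value can be recovered from the reported effect and standard error. Below is a minimal Python sketch; the numerical upper-tail integral is a rough stand-in for Stata's ttail, not its actual implementation.

```python
import math

def t_upper_tail(t, df, steps=100_000, upper=60.0):
    """P(T > t) for Student's t with df degrees of freedom, via simple
    trapezoidal integration of the t density; a numerical stand-in for
    Stata's ttail(df, t)."""
    # Normalizing constant of the t density
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    area = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        area += pdf(t + i * h)
    return area * h

df = 2 * (10 - 1)            # 2(groups per condition - 1) = 18
t = 19.17173 / 9.040171      # effect / SE from the xtmixed output
p = 2 * t_upper_tail(t, df)  # two-tailed p-value, roughly 0.048
```

Evaluated this way, p agrees with the ttail result above to several decimal places, while the asymptotic z-based p-value (0.034) is noticeably smaller.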
Parenthetically, you could also type

. display 2 * ttail(18, 2.12)

but there would be rounding error; it's better to use the high-precision system variables.
Note that if our treatment variable were named not cond but, say, Tx, we would type

. display 2 * (ttail(18, (_coef[Tx] / _se[Tx])))

Finally, after estimating the model you could also type

. parmest, li(, noobs sepby(parm)) dof(18) format(estimate stderr %9.3f dof %9.0f t p
min* max* %9.3f)

which yields t-tests with the appropriate df (supplied by you!); the nice thing here is that
the 95% CIs are also corrected. The downside is that some extraneous information comes
along with it (set apart below by a horizontal line).
+---------------------------------------------------------------------------------+
|       eq     parm   estimate   stderr   dof        t       p     min95    max95 |
|---------------------------------------------------------------------------------|
|     math     cond     19.172    9.040    18    2.121   0.048     0.179   38.164 |
|---------------------------------------------------------------------------------|
|     math    _cons    509.629    6.382    18   79.859   0.000   496.221  523.036 |
| lns1_1_1    _cons      2.786    0.258    18   10.807   0.000     2.244    3.327 |
|  lnsig_e    _cons      3.837    0.041    18   92.616   0.000     3.750    3.924 |
+---------------------------------------------------------------------------------+

Moving on: what are the intraclass correlation coefficient (ICC) and variance component
ratio (VCR) for this analysis?
The Stata output shows the school-level variance to be 262.7821 and the individual, or
residual, variance to be 2150.299. Ignore the standard errors and associated 95% CIs for
these values.
It follows that the ICC and VCR are ratios of these values. We again use system
variables and the display command to get what we want.
To see the components, along with all other model coefficients, tell Stata to display the
results from the most recently estimated model. The results are stored in a behind-the-scenes
matrix named e(b). To display the matrix, use the matrix list command.
. matrix list e(b)

e(b)[1,4]
            math:       math:  lns1_1_1:    lnsig_e:
             cond       _cons      _cons       _cons
y1       19.17173   509.62865  2.7856627   3.8366811

It is important to appreciate that Stata stores the variance components (the last two
columns above) in natural-log, standard-deviation form. Obviously we need to
do a little manipulation, exponentiating and squaring the stored results, to get the
variances on the scale we want.
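The back-transformation itself is ordinary arithmetic and can be sanity-checked anywhere. A quick Python sketch using the two logged values stored in e(b) above:

```python
import math

# log-standard-deviations stored by xtmixed in e(b)
lns_school = 2.7856627   # lns1_1_1:_cons
lns_resid  = 3.8366811   # lnsig_e:_cons

# exponentiate to get standard deviations, square to get variances
var_school = math.exp(lns_school) ** 2   # school-level variance, about 262.78
var_resid  = math.exp(lns_resid) ** 2    # residual variance, about 2150.30
```

These match the var(_cons) and var(Residual) entries in the xtmixed output.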
To get the group-level variance we exponentiate _coef[lns1_1_1:] and then multiply it by
itself.

. di exp(_coef[lns1_1_1:])* exp(_coef[lns1_1_1:])
262.78215

To get the residual variance, do the same for _coef[lnsig_e:].

. di exp(_coef[lnsig_e:])* exp(_coef[lnsig_e:])
2150.2993

This is ugly text, no? To get the ICC and VCR we'll have to take ratios of these, which
would be even uglier. So let's define nicely named macro variables to hold the quantities.

. local var_e = exp(_coef[lnsig_e:])* exp(_coef[lnsig_e:])
. local var_g = exp(_coef[lns1_1_1:])* exp(_coef[lns1_1_1:])

Now we can easily write code for the ICC and VCR.

The VCR is

. di `var_g' / `var_e'
.12220725

(careful of left and regular apostrophes here!)
And the ICC is


. di `var_g' / (`var_e' + `var_g')
.108899
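Both ratios are easy to verify by hand. A quick Python sketch using the two variance components reported in the xtmixed output:

```python
var_g = 262.7821   # school-level variance from the xtmixed output
var_e = 2150.299   # residual (individual-level) variance

vcr = var_g / var_e            # variance component ratio, roughly 0.122
icc = var_g / (var_e + var_g)  # intraclass correlation, roughly 0.109
```

The VCR divides the group variance by the residual variance alone, while the ICC divides it by the total variance, so the ICC is always the smaller of the two.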

What about covariate/regression adjustment? After all, it could be that
background/demographic differences in subjects nested within schools confound the
estimated treatment effect.
We might mitigate the effect of such confounders by fitting a more complex model; in this
case we include poverty level (0 = no, 1 = impoverished), gender (0 = female, 1 = male), and
reading achievement score, read, in the model.
. xtmixed math cond pov gender read || school: , var

The corresponding output is this:

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log restricted-likelihood = -1576.1167
Iteration 1:   log restricted-likelihood = -1576.1053
Iteration 2:   log restricted-likelihood = -1576.1053
Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       311
Group variable: school                          Number of groups   =        20
                                                Obs per group: min =         9
                                                               avg =      15.6
                                                               max =        25
                                                Wald chi2(4)       =    155.74
Log restricted-likelihood = -1576.1053          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cond |   5.771752   5.449119     1.06   0.290    -4.908324    16.45183
         pov |  -12.14481   7.861378    -1.54   0.122    -27.55283    3.263208
      gender |   1.842925   4.593764     0.40   0.688    -7.160687    10.84654
        read |   7.197725   .6182529    11.64   0.000     5.985972    8.409479
       _cons |   -268.623    67.2485    -3.99   0.000    -400.4277   -136.8184
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Identity             |
                  var(_cons) |   38.59627    46.5025      3.638982    409.3651
-----------------------------+------------------------------------------------
               var(Residual) |   1571.626   130.5526      1335.492    1849.513
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 1.07  Prob >= chibar2 = 0.1507

What do we make of this output? Pretty much the same conclusions as in the simpler,
unadjusted model above. The first thing to note is that the sample size has not changed;
there remain 311 subjects nested in 20 schools. This means there are no missing values
for any of the variables included in the model, which is good.
The estimated treatment effect is now 5.77 with a standard error of 5.45, yielding a ratio
of 1.06.
The proper p-value is again calculated on 18 df and evaluated with the t-distribution.

. display 2 * (ttail(18, (_coef[cond] / _se[cond])))
.30351236

Not surprisingly, the treatment effect is not statistically different from zero.
It is appropriate to treat the effects of poverty, gender, and reading score as nuisance
variables, since they would not be needed if randomization had better balanced such
background characteristics. Ignore such effects.
The variance components may again be captured by exploiting the model's matrix of
estimated coefficients.
. mat list e(b)

e(b)[1,7]
            math:       math:       math:       math:       math:  lns1_1_1:    lnsig_e:
             cond         pov      gender        read       _cons      _cons       _cons
y1      5.7717525   -12.14481   1.8429245   7.1977253  -268.62305   1.8265779    3.679933

We can calculate the VCR and ICC just as before.

. local var_e = exp(_coef[lnsig_e:])* exp(_coef[lnsig_e:])
. local var_g = exp(_coef[lns1_1_1:])* exp(_coef[lns1_1_1:])

. di `var_g' / `var_e'
.02455818

. di `var_g' / (`var_e' + `var_g')
.02396953

How about adjusted means? These are useful since giving concrete adjusted values of
the outcome measure frequently helps readers/reviewers get a sense of the
treatment effect on the scale of the outcome measure itself.
In SAS we use LSMEANS; in Stata we use adjust. The basic syntax is

. adjust variables in model to adjust for , by(cond) other options

More concretely,

. adjust pov gender read , by(cond) se format(%9.3f)

which yields this:

-------------------------------------------------------------------------------
     Dependent variable: math     Equation: math     Command: xtmixed
     Covariates set to mean: pov = .09967846, gender = .57877814, read = 109.1254
-------------------------------------------------------------------------------

----------------------------------
     cond |         xb       stdp
----------+-----------------------
        0 |    516.688    (3.806)
        1 |    522.459    (3.816)
----------------------------------
     Key:  xb   = Linear Prediction
           stdp = Standard Error

The top line of the top box indicates that the dependent variable is math score, along
with other information novices may ignore. The second line of the top box shows the values
used in the prediction equation, which by default are the means of the
covariates. Recall that the mean of a (0,1) variable is the proportion "yes" or "true" for
the measure. Thus, by default Stata's adjust command yields what SAS produces with the OM
(for observed margins) option.
The bottom box of the output shows the model-adjusted means for the treatment and
control conditions. The treatment-condition mean is 522.46 and the control mean is 516.69.
Notice that the difference is 5.77, which is the estimated treatment effect.
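These adjusted means can be reproduced by hand from the fixed-effect estimates and the covariate means shown in the adjust header. A Python sketch of the arithmetic:

```python
# fixed-effect estimates from the adjusted xtmixed model
b = {"_cons": -268.623, "cond": 5.771752, "pov": -12.14481,
     "gender": 1.842925, "read": 7.197725}
# covariate means reported in the adjust header
means = {"pov": .09967846, "gender": .57877814, "read": 109.1254}

def adjusted_mean(cond):
    """Linear prediction xb with covariates held at their means."""
    return b["_cons"] + b["cond"] * cond + sum(b[k] * means[k] for k in means)

control, treatment = adjusted_mean(0), adjusted_mean(1)
# treatment - control recovers the estimated treatment effect, 5.77
```

Because every covariate is held at the same mean in both arms, the difference between the two adjusted means is exactly the cond coefficient.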
It's also easy to request 95% confidence intervals, but we discourage doing so unless you
are confident the asymptotic assumptions are defensible; in other words, unless the number
of groups in your study exceeds 30 per arm.

What about the values of the random effects, sometimes called empirical Bayes (EB)
estimates or BLUPs? In class, we call these "bumps".

. predict re, reffects

Now list them, but only one per group (ie, school). To do this, generate a variable to
condition on such that only one value per school is displayed.

. egen single = tag(school)
. list cond school re if single, noobs
+---------------------------+
| cond   school         re  |
|---------------------------|
|    0        2   4.464902  |
|    1        3   2.849438  |
|    0       12    1.03236  |
|    1       19   1.836211  |
|    0       23   4.210702  |
|---------------------------|
|    1       24  -7.145421  |
|    0       25  -1.866368  |
|    1       27  -1.431592  |
|    0       31  -6.018359  |
|    1       32  -1.188469  |
|---------------------------|
|    0       35  -.8096194  |
|    1       41  -3.062465  |
|    0       43  -.3606333  |
|    1       44   1.590811  |
|    0       45  -3.073813  |
|---------------------------|
|    1       53   1.860326  |
|    0       70   1.314626  |
|    1       74   1.790454  |
|    0       75   1.106202  |
|    1       86   2.900706  |
+---------------------------+


We find it informative to graphically plot the distribution of BLUPs from the random-effects
model in comparison to the BLUEs, or fixed effects; in class we call the BLUEs "shifts".
There are many ways to estimate fixed effects in Stata. One might be tempted to use
Stata's areg command. Areg is short for absorb regression; it effectively fits a model
with dummy variables for each school but doesn't display those results. (Just remember,
this is a pedagogical exercise; fixed-effect models are not appropriate for GRT data.)
The (seemingly) appropriate command is

. areg math cond pov gender read, absorb(school)

But we get the following output.

Linear regression, absorbing indicators         Number of obs      =       311
                                                F(  3,   288)      =     36.53
                                                Prob > F           =    0.0000
                                                R-squared          =    0.4070
                                                Adj R-squared      =    0.3617
                                                Root MSE           =    39.672

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cond |  (dropped)
         pov |  -5.209037   8.679059    -0.60   0.549    -22.29147    11.87339
      gender |   2.100189   4.678475     0.45   0.654    -7.108151    11.30853
        read |   6.780649   .6553348    10.35   0.000     5.490797    8.070502
       _cons |  -221.0522   71.67932    -3.08   0.002     -362.134   -79.97044
-------------+----------------------------------------------------------------
      school |   F(19, 288) =    1.279   0.196             (20 categories)

Notice the desired cond effect is dropped. This is because condition is perfectly collinear
with the school indicator variables. The areg model does not permit nested effects; nor
does the other useful command, xtreg with the fe option. In other words, we cannot
easily drop one school per treatment condition so that the cond effect can be estimated.
What can be done? A good and too frequently overlooked command is anova. Stata's
anova command permits nested effects with the vertical bar, such that a|b implies a is
nested in b. Thus, we can type


. anova math cond gender pov read school|cond, reg contin(read)

      Source |       SS       df       MS              Number of obs =     311
-------------+------------------------------           F( 22,   288) =    8.98
       Model |  311072.476    22   14139.658           Prob > F      =  0.0000
    Residual |  453283.016   288  1573.89936           R-squared     =  0.4070
-------------+------------------------------           Adj R-squared =  0.3617
       Total |  764355.492   310  2465.66288           Root MSE      =  39.672

------------------------------------------------------------------------------
        math        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons    -210.2625   72.07997    -2.92   0.004    -352.1328   -68.39213
        cond
           1     -14.8864   13.96549    -1.07   0.287    -42.37376    12.60097
           2    (dropped)
      gender
           1    -2.100189   4.678475    -0.45   0.654    -11.30853    7.108151
           2    (dropped)
         pov
           1     5.209037   8.679059     0.60   0.549    -11.87339    22.29147
           2    (dropped)
        read     6.780649   .6553348    10.35   0.000     5.490797    8.070502
 school|cond
         2 1     10.41479   12.52372     0.83   0.406    -14.23484    35.06442
         3 2    -2.270355   12.77133    -0.18   0.859    -27.40734    22.86663
        12 1     4.428507   15.92129     0.28   0.781    -26.90833    35.76534
        19 2    -3.225934   14.31962    -0.23   0.822    -31.41031    24.95845
        23 1      15.7866    14.7453     1.07   0.285    -13.23562    44.80883
        24 2    -31.24932   13.06868    -2.39   0.017    -56.97156   -5.527088
        25 1    -12.05313   14.34601    -0.84   0.402    -40.28945    16.18319
        27 2    -17.45405   14.76243    -1.18   0.238    -46.50998    11.60188
        31 1    -29.38767   14.33302    -2.05   0.041    -57.59843   -1.176906
        32 2    -17.99994   16.72715    -1.08   0.283     -50.9229    14.92301
        35 1    -4.355241   14.42814    -0.30   0.763    -32.75321    24.04273
        41 2    -20.88286   14.12454    -1.48   0.140    -48.68328    6.917559
        43 1    -2.261484   13.08057    -0.17   0.863    -28.00713    23.48416
        44 2    -6.041949    14.3659    -0.42   0.674    -34.31742    22.23352
        45 1    -13.80596   13.81846    -1.00   0.319    -41.00395    13.39203
        53 2     1.074164   16.61647     0.06   0.949    -31.63095    33.77928
        70 1     4.113912   15.55775     0.26   0.792     -26.5074    34.73522
        74 2    -2.264692   14.62513    -0.15   0.877    -31.05039      26.521
        75 1    (dropped)
        86 2    (dropped)
------------------------------------------------------------------------------

This is correct but there seems to be no easy and general way to extract the school-level
fixed effects for further manipulation. Recall, the point of this exercise is to extract fixed
school effects to compare to the (proper) random school effects.


Incidentally, to estimate the model in SAS we type the following and get the same thing.

proc mixed data=grt.postonly_x;
  class school cond;
  model math = cond pov gender read school(cond) / solution;
run;

The Mixed Procedure

                 Model Information
Data Set                     GRT.POSTONLY_X
Dependent Variable           MATH
Covariance Structure         Diagonal
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Residual

(output omitted)

                                         Standard
Effect         SCHOOL  COND   Estimate      Error    DF   t Value   Pr > |t|
Intercept                      -207.15    72.3880   288     -2.86     0.0045
COND                   0      -14.8864    13.9655   288     -1.07     0.2873
COND                   1             0          .     .         .          .
POV                            -5.2090     8.6791   288     -0.60     0.5489
GENDER                          2.1002     4.6785   288      0.45     0.6538
READ                            6.7806     0.6553   288     10.35     <.0001
SCHOOL(COND)       2   0       10.4148    12.5237   288      0.83     0.4063
SCHOOL(COND)      12   0        4.4285    15.9213   288      0.28     0.7811
SCHOOL(COND)      23   0       15.7866    14.7453   288      1.07     0.2852
SCHOOL(COND)      25   0      -12.0531    14.3460   288     -0.84     0.4015
SCHOOL(COND)      31   0      -29.3877    14.3330   288     -2.05     0.0412
SCHOOL(COND)      35   0       -4.3552    14.4281   288     -0.30     0.7630
SCHOOL(COND)      43   0       -2.2615    13.0806   288     -0.17     0.8629
SCHOOL(COND)      45   0      -13.8060    13.8185   288     -1.00     0.3186
SCHOOL(COND)      70   0        4.1139    15.5577   288      0.26     0.7916
SCHOOL(COND)      75   0             0          .     .         .          .
SCHOOL(COND)       3   1       -2.2704    12.7713   288     -0.18     0.8590
SCHOOL(COND)      19   1       -3.2259    14.3196   288     -0.23     0.8219
SCHOOL(COND)      24   1      -31.2493    13.0687   288     -2.39     0.0174
SCHOOL(COND)      27   1      -17.4541    14.7624   288     -1.18     0.2381
SCHOOL(COND)      32   1      -17.9999    16.7271   288     -1.08     0.2828
SCHOOL(COND)      41   1      -20.8829    14.1245   288     -1.48     0.1404
SCHOOL(COND)      44   1       -6.0419    14.3659   288     -0.42     0.6744
SCHOOL(COND)      53   1        1.0742    16.6165   288      0.06     0.9485
SCHOOL(COND)      74   1       -2.2647    14.6251   288     -0.15     0.8770
SCHOOL(COND)      86   1             0          .     .         .          .


Okay, back to Stata. We want to construct simple (0,1) dummy variables for school, and
not exclude any school in the process. It is convenient to use the xi command for
this. But let's do a little work to make things work out nicely later.

. xi, noomit i.cond*i.school

The result of this is fine, but a little confusing, since a dummy is generated for every
interaction between condition and school. Since our schools are nested in condition, we
don't have this full factorial structure. To make things more visually appealing, let's drop
all newly generated interaction terms that are not legitimate. To see which ones, tabulate
school by condition: the combinations with 0 subjects are not legitimate (eg, school 2,
cond 1).
. tab school cond

           |         cond
    school |         0          1 |     Total
-----------+----------------------+----------
         2 |        25          0 |        25
         3 |         0         25 |        25
        12 |        10          0 |        10
        19 |         0         15 |        15
        23 |        13          0 |        13
        24 |         0         22 |        22
        25 |        14          0 |        14
        27 |         0         14 |        14
        31 |        14          0 |        14
        32 |         0          9 |         9
        35 |        14          0 |        14
        41 |         0         16 |        16
        43 |        21          0 |        21
        44 |         0         15 |        15
        45 |        16          0 |        16
        53 |         0          9 |         9
        70 |        11          0 |        11
        74 |         0         14 |        14
        75 |        18          0 |        18
        86 |         0         16 |        16
-----------+----------------------+----------
     Total |       156        155 |       311
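The list of illegitimate dummies can be read straight off the empty cells of this table. A short Python sketch of that bookkeeping, with the counts copied from the tab output:

```python
# cell counts from ". tab school cond": school -> (n at cond=0, n at cond=1)
counts = {2: (25, 0), 3: (0, 25), 12: (10, 0), 19: (0, 15), 23: (13, 0),
          24: (0, 22), 25: (14, 0), 27: (0, 14), 31: (14, 0), 32: (0, 9),
          35: (14, 0), 41: (0, 16), 43: (21, 0), 44: (0, 15), 45: (16, 0),
          53: (0, 9), 70: (11, 0), 74: (0, 14), 75: (18, 0), 86: (0, 16)}

# an interaction dummy is illegitimate when its (cond, school) cell is empty
drop = sorted(f"conXsch_{c}_{s}" for s, (n0, n1) in counts.items()
              for c, n in ((0, n0), (1, n1)) if n == 0)
```

Since every school appears in exactly one condition, each school contributes exactly one empty cell, giving 20 dummies to drop.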

Now drop the useless variables.

. drop conXsch_0_3 conXsch_0_19 conXsch_0_24 conXsch_0_27 conXsch_0_32 conXsch_0_41
       conXsch_0_44 conXsch_0_53 conXsch_0_74 conXsch_0_86
. drop conXsch_1_2 conXsch_1_12 conXsch_1_23 conXsch_1_25 conXsch_1_31 conXsch_1_35
       conXsch_1_43 conXsch_1_45 conXsch_1_70 conXsch_1_75


Now let's fit a regular fixed-effect regression model with a fixed effect for school, but
leave out one school per condition as the reference. Any school per condition will do, but
let's follow the lead of the anova analysis above and omit schools 75 and 86.
. reg math cond gender pov read conXsch_0_2-conXsch_0_70 conXsch_1_3-conXsch_1_74

      Source |       SS       df       MS              Number of obs =     311
-------------+------------------------------           F( 22,   288) =    8.98
       Model |  311072.476    22   14139.658           Prob > F      =  0.0000
    Residual |  453283.016   288  1573.89936           R-squared     =  0.4070
-------------+------------------------------           Adj R-squared =  0.3617
       Total |  764355.492   310  2465.66288           Root MSE      =  39.672

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cond |    14.8864   13.96549     1.07   0.287    -12.60097    42.37376
      gender |   2.100189   4.678475     0.45   0.654    -7.108151    11.30853
         pov |  -5.209037   8.679059    -0.60   0.549    -22.29147    11.87339
        read |   6.780649   .6553348    10.35   0.000     5.490797    8.070502
 conXsch_0_2 |   10.41479   12.52372     0.83   0.406    -14.23484    35.06442
conXsch_0_12 |   4.428507   15.92129     0.28   0.781    -26.90833    35.76534
conXsch_0_23 |    15.7866    14.7453     1.07   0.285    -13.23562    44.80883
conXsch_0_25 |  -12.05313   14.34601    -0.84   0.402    -40.28945    16.18319
conXsch_0_31 |  -29.38767   14.33302    -2.05   0.041    -57.59843   -1.176906
conXsch_0_35 |  -4.355241   14.42814    -0.30   0.763    -32.75321    24.04273
conXsch_0_43 |  -2.261484   13.08057    -0.17   0.863    -28.00713    23.48416
conXsch_0_45 |  -13.80596   13.81846    -1.00   0.319    -41.00395    13.39203
conXsch_0_70 |   4.113912   15.55775     0.26   0.792     -26.5074    34.73522
 conXsch_1_3 |  -2.270355   12.77133    -0.18   0.859    -27.40734    22.86663
conXsch_1_19 |  -3.225934   14.31962    -0.23   0.822    -31.41031    24.95845
conXsch_1_24 |  -31.24932   13.06868    -2.39   0.017    -56.97156   -5.527088
conXsch_1_27 |  -17.45405   14.76243    -1.18   0.238    -46.50998    11.60188
conXsch_1_32 |  -17.99994   16.72715    -1.08   0.283     -50.9229    14.92301
conXsch_1_41 |  -20.88286   14.12454    -1.48   0.140    -48.68328    6.917559
conXsch_1_44 |  -6.041949    14.3659    -0.42   0.674    -34.31742    22.23352
conXsch_1_53 |   1.074164   16.61647     0.06   0.949    -31.63095    33.77928
conXsch_1_74 |  -2.264692   14.62513    -0.15   0.877    -31.05039      26.521
       _cons |    -222.04   72.31614    -3.07   0.002    -364.3752   -79.70485
------------------------------------------------------------------------------

Notice this result is the same as the anova result above, except for the estimated
intercept, which we ignore anyway.
Now we have to do some work to extract and re-merge the school-level fixed effects. And
remember, the estimated school-specific coefficients are differences with respect to the
omitted reference school in each condition, so we have to extract the condition means too.
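The centering step that the Stata code below performs can be illustrated in isolation. A Python sketch using the condition-0 school coefficients from the regression, with the omitted reference school (75) set to 0:

```python
# cond = 0 school shifts from the fixed-effect regression
fe = {2: 10.41479, 12: 4.428507, 23: 15.7866, 25: -12.05313,
      31: -29.38767, 35: -4.355241, 43: -2.261484, 45: -13.80596,
      70: 4.113912, 75: 0.0}   # school 75 is the omitted reference

m = sum(fe.values()) / len(fe)            # condition mean of the shifts
fe2 = {s: v - m for s, v in fe.items()}   # shifts centered within condition
```

Centering removes the arbitrary choice of reference school, so the fe2 values are comparable to the random-effect bumps, which are centered at zero by construction.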
. parmest , saving(temp, replace)
. preserve
. use temp, clear
. gen str3 id = substr(parm, -2, 2)
. replace id = subinstr(id, "_", "0", 1)
. keep est id
. destring id, force replace
. keep if id ~= .
. save fe, replace
. restore
. sort school
. merge school using fe
. drop _merge
. mean fe if single, over(cond)
. replace fe = 0 if fe == .
. gen fe2 = fe - _coef[fe:0] if cond == 0
. replace fe2 = fe - _coef[fe:1] if cond == 1

Let's take a look in tabular fashion.

. sort cond school
. list cond school re fe2 if single, noobs sepby(cond)

+---------------------------------------+
| cond   school          re        fe2  |
|---------------------------------------|
|    0        2    4.464902   13.12676  |
|    0       12     1.03236   7.140474  |
|    0       23    4.210702   18.49857  |
|    0       25   -1.866368  -9.341164  |
|    0       31   -6.018359   -26.6757  |
|    0       35   -.8096194  -1.643273  |
|    0       43   -.3606333   .4504827  |
|    0       45   -3.073813    -11.094  |
|    0       70    1.314626    6.82588  |
|    0       75    1.106202   2.711967  |
|---------------------------------------|
|    1        3    2.849438   7.761139  |
|    1       19    1.836211   6.805561  |
|    1       24   -7.145421  -21.21783  |
|    1       27   -1.431592  -7.422559  |
|    1       32   -1.188469  -7.968448  |
|    1       41   -3.062465  -10.85137  |
|    1       44    1.590811   3.989546  |
|    1       53    1.860326   11.10566  |
|    1       74    1.790454   7.766802  |
|    1       86    2.900706    10.0315  |
+---------------------------------------+

We find the dotplot graphing command most useful for visualizing the two sets of school
effects together. The following code does this, plotting the centered fixed effects, fe2,
against the random-effect bumps, and adds a few bells and whistles for appearance.

. label var re "RE Model"
. label var fe2 "FE Model"
. #delimit ;
. dotplot re fe2 if single, recast(scatter)
    ytitle(Shifts & Bumps, size(medsmall))
    ylabel(-35(5)20, angle(horizontal))
    ymlabel(0, tposition(inside) tlength(1) nolabels)
    title(Comparative Histograms, margin(bottom))
    subtitle("Estimated school effects, N=20", size(medsmall) margin(bottom)
    linegap(2)) name(re_fe_plot, replace) scheme(sj) ;
. #delimit cr


[Figure: dotplot re_fe_plot. Title "Comparative Histograms"; subtitle "Estimated school
effects, N=20". The vertical axis, "Shifts & Bumps", runs from -35 to 20; the two columns
of points are labeled "RE Model" and "FE Model".]

