Differences Between PROC Mixed and Lme - Lmer in R - Degrees of Freedom - Cross Validated

_
Cross Validated is a question and Here's how it works:
answer site for people interested in
statistics, machine learning, data
analysis, data mining, and data
visualization. Join them; it only takes a
minute:
Anybody can ask Anybody can The best answers are voted
a question answer up and rise to the top
Sign up
Differences between PROC Mixed and lme / lmer in R degrees of freedom
Note : this question is a repost, as my previous question had to be deleted for legal reasons.
While comparing PROC MIXED from SAS with the function lme from the nlme package in R, I stumbled upon some rather confusing differences. More

specifically, the degrees of freedom in the different tests differ between PROC MIXED and lme , and I wondered why.
Start from the following dataset (R code given below) :
ind : factor indicating the individual where the measurement is taken
fac : organ where measurement is taken
trt : factor indicating the treatment
y : some continuous response variable
The idea is to build the following simple models :
y ~ trt + (ind) : ind as a random factor y ~ trt + (fac(ind)) : fac nested in ind as a random factor
Note that the last model should cause singularities, as there's only 1 value of y for every combination of ind and fac .
First Model
In SAS, I build the following model :
PROC MIXED data=Data;

CLASS ind fac trt;
MODEL y = trt /s;
RANDOM ind /s;
run;
According to tutorials, the same model in R using nlme should be :
> require(nlme)
> options(contrasts=c(factor="contr.SAS",ordered="contr.poly"))
> m2<-lme(y~trt,random=~1|ind,data=Data)
Both models give the same estimates for the coefficients and their SE, but when carrying out an F test for the effect of trt , they use a different amount of

degrees of freedom :
SAS :
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
trt 1 8 0.89 0.3724
R :
> anova(m2)
numDF denDF F-value p-value
(Intercept) 1 8 70.96836 <.0001
trt 1 6 0.89272 0.3812
Question1: What is the difference between both tests? Both are fitted using REML, and use the same contrasts.
NOTE: I tried different values for the DDFM= option (including BETWITHIN, which theoretically should give the same results as lme)
Second Model
In SAS :

CLASS ind fac trt;
MODEL y = trt /s;
RANDOM fac(ind) /s;
run;
The equivalent model in R should be :
> m4<-lme(y~trt,random=~1|ind/fac,data=Data)
In this case, there are some very odd differences :
R fits without complaining, whereas SAS notes that the final hessian is not positive definite (which doesn't surprise me a bit, see above)
The SE on the coefficients differ (is smaller in SAS)
Again, the F test used a different amount of DF (in fact, in SAS that amount = 0)
SAS output :
Effect trt Estimate Std Error DF t Value Pr > |t|

Intercept 0.8863 0.1192 14 7.43 <.0001
trt Cont -0.1788 0.1686 0 -1.06 .
R Output :
> summary(m4)
...
Fixed effects: y ~ trt
Value Std.Error DF t-value p-value
(Intercept) 0.88625 0.1337743 8 6.624963 0.0002
trtCont -0.17875 0.1891855 6 -0.944840 0.3812
...
(Note that in this case, the F and T test are equivalent and use the same DF.)
Interestingly, when using lme4 in R the model doesn't even fit :
> require(lme4)
> m4r <- lmer(y~trt+(1|ind/fac),data=Data)
Error in function (fr, FL, start, REML, verbose) :
Number of levels of a grouping factor for the random effects
must be less than the number of observations
Question 2: What is the difference between these models with nested factors? Are they specified correctly and if so, how comes the results are so
different?
Simulated Data in R :
Data <- structure(list(y = c(1.05, 0.86, 1.02, 1.14, 0.68, 1.05, 0.22,
1.07, 0.46, 0.65, 0.41, 0.82, 0.6, 0.49, 0.68, 1.55), ind = structure(c(1L,
2L, 3L, 1L, 3L, 4L, 4L, 2L, 5L, 6L, 7L, 8L, 6L, 5L, 7L, 8L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), fac = structure(c(1L,
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("l",
"r"), class = "factor"), trt = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Cont",
"Treat"), class = "factor")), .Names = c("y", "ind", "fac", "trt"
), row.names = c(NA, -16L), class = "data.frame")
Simulated Data :
y ind fac trt

1.05 1 l Treat
0.86 2 l Treat
1.02 3 l Treat
1.14 1 r Treat
0.68 3 r Treat
1.05 4 l Treat
0.22 4 r Treat
1.07 2 r Treat
0.46 5 r Cont
0.65 6 l Cont
0.41 7 l Cont
0.82 8 l Cont
0.60 6 r Cont
0.49 5 l Cont
0.68 7 r Cont
1.55 8 r Cont
r mixedmodel sas degreesoffreedom
edited Jun 1 '12 at 20:08 asked Jun 1 '12 at 9:57
Joris Meys
4,250 1 22 38
@Aaron : Please find your answer included in this post. If you could copy and paste that as an answer, I give you the rep for it. It has been very helpful, so I really want to keep it here on
crossvalidated. After you've done that, I delete your answer from the question. – Joris Meys Jun 1 '12 at 9:58
I'm trying to make the team revive your original Q with this unfortunate revision wiped out for good thus there is a great chance to restore original answers and merge them here. – mbq
Jun 1 '12 at 12:25
@mbq : That would be nice, although I simulated some data (which I use here) and edited the answer of Aaron accordingly. For the other answer, that's going to be a bit more complicated,
but I can try as well. – Joris Meys Jun 1 '12 at 12:58
Aaron's answer is an incredibly good one. I hope they see it. Unfortunately, your @Aaron won't contact him unless he's participated in this thread. – Wayne Jun 1 '12 at 18:45
1 Yes this was a nice answer. Here I gave a link to the deleted post : stats.stackexchange.com/questions/26556/… I'm going to add the link to the present post. – Stéphane Laurent Jun 1 '12
at 19:00
1 Answer
For the first question, the default method in SAS to find the df is not very smart; it looks for terms in the
random effect that syntactically include the fixed effect, and uses that. In this case, since trt is not
found in ind , it's not doing the right thing. I've never tried BETWITHIN and don't know the details, but
either the Satterthwaite option ( satterth ) or using ind*trt as the random effect give correct results.

CLASS ind fac trt;
MODEL y = trt /s ddfm=satterth;
RANDOM ind /s;
run;

CLASS ind fac trt;
MODEL y = trt /s;
RANDOM ind*trt /s;
run;
As for the second question, your SAS code doesn't quite match your R code; it only has a term for
fac*ind , while the R code has a term for both ind and fac*ind . (See the Variance Components
output to see this.) Adding this gives the same SE for trt in all models in both Q1 and Q2 (0.1892).
As you note, this is an odd model to fit as the fac*ind term has one observation for each level, so is

equivalent to the error term. This is reflected in the SAS output, where the fac*ind term has zero
variance. This is also what the error message from lme4 is telling you; the reason for the error is that
you most likely misspecified something as you're including the error term in the model in two different
ways. Interestingly, there is one slight difference in the nlme model; it's somehow finding a variance
term for the fac*ind term in addition to the error term, but you will notice that the sum of these two
variances equal the error term from both SAS and nlme without the fac*ind term. However, the SE
for trt remains the same (0.1892) as trt is nested in ind , so these lower variance terms don't
affect it.
Finally, a general note about degrees of freedom in these models: They are computed after the model
is fit, and so differences in the degrees of freedom between different programs or options of a program
do not necessarily mean that the model is being fit differently. For that, one must look at the estimates
of the parameters, both fixed effect parameters and covariance parameters.
Also, using the t and F approximations with a given number of degrees of freedom is fairly
controversial. Not only are there several ways to approximate the df, some believe the practice of doing
so is not a good idea anyway. A couple words of advice:
1. If everything is balanced, compare the results with the traditional least squares method, as they
should agree. If it's close to balanced, compute them yourself (assuming balance) so that you can
make sure the ones you're using are in the right ballpark.
2. If you have a large sample size, the degrees of freedom don't matter very much as the
distributions get close to a normal and chisquared.
3. Check out Doug Bates's methods for inference. His older method is based on MCMC simulation;
his newer method is based on profiling the likelihood.
answered Jun 1 '12 at 19:06
Aaron
5,660 24 42
Indeed a good answer, although I think that profiling the likelihood solves a different question (appropriate CIs on variance
parameters where the profile is nonquadratic) than doing MCMC simulation (which handles both finitesize correction and
nonquadraticness). I think bootMer (parametric bootstrap) is closer to the equivalent for mcmcsamp than confint(profile(...))
... – Ben Bolker Jun 3 '12 at 15:33
@BenBolker: Sure could be. Doug Bates gave a talk here last month and he talked about his ideas on profiling the likelihood.
That's about all I know about it so far. – Aaron Jun 3 '12 at 18:39

Differences Between PROC Mixed and Lme - Lmer in R - Degrees of Freedom - Cross Validated

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Differences Between PROC Mixed and Lme - Lmer in R - Degrees of Freedom - Cross Validated

Hochgeladen von

Copyright:

Verfügbare Formate

While comparing PROC MIXED from SAS with the function lme from the nlme package in R, I stumbled upon some rather confusing differences. More

Note that the last model should cause singularities, as there's only 1 value of y for every combination of ind and fac .

PROC MIXED data=Data;

According to tutorials, the same model in R using nlme should be :

Both models give the same estimates for the coefficients and their SE, but when carrying out an F test for the effect of trt , they use a different amount of

PROC MIXED data=Data;

Effect trt Estimate Std Error DF t Value Pr > |t|

Interestingly, when using lme4 in R the model doesn't even fit :

y ind fac trt

r mixedmodel sas degreesoffreedom

PROC MIXED data=Data;

PROC MIXED data=Data;

As you note, this is an odd model to fit as the fac*ind term has one observation for each level, so is

Das könnte Ihnen auch gefallen

Differences Between PROC Mixed and Lme - Lmer in R - Degrees of Freedom - Cross Validated

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Differences Between PROC Mixed and Lme - Lmer in R - Degrees of Freedom - Cross Validated

Hochgeladen von

Copyright:

Verfügbare Formate

While comparing PROC MIXED from SAS with the function lme from the nlme package in R, I stumbled upon some rather confusing differences. More

Note that the last model should cause singularities, as there's only 1 value of y for every combination of ind and fac .

PROC MIXED data=Data;

According to tutorials, the same model in R using nlme should be :

Both models give the same estimates for the coefficients and their SE, but when carrying out an F test for the effect of trt , they use a different amount of

PROC MIXED data=Data;

Effect trt Estimate Std Error DF t Value Pr > |t|

Interestingly, when using lme4 in R the model doesn't even fit :

y ind fac trt

r mixed­model sas degrees­of­freedom

PROC MIXED data=Data;

PROC MIXED data=Data;

As you note, this is an odd model to fit as the fac*ind term has one observation for each level, so is

Das könnte Ihnen auch gefallen

r mixedmodel sas degreesoffreedom