Bootstrap Solutions

An Introduction to Bootstrap Methods and their Application
Statoo Consulting, Switzerland
WBL in Angewandter Statistik 2011/13, ETHZ
Solutions to Series 3
6 February 2012
Exercise 1. (a)

> require(boot)
> dogs
  mvo lvp
1  78  32
2  92  33
3 116  45
4  90  30
5 106  38
6  78  24
7  99  44
> plot(dogs)
[Figure: scatter plot of lvp against mvo.]
The data do not look normally distributed.

> cor(dogs$mvo, dogs$lvp)
[1] 0.8536946

(b) The following function dogs.gen.boot generates a bootstrap data set under H0, as described on slide 115 (Section 3.6.3). The idea is to separate the two variables, draw a bootstrap sample from each variable independently, and then recombine the two samples into a new data set on which the correlation is computed.
dogs.gen.boot = function(data, mle) {
  # mle is the sample size n; the two index vectors are drawn independently
  index1 = sample(mle, size=mle, replace=TRUE)
  index2 = sample(mle, size=mle, replace=TRUE)
  cbind(data[index1, 1], data[index2, 2])
}

An example of such a bootstrap data set is:

> set.seed(31)
> dogs.gen.boot(dogs, 7)
     [,1] [,2]
[1,]   90   33
[2,]   99   44
[3,]  116   33
[4,]  116   45
[5,]   99   24
[6,]   90   38
[7,]   78   32

The nonparametric bootstrap test can then be carried out as follows:

dogs.fun = function(data) {cor(data[, 1], data[, 2])}
set.seed(31)
dogs.boot = boot(dogs, dogs.fun, R=999, sim="parametric", ran.gen=dogs.gen.boot, mle=nrow(dogs))

Of the R = 999 bootstrap values, 5 are greater than or equal to t = 0.854, so the p-value is (1 + 5)/(999 + 1) = 0.006. We therefore reject H0, i.e. we reject the hypothesis that MVO and LVP are independent.

> sum(dogs.boot$t >= dogs.boot$t0)
[1] 5
> (1 + sum(dogs.boot$t >= dogs.boot$t0))/(1 + dogs.boot$R)
[1] 0.006
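The resampling scheme under H0 can also be written out without the boot package, which makes each step explicit. The following is a minimal base R sketch and not the course's implementation: the helper name indep_boot_pvalue is hypothetical, and the data vectors are typed in from the printout in (a).

```r
# Data as printed in (a); typed in by hand for a self-contained example.
mvo <- c(78, 92, 116, 90, 106, 78, 99)
lvp <- c(32, 33, 45, 30, 38, 24, 44)

# Hypothetical helper (not part of boot): Monte Carlo p-value for
# H0 "x and y are independent", resampling each variable separately.
indep_boot_pvalue <- function(x, y, R = 999) {
  n  <- length(x)
  t0 <- cor(x, y)                             # observed statistic
  t_star <- replicate(R, {
    i1 <- sample.int(n, n, replace = TRUE)    # indices for x
    i2 <- sample.int(n, n, replace = TRUE)    # independent indices for y
    # cor() returns NA (with a warning) if a resample is constant
    suppressWarnings(cor(x[i1], y[i2]))
  })
  # NA values are counted as not exceeding t0
  exceed <- sum(t_star >= t0, na.rm = TRUE)
  list(t0 = t0, p = (1 + exceed) / (R + 1))
}

set.seed(31)
res <- indep_boot_pvalue(mvo, lvp)
res$t0   # observed correlation
res$p    # Monte Carlo p-value; depends on the seed and the RNG stream
```

The p-value from this sketch will generally not reproduce 0.006 exactly, because the random number stream is consumed differently than inside boot().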
Exercise 2. (a) The bootstrap model under the null hypothesis is that the differences d_j are independent realisations of a distribution that is symmetric about zero. A bootstrap scheme therefore draws n values independently from {±d_1, ..., ±d_n}. The resulting p-value should thus not differ much from that of the randomisation test.

(b)

> require(boot)
> darwin
     y
1   49
2  -67
3    8
4   16
5    6
6   23
7   28
8   41
9   14
10  29
11  56
12  24
13  75
14  60
15 -48

The randomisation test can be carried out as follows:

> darwin.gen = function(data, mle) {
    # mle is the sample size n; each difference gets a random sign
    sign = sample(c(-1, 1), mle, replace=TRUE)
    data*sign
  }
> set.seed(32)
> darwin.rand = boot(darwin$y, mean, R=999, sim="parametric", ran.gen=darwin.gen, mle=nrow(darwin))
> (1 + sum(darwin.rand$t >= darwin.rand$t0))/(1 + darwin.rand$R)
[1] 0.023

The randomisation test with R = 999 gives a p-value of 0.023. The null hypothesis is therefore rejected, and the difference appears to be marginally significantly different from zero.

(c) The bootstrap test can be carried out as follows:

> darwin.gen.boot = function(data, mle) {
    # mle holds the 2n values {±d_1, ..., ±d_n}; draw n of them with replacement
    sample(mle, size=length(data), replace=TRUE)
  }
> set.seed(32)
> darwin.boot = boot(darwin$y, mean, R=999, sim="parametric", ran.gen=darwin.gen.boot, mle=c(darwin$y, -darwin$y))
> (1 + sum(darwin.boot$t >= darwin.boot$t0))/(1 + darwin.boot$R)
[1] 0.026

The bootstrap test with R = 999 gives a p-value of 0.026, which does not differ much from that of the randomisation test (0.023); see also the remarks in (a). We therefore reach the same conclusion.
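Because n = 15 is small, the randomisation distribution can also be enumerated completely (2^15 = 32768 sign assignments) instead of being sampled R = 999 times. The following base R sketch is offered as an illustration and is not part of the original solution; the variable names are our own and the differences are typed in from the printout in (b).

```r
# Differences as printed in (b); typed in by hand for a self-contained example.
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
n <- length(d)

# All 2^n sign vectors, one per row (32768 x 15 for n = 15).
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

# Sum of the sign-flipped differences for every sign assignment.
# Comparing sums instead of means keeps the comparison in exact integers.
sums <- as.vector(signs %*% d)

# Exact one-sided p-value: share of assignments with a sum at least
# as large as the observed one.
p_exact <- mean(sums >= sum(d))
p_exact
```

The exact value can be compared with the simulated p-values 0.023 and 0.026 above; the simulated values differ from it only through Monte Carlo error and, for the bootstrap, through the slightly different resampling scheme.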
Exercise 3. (a)

> require(boot)
> catsM
   Sex Bwt  Hwt
1    M 2.0  6.5
2    M 2.0  6.5
...
96   M 3.9 14.4
97   M 3.9 20.5
> plot(catsM$Bwt, catsM$Hwt, xlim=c(0, 4), ylim=c(0, 24))
> cats.lm = lm(Hwt ~ Bwt, data=catsM)
> abline(coef(cats.lm))
[Figure: scatter plot of catsM$Hwt against catsM$Bwt with the fitted regression line.]

> summary(cats.lm)

Call:
lm(formula = Hwt ~ Bwt, data = catsM)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7728 -1.0478 -0.2976  0.9835  4.8646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.1841     0.9983  -1.186    0.239
Bwt           4.3127     0.3399  12.688   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.557 on 95 degrees of freedom
Multiple R-Squared: 0.6289, Adjusted R-squared: 0.625
F-statistic: 161 on 1 and 95 DF, p-value: < 2.2e-16

> par(pty="s", mfrow=c(1, 2))
> plot(cats.lm, which=1:2)
[Figure: diagnostic plots for cats.lm. Left panel: Residuals vs Fitted (residuals against fitted values). Right panel: Normal Q-Q (standardized residuals against theoretical quantiles). Observations 88, 93 and 97 are labelled.]
(b) (b1) Case resampling:

> cats.fit = function(data) coef(lm(data$Hwt ~ data$Bwt))
> cats.case = function(data, i) cats.fit(data[i, ])
> set.seed(33)
> cats.boot1 = boot(catsM, cats.case, R=499)
> cats.boot1

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = catsM, statistic = cats.case, R = 499)

Bootstrap Statistics :
     original       bias    std. error
t1* -1.184088   0.06025952   1.1570486
t2*  4.312679  -0.01948728   0.4114637
> plot(cats.boot1)
[Figure: output of plot(cats.boot1) for the intercept. Left panel: histogram of t*. Right panel: t* against quantiles of the standard normal.]
> plot(cats.boot1, index=2)
[Figure: output of plot(cats.boot1, index=2) for the slope. Left panel: histogram of t*. Right panel: t* against quantiles of the standard normal.]
Both plots (for the intercept β̂_0 and for the slope β̂_1, index=2) look normally distributed. One can also see that there are no large outliers (as was the case with the survival data that we saw in the course on slides 125-130 in Section 3.7).
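Case resampling can also be written out by hand, which shows what boot() does with the index vector i. The sketch below is an illustration under stated assumptions: it uses simulated data standing in for catsM (the coefficients -1 and 4 and the noise level 1.5 are arbitrary choices, not estimates from the real data), and the names dat, fit_coef and t_star are our own.

```r
set.seed(1)
# Hypothetical simulated data standing in for catsM (not the real data).
n   <- 50
dat <- data.frame(Bwt = runif(n, 2, 4))
dat$Hwt <- -1 + 4 * dat$Bwt + rnorm(n, sd = 1.5)

# Same statistic as cats.fit: intercept and slope of the least squares line.
fit_coef <- function(data) coef(lm(Hwt ~ Bwt, data = data))

R <- 499
# Each replicate resamples whole rows (cases) with replacement and refits.
t_star <- t(replicate(R, {
  i <- sample.int(n, n, replace = TRUE)
  fit_coef(dat[i, ])
}))

t0      <- fit_coef(dat)
bias    <- colMeans(t_star) - t0     # bootstrap bias estimate
std_err <- apply(t_star, 2, sd)      # bootstrap standard error
cbind(original = t0, bias = bias, std.error = std_err)
```

The final table has the same three columns as the "Bootstrap Statistics" printout of a boot object.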
(b2) Model-based resampling:

> cats.res = resid(cats.lm)
> cats.sd = sqrt(sum(cats.res^2)/cats.lm$df.residual)
> # note: this rescales the raw residuals by the residual standard error
> cats.res = cats.res*cats.sd
> cats.res = cats.res - mean(cats.res)
> cats.df = data.frame(catsM, res=cats.res, fit=fitted(cats.lm))
> cats.model = function(data, i) {
    d = data
    # new responses: fitted values plus resampled residuals
    d$Hwt = d$fit + d$res[i]
    cats.fit(d)
  }
> set.seed(33)
> cats.boot2 = boot(cats.df, cats.model, R=499)
> cats.boot2

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = cats.df, statistic = cats.model, R = 499)

Bootstrap Statistics :
     original       bias    std. error
t1* -1.184088   0.11056619   1.6166324
t2*  4.312679  -0.03527925   0.5510013
> plot(cats.boot2)
[Figure: output of plot(cats.boot2) for the intercept. Left panel: histogram of t*. Right panel: t* against quantiles of the standard normal.]
> plot(cats.boot2, index=2)
[Figure: output of plot(cats.boot2, index=2) for the slope. Left panel: histogram of t*. Right panel: t* against quantiles of the standard normal.]
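Both bootstrap procedures give similar estimates of the coefficients of the regression line. The bootstrap standard errors differ more: 1.16 and 0.41 for case resampling, against 1.62 and 0.55 for model-based resampling.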
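In the model-based code of (b2), the raw residuals from resid() are multiplied by cats.sd, which scales them by the residual standard error (about 1.557). This may account for much of the gap between the model-based standard errors (1.62, 0.55) and those reported by lm (1.00, 0.34), since 1.557 times the lm values gives roughly 1.55 and 0.53. A commonly used alternative is to resample modified residuals e_i / sqrt(1 - h_i), where h_i is the leverage. The sketch below shows that variant; it is a swapped-in technique and not the code of the original solution, it runs on simulated data standing in for catsM, and all names in it are our own.

```r
set.seed(2)
# Hypothetical simulated data standing in for catsM (not the real data).
n   <- 50
dat <- data.frame(Bwt = runif(n, 2, 4))
dat$Hwt <- -1 + 4 * dat$Bwt + rnorm(n, sd = 1.5)

fit <- lm(Hwt ~ Bwt, data = dat)

# Modified residuals: raw residuals divided by sqrt(1 - leverage),
# then centred so that they have mean zero.
r <- resid(fit) / sqrt(1 - hatvalues(fit))
r <- r - mean(r)

R <- 499
# Each replicate keeps the covariate fixed and adds resampled residuals
# to the fitted values, then refits the regression.
t_star <- t(replicate(R, {
  y_star <- fitted(fit) + sample(r, n, replace = TRUE)
  coef(lm(y_star ~ dat$Bwt))
}))

boot_se <- apply(t_star, 2, sd)
lm_se   <- summary(fit)$coefficients[, "Std. Error"]
rbind(bootstrap = boot_se, lm = lm_se)
```

With modified residuals the bootstrap standard errors would be expected to lie close to the lm standard errors when the linear model with constant variance is adequate.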
Dr. Diego Kuonen, CStat PStat CSci <kuonen@statoo.com>
Statoo Consulting, Switzerland (www.statoo.info)
Copyright © 2012, Statoo Consulting, Switzerland. All rights reserved.
