

UNIVERSITY OF CAPE TOWN


DEPARTMENT OF STATISTICAL SCIENCES
STA2020F: BUSINESS STATISTICS
Test 2

Internal examiners: Hannah Gerber
Date: 24 April 2011
Total number of questions: 1
Time: 1 hour 30 minutes
Total number of pages: 14 (6 + 1 + 7)
Total marks: 50
Instructions: Answer all questions in the answer book(s) provided. The appropriate tables
and formulae have been provided.





McIntyre (1994) used tar, nicotine and weight as explanatory variables for the carbon
monoxide (CO) content corresponding to 25 brands of cigarettes. Noted in the data set were
the brand, the tar content in mg, the nicotine content in mg, the weight of the cigarette in
grams and the CO content in mg.

The following Excel output is provided: the correlation matrix (Table 1), the regression
output (Table 2), the prediction interval and confidence interval of the expected value for
given regressor values (Table 3), all subset regression output (Tables 4 and 5), the residual
plot of the predicted responses (Figure 1) and the Q-Q plot (Figure 2).

            Tar       Nicotine   Weight    CO
Tar         1
Nicotine    0.9766    1
Weight      0.4908    0.5002     1
CO          0.9575    0.9259     0.4640    1

Table 1: Correlation matrix



SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9584
R Square             0.9186
Adjusted R Square    A
Standard Error       1.4457
Observations         25

ANOVA
              df    SS          MS          F         Significance F
Regression    B     495.2578    165.0859    E         1.33E-11
Residual      C     43.8926     D           2.0901
Total               539.1504

             Coeff      Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept    3.2022     F                0.9250     0.3655    -3.9969     10.4013
Tar          0.9626     0.2422           G          0.0007    0.4588      1.4663
Nicotine     -2.6317    3.9006           -0.6747    0.5072    H           I
Weight       -0.1305    3.8853           -0.0336    J         -8.2105     7.9495

Table 2: Regression output

                                       CO
Predicted value                        9.3593
Prediction Interval
  Lower limit                          -0.3348
  Upper limit                          19.0533
Interval Estimate of Expected Value
  Lower limit                          0.1432
  Upper limit                          18.5753

Table 3: Prediction and confidence intervals for cigarettes that weigh 1 g and have
12 mg tar and 2 mg nicotine
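
As a rough check on how the predicted value in Table 3 arises, the point estimate is the fitted equation from Table 2 evaluated at the given regressor values. The sketch below assumes the rounded coefficients printed in Table 2; the function name `predict_co` is illustrative only, and a small rounding difference from the tabulated 9.3593 is to be expected.

```python
# Coefficients as printed (rounded to four decimals) in Table 2.
INTERCEPT = 3.2022
B_TAR = 0.9626
B_NICOTINE = -2.6317
B_WEIGHT = -0.1305


def predict_co(tar_mg, nicotine_mg, weight_g):
    """Point estimate of CO content (mg) from the fitted regression equation."""
    return (INTERCEPT
            + B_TAR * tar_mg
            + B_NICOTINE * nicotine_mg
            + B_WEIGHT * weight_g)


# Regressor values from the Table 3 caption: 12 mg tar, 2 mg nicotine, 1 g weight.
predicted = predict_co(tar_mg=12, nicotine_mg=2, weight_g=1)
print(round(predicted, 4))
```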



Summary of best subsets; variable(s): CO (Spreadsheet2)
Adjusted R square and standardized regression coefficients for each submodel

Subset No.   Adjusted R square   No. of Effects   Tar       Nicotine   Weight
1            0.9132              1                0.9575
2            0.9112              2                1.1505    -0.1977
3            0.9093              2                0.9613               -0.0078
4            0.9070              3                1.1507    -0.1966    -0.0024
5            0.8512              1                          0.9259
6            0.8444              2                          0.9254     0.0011
7            0.1811              1                                     0.4640

Table 4: All subsets regression output (adjusted R square)


Summary of best subsets; variable(s): CO (Spreadsheet2)
R square and standardized regression coefficients for each submodel

Subset No.   R square   No. of Effects   Tar       Nicotine   Weight
1            0.9186     3                1.1507    -0.1966    -0.0024
2            0.9186     2                1.1505    -0.1977
3            0.9169     2                0.9613               -0.0078
4            0.9168     1                0.9575
5            0.8574     2                          0.9254     0.0011
6            0.8574     1                          0.9259
7            0.2153     1                                     0.4640

Table 5: All subsets regression output (R square)


Figure 1: Residual plot of the predicted response
[Residual plot: residuals (vertical axis, -4 to 3) against the predicted response (horizontal axis, 0 to 30)]


Figure 2: Q-Q plot

1. State the fitted regression model (Table 2) and then estimate the CO content of a
cigarette that weighs 1 g, has a tar content of 12 mg and a nicotine content of 2 mg. (2)
2. Does the model have a good fit? Justify your answer using the appropriate hypothesis
test. Ensure that your variables are clearly defined. (4)
3. Using p-values, test if the nicotine variable is significant in the model. (4)
4. Interpret the coefficient of the nicotine variable. (3)
5. Interpret the prediction and confidence intervals (Table 3). (5)
6. State and interpret the correlation coefficient between the nicotine variable and the
response variable. (2)
7. Fill in the missing values from A to J. (8)
8. Consider the all subsets regression:
a) State the explanatory variable(s) you would incorporate into your model if the
coefficient of determination were the criterion. Justify your answer. (2)
b) State the explanatory variable(s) you would incorporate into your model if the
adjusted coefficient of determination were the criterion. Justify your answer. (2)
c) What is the difference between the coefficient of determination and the
adjusted coefficient of determination? Justify the need for the adjusted measure. (2)
9. If you were constructing a model using forward stepwise regression, which variable
would you incorporate first? Justify your answer. (2)
10. If you were constructing the model using backward stepwise regression, which
variable would you remove first (if any)? Justify your answer. (2)
11. State the regression assumptions and mention at least one method of assessing if each
assumption is satisfied. (8)
12. Comment on the residual plot. (2)
13. Comment on the Q-Q plot. (2)

[Figure 2, Q-Q plot: expected normal values (vertical axis, -3 to 3) against residuals (horizontal axis, -4 to 3)]

IMPORTANT FORMULAE

SIMPLE LINEAR REGRESSION

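Measures of association:

\[ SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \]

\[ SS_{x} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]

\[ SS_{y} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \]

\[ r_{xy} = \frac{SS_{xy}}{\sqrt{SS_{x}\, SS_{y}}} \]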



Testing the correlation coefficient:

\[ t = r \sqrt{\frac{n-2}{1-r^2}} \quad \text{with } df = n - 2 \]
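
To illustrate the test statistic for a correlation coefficient, the sketch below evaluates it for the nicotine and CO correlation reported in Table 1 (r = 0.9259) with the 25 observations reported in Table 2. The function name is an assumption made for illustration; the resulting value would still need to be compared with the tabulated t value on n - 2 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a zero population correlation (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# r from Table 1 (nicotine vs CO), n from Table 2.
n = 25
t_stat = correlation_t_statistic(r=0.9259, n=n)
df = n - 2
print(round(t_stat, 2), df)
```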

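OLS estimation:

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{SS_{xy}}{SS_{x}} = \frac{\operatorname{cov}(X, Y)}{s_x^2} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]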
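
The OLS estimates for simple linear regression can be sketched in a few lines of plain Python. The data below are hypothetical (chosen to lie exactly on the line y = 1 + 2x so the result is easy to verify by hand), and `ols_simple` is an illustrative name rather than anything defined in this paper.

```python
def ols_simple(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)
```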


Standard error of estimate:

\[ s_{\varepsilon} = \sqrt{\frac{SSE}{n - 2}} \]


Prediction interval:

\[ \hat{y} \pm t_{\alpha/2,\, n-2}\; s_{\varepsilon} \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\, s_x^2}} \]




Confidence interval:

\[ \hat{y} \pm t_{\alpha/2,\, n-2}\; s_{\varepsilon} \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\, s_x^2}} \]
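
The prediction interval and the confidence interval for the expected value differ only by the extra 1 under the square root, which the following sketch makes explicit. It assumes hypothetical data, and it takes the critical t value as an argument (read from tables) because the Python standard library has no t quantile function; the function name is illustrative.

```python
import math


def simple_regression_intervals(x, y, x_g, t_crit):
    """Return (y_hat, prediction half-width, confidence half-width) at x_g.

    t_crit is the tabulated t value with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_x^2
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))  # standard error of estimate
    core = 1 / n + (x_g - x_bar) ** 2 / ss_x
    y_hat = b0 + b1 * x_g
    pred_half = t_crit * s_eps * math.sqrt(1 + core)
    conf_half = t_crit * s_eps * math.sqrt(core)
    return y_hat, pred_half, conf_half


# Hypothetical data; 3.182 is the two-sided 5% t value for 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, pred_half, conf_half = simple_regression_intervals(x, y, x_g=3.0, t_crit=3.182)
print(y_hat - pred_half, y_hat + pred_half)  # prediction interval
print(y_hat - conf_half, y_hat + conf_half)  # confidence interval for the mean
```

Because of the additional 1 under the root, the prediction interval is always wider than the confidence interval at the same point.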



MULTIPLE REGRESSION

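Multiple regression model (for the sake of defining the notation):

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{1,1} & X_{1,2} & \cdots & X_{1,p-1} \\
1 & X_{2,1} & X_{2,2} & \cdots & X_{2,p-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{n,1} & X_{n,2} & \cdots & X_{n,p-1}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
\]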

Parameter estimate:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{Y} \]


Residual:

\[ \mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} \]


Variance of regression coefficient estimate:

\[ \widehat{\operatorname{Var}}(\hat{\beta}_j) = s^2 \left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{(j+1)(j+1)} \]


Mean squared error:

\[ s^2 = \frac{\mathbf{e}'\mathbf{e}}{n - p} \]


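Sum of squares:

\[ SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar{y}^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \]

\[ SSE = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

\[ SST = \mathbf{Y}'\mathbf{Y} - n\bar{y}^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]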




Multiple coefficient of determination:

\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]



Adjusted multiple coefficient of determination:

\[ R^2_{adj} = 1 - \frac{n - 1}{n - p} \cdot \frac{SSE}{SST} \]
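
As a worked illustration of the two coefficients of determination, the sketch below plugs in the residual and total sums of squares printed in Table 2, with n = 25 observations and p = 4 estimated parameters (the intercept plus three regressors). The function names are assumptions for illustration, and p is taken to count the intercept, in line with the n - p divisor used for the mean squared error.

```python
def r_squared(sse, sst):
    """Multiple coefficient of determination: 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R square; p counts all estimated parameters incl. the intercept."""
    return 1 - (n - 1) / (n - p) * (sse / sst)


# SSE and SST as printed in the ANOVA section of Table 2.
sse, sst = 43.8926, 539.1504
r2 = r_squared(sse, sst)
adj_r2 = adjusted_r_squared(sse, sst, n=25, p=4)
print(round(r2, 4), round(adj_r2, 4))
```

The adjusted value is never larger than the unadjusted one, since the factor (n - 1)/(n - p) is at least 1 whenever the model has an intercept.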



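Test statistics:

\[ \frac{\hat{\beta}_j - \beta_j}{s \sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{(j+1)(j+1)}}} \sim t_{n-p} \]

\[ F = \frac{MSR}{MSE} \sim F_{p-1,\, n-p} \]

\[ F = \frac{\hat{\beta}_j^2}{s^2 \left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{(j+1)(j+1)}} \sim F_{1,\, n-p} \]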
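
The t statistic for a single coefficient and the corresponding F statistic on (1, n - p) degrees of freedom are linked by F = t² when the hypothesised value of the coefficient is zero. The sketch below shows this with the nicotine coefficient and standard error printed in Table 2; the helper name is illustrative, and the standard error is taken as given rather than computed from the design matrix.

```python
def coefficient_t_statistic(estimate, std_error, hypothesised=0.0):
    """t statistic for a single regression coefficient (df = n - p)."""
    return (estimate - hypothesised) / std_error


# Nicotine row of Table 2: coefficient and its standard error.
t_nicotine = coefficient_t_statistic(estimate=-2.6317, std_error=3.9006)

# Equivalent F statistic on (1, n - p) degrees of freedom.
f_nicotine = t_nicotine ** 2
print(round(t_nicotine, 4), round(f_nicotine, 4))
```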


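Confidence intervals:

\[ \hat{\beta}_j \pm t_{n-p,\, \alpha/2}\; s \sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{(j+1)(j+1)}} \]

\[ \hat{y} \pm t_{n-p,\, \alpha/2}\; s \sqrt{\mathbf{x}_g'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_g} \]

\[ \hat{y} \pm t_{n-p,\, \alpha/2}\; s \sqrt{1 + \mathbf{x}_g'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_g} \]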
