
Question 1 (14 points)

Several regression models with Y as the response variable were fitted to data on the variables Y, X1, X2 and X3. Their output is provided below, together with the pairwise correlation matrix.
Correlation Matrix
      Y      X1     X2     X3
Y     1.00
X1    0.77   1.00
X2    0.38   0.00   1.00
X3    0.21   0.00   0.00   1.00

Model 1: $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.77
R Square
Adjusted R Square    0.56
Standard Error       3.54
Observations         20.00

ANOVA
             df      SS       MS       F       Significance F
Regression   1.00    320.00   320.00   25.51   0.00
Residual     18.00   225.80   12.54
Total        19.00   545.80

             Coefficients   Standard Error   t Stat   P-value
Intercept    18.90          0.79             23.86    0.00
X1           4.00           0.79             5.05     0.00

Model 2: $Y_i = \beta_0 + \beta_1 X_{2i} + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.38
R Square
Adjusted R Square    0.10
Standard Error       5.09
Observations         20.00

ANOVA
             df      SS       MS      F      Significance F
Regression   1.00    80.00    80.00   3.09   0.10
Residual     18.00   465.80   25.88
Total        19.00   545.80

             Coefficients   Standard Error   t Stat   P-value
Intercept    18.90          1.14             16.62    0.00
X2           2.00           1.14             1.76     0.10

Model 3: $Y_i = \beta_0 + \beta_1 X_{3i} + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.21
R Square
Adjusted R Square    -0.01
Standard Error       5.38
Observations         20.00

ANOVA
             df      SS       MS      F      Significance F
Regression   1.00    24.20    24.20   0.84   0.37
Residual     18.00   521.60   28.98
Total        19.00   545.80

             Coefficients   Standard Error   t Stat   P-value
Intercept    18.90          1.20             15.70    0.00
X3           1.10           1.20             0.91     0.37

a) If variable X2 is added to Model 1, what will happen to the coefficient of X1? Will it increase, decrease or remain the same? Explain your answer.
[3 points]
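A minimal sketch of why the X1 slope should not move, using hypothetical toy data (not the exam data) in which the two predictors are centred and exactly uncorrelated, as X1 and X2 are in the correlation matrix above. The helper `ols` is introduced here only for illustration; it wraps `numpy.linalg.lstsq`.

```python
import numpy as np

def ols(y, *cols):
    """Return least-squares coefficients [intercept, slope_1, ...] (illustrative helper)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical toy data: both predictors have mean 0 and x1 . x2 == 0,
# so their sample correlation is exactly 0.
x1 = np.array([1.0, -1.0, 1.0, -1.0, 2.0, -2.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])
y = np.array([5.0, 1.0, 4.0, 0.5, 7.0, -1.0])

b1_alone = ols(y, x1)[1]          # slope of x1 in the simple regression
b1_with_x2 = ols(y, x1, x2)[1]    # slope of x1 after x2 is added
print(b1_alone, b1_with_x2)       # equal up to floating-point rounding
```

If the two predictors were correlated, the second fit would reallocate shared variation and the slope would generally change.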

b) If variable X2 is added to Model 1, provide the closest possible estimate of the coefficient of determination for the resulting model. Explain your estimate.
[3 points]
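Because X1 and X2 are uncorrelated (r = 0.00), their regression sums of squares should add. A short sketch of that arithmetic using the SS values from the Model 1 and Model 2 ANOVA tables; exact additivity assumes the sample correlation is exactly zero rather than merely rounded to 0.00.

```python
SST = 545.80       # total sum of squares (the same in every model)
SSR_X1 = 320.00    # Model 1 regression SS
SSR_X2 = 80.00     # Model 2 regression SS

r2_x1 = SSR_X1 / SST        # R^2 of Model 1
r2_x2 = SSR_X2 / SST        # R^2 of Model 2
r2_joint = r2_x1 + r2_x2    # R^2 of Y ~ X1 + X2 when X1 and X2 are uncorrelated

# Cross-check with the correlation matrix: r(Y,X1)^2 + r(Y,X2)^2
r2_from_corr = 0.77 ** 2 + 0.38 ** 2
print(round(r2_joint, 3), round(r2_from_corr, 3))
```

The small gap between the two estimates comes from the correlations being rounded to two decimals.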

c) If a regression model is fitted with all three explanatory variables X1, X2 and X3, estimate the standard error of the estimate for the resulting model. Provide the closest possible estimate and explain your answer.
[4 points]
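A sketch of the estimate, assuming (as the correlation matrix suggests) that X1, X2 and X3 are mutually uncorrelated, so that each predictor's regression SS from its simple model carries over unchanged to the full model.

```python
import math

SST = 545.80
SSR = {"X1": 320.00, "X2": 80.00, "X3": 24.20}   # from the three simple-regression ANOVA tables
n, k = 20, 3

# With mutually uncorrelated predictors the regression SS add up,
# so the full model's residual SS is what is left over.
sse_full = SST - sum(SSR.values())   # 121.60
mse_full = sse_full / (n - k - 1)    # residual df = 16
se_full = math.sqrt(mse_full)
print(round(se_full, 2))
```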

d) If forward stepwise regression is used to select variables, with a maximum allowable p-value of 0.05 for keeping a variable, what is the most likely resulting model? Explain your answer.
[4 points]
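A sketch of the forward-selection arithmetic. It assumes the regression SS are additive, which holds here because the predictors are uncorrelated. At each step it computes the partial F of the best remaining candidate and compares it with the 5% critical value of F(1, df). This is equivalent to checking p < 0.05. The function `forward_select` and the critical values in `F_CRIT` (4.41, 4.45 and 4.49, taken from standard F tables) are illustrative assumptions, not part of the exam output.

```python
SST = 545.80
N = 20
# Regression SS of each predictor on its own (Models 1-3). Because X1, X2 and X3 are
# mutually uncorrelated, these amounts add up when predictors are combined.
SSR = {"X1": 320.00, "X2": 80.00, "X3": 24.20}
# Upper 5% points of F(1, df) for the residual df that arise below (standard table values).
F_CRIT = {18: 4.41, 17: 4.45, 16: 4.49}

def forward_select(sst, ssr, n, f_crit):
    """Add the candidate with the largest partial F for as long as it is significant."""
    chosen, sse = [], sst
    candidates = dict(ssr)
    while candidates:
        # Every candidate has the same df, so the largest SS gain gives the largest partial F.
        name, gain = max(candidates.items(), key=lambda item: item[1])
        df_resid = n - len(chosen) - 2        # residual df after adding this candidate
        sse_new = sse - gain
        f_partial = gain / (sse_new / df_resid)
        if f_partial <= f_crit[df_resid]:     # equivalent to p >= 0.05
            break
        chosen.append(name)
        sse = sse_new
        del candidates[name]
    return chosen

print(forward_select(SST, SSR, N, F_CRIT))
```

Under these assumptions, X2 is not significant on its own (p = 0.10 in Model 2). It can still enter once X1 has removed part of the residual variation, because its partial F rises to about 9.3 on (1, 17) df. X3's partial F of about 3.2 stays below the cutoff.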

Question 2 (25 points)


The following table contains sample data on millions of gallons of gasoline consumed (Gallons), the average retail price in cents (RetailPrice), the Consumer Price Index (CPI), the CPI for public transportation (CPITrans), the number of registered cars in thousands (RegCars), the average mileage of cars in miles per gallon (MPG) and disposable income in dollars (DispInc). The data relate to a single US state. What factors can be used to explain gasoline consumption? Some regression outputs (with a few missing values) are provided below, along with the pairwise correlation matrix.
Year  Gallons  RetailPrice  CPI    CPITrans  RegCars  MPG    DispInc
1962  43771    30.64        90.6   87.4      66638    14.37  6271
1963  45246    30.42        91.7   88.5      69842    14.26  6378
1964  47567    30.35        92.9   90.1      72969    14.25  6727
1965  50275    31.15        94.5   91.9      76634    14.15  7027
1966  53312    32.08        97.2   95.2      80106    14.1   7280
1967  55110    33.16        100    100       82367    14.05  7513
1968  58524    33.71        104.2  104.6     85793    13.91  7728
1969  62448    34.84        109.8  112.7     89156    13.75  7891
1970  65784    35.69        116.3  128.5     92095    13.7   8134
1971  69514    36.43        121.3  137.7     96144    13.73  8322
1972  73463    36.13        125.3  143.4     100658   13.67  8562
1973  78011    38.82        133.1  144.8     106119   13.29  9042
1974  74217    52.41        147.7  148       109823   13.65  8867
1975  76457    57.22        161.2  158.6     116679   13.74  8944
1976  78447    59.47        170.5  174.2     115170   13.93  9175
1977  80677    63.07        181.5  182.4     118711   14.15  9381
1978  83233    65.71        195.4  187.8     121717   14.26  9735
1979  80233    87.79        217.4  200.3     125750   14.49  9829
1980  73375    119.1        246.8  251.6     127448   15.32  9722
1981  71718    131.1        272.4  312       129123   15.68  9769
1982  72848    122.2        289.1  346       129500   16.36  9725
1983  73156    115.7        298.4  362.6     131723   16.81  9930
1984  71180    112.9        311.1  385.2     133751   17.8   10419
1985  69450    111.5        322.2  402.8     137308   18.28  10622
1986  71404    85.7         328.4  426.4     140693   18.35  10947
1987  70984    89.7         340.4  441.4     142209   19.26  10976

Correlation Matrix
             Gallons  RetailPrice  CPI    CPITrans  RegCars  MPG    DispInc
Gallons      1.000
RetailPrice  0.525    1.000
CPI          0.543    0.917        1.000
CPITrans     0.469    0.867        0.987  1.000
RegCars      0.807    0.862        0.930  0.887     1.000
MPG          0.162    0.725        0.894  0.935     0.694    1.000
DispInc      0.813    0.808        0.912  0.880     0.990    0.695  1.000

Model 1: $\text{Gallons}_i = \beta_0 + \beta_1\,\text{RetailPrice}_i + \varepsilon_i$

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.53
R Square             0.28
Adjusted R Square    0.25
Standard Error       10055.36
Observations         26.00

ANOVA
              df    SS               MS             F      Significance F
Regression                                          9.15   0.01
Residual                             101110215.62
Total               3351556320.62

              Coefficients   Standard Error   t Stat   P-value
Intercept     56236.32       4162.48          13.51    0.00
RetailPrice   171.89         56.83            3.02     0.01

Model 2: $\text{Gallons}_i = \beta_0 + \beta_1\,\text{RetailPrice}_i + \beta_2\,\text{RegCars}_i + \varepsilon_i$

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.87
R Square             0.76
Adjusted R Square    0.74
Standard Error       5876.89
Observations         26.00

ANOVA
              df    SS    MS              F       Significance F
Regression                1278593598.83   37.02   0.00
Residual
Total

              Coefficients   Standard Error   t Stat   P-value
Intercept     10007.44       7151.09          1.40     0.18
RetailPrice   -215.88        65.46            -3.30    0.00
RegCars       0.66           0.10             6.87     0.00

Model 3: $\text{Gallons}_i = \beta_0 + \beta_1\,\text{RetailPrice}_i + \beta_2\,\text{RegCars}_i + \beta_3\,\text{MPG}_i + \beta_4\,\text{DispInc}_i + \varepsilon_i$

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.99
R Square             0.98
Adjusted R Square    0.98
Standard Error       1640.38
Observations         26.00

ANOVA
              df    SS    MS             F        Significance F
Regression                823762178.78   306.14   0.00
Residual
Total

              Coefficients   Standard Error   t Stat   P-value
Intercept     54733.17       5188.74          10.55    0.00
RetailPrice   -65.34         26.66            -2.45    0.02
RegCars       0.43           0.15             2.92     0.01
MPG           -4883.99       306.39           -15.94   0.00
DispInc       4.91           2.27             2.17     0.04

a) In Model 1, how much of the total variation in the response variable is explained by RetailPrice? Show your calculation.
[3 points]
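Only the Residual MS and Total SS survive in the Model 1 ANOVA table, so the regression SS has to be rebuilt. The steps are $SSE = MS_{res}(n-2)$, then $SSR = SST - SSE$ and $R^2 = SSR/SST$. A short sketch of that arithmetic, which also recomputes F as a check against the reported 9.15:

```python
SST = 3351556320.62     # Total SS, Model 1
MSE = 101110215.62      # Residual MS, Model 1
n, k = 26, 1

SSE = MSE * (n - k - 1)  # residual df = 24
SSR = SST - SSE
r2 = SSR / SST
F = (SSR / k) / MSE      # should reproduce the reported F of 9.15
print(round(SSR), round(r2, 3), round(F, 2))
```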
b) If Model 1 were used to predict gasoline consumption for 1988, what would be the 95% prediction interval for gasoline consumption if the retail price for that year is projected to be $0.66 per gallon? Assume that the mean retail price in the data set is $0.645 and the sample standard deviation is $0.353. Show your work. [5 points]
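A sketch of the prediction-interval arithmetic. RetailPrice is recorded in cents in the data table, so this sketch converts the dollar figures in the question to cents. That conversion is an assumption, though it matches the scale of the Model 1 coefficient. The critical value 2.064 for t with 24 df is a standard table value.

```python
import math

b0, b1 = 56236.32, 171.89   # Model 1 coefficients (RetailPrice in cents)
s = 10055.36                # standard error of estimate
n = 26
x_bar, s_x = 64.5, 35.3     # $0.645 and $0.353 expressed in cents
x0 = 66.0                   # projected price of $0.66, in cents
t_crit = 2.064              # t(0.025, df = 24), from a t table

y_hat = b0 + b1 * x0
# (n - 1) * s_x**2 equals the sum of squared deviations of RetailPrice
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s_x ** 2))
lower, upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(round(y_hat, 2), round(lower), round(upper))
```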
c) Given that RetailPrice and RegCars are already in the model, does adding the explanatory variables MPG and DispInc contribute significant additional information? Perform a partial F test to answer this question, using a significance level of α = 0.05. [4 points]
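The partial F test compares Model 2 (reduced) with Model 3 (full). Neither table reports residual SS directly. The sketch therefore rebuilds them from the Total SS in Model 1, which is the same in every model because the response is the same, and from each model's MS Regression: $SSE = SST - df_{reg} \cdot MS_{reg}$. The critical value 3.47 for F(2, 21) at α = 0.05 is a standard table value.

```python
SST = 3351556320.62
n = 26

# Reduced model (Model 2): 2 predictors; only MS Regression is reported.
sse_reduced = SST - 1278593598.83 * 2

# Full model (Model 3): 4 predictors.
sse_full = SST - 823762178.78 * 4
df_full = n - 4 - 1                  # 21
mse_full = sse_full / df_full        # its square root is about 1640.38, the reported standard error

q = 2                                # variables added: MPG and DispInc
F_partial = ((sse_reduced - sse_full) / q) / mse_full
F_crit = 3.47                        # F(0.05; 2, 21), from an F table
print(round(F_partial, 2), F_partial > F_crit)
```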
d) From a macroeconomic standpoint, disposable income is worth considering only if each dollar increase in average disposable income raises gasoline consumption by at least 2 million gallons. Given the presence of retail price, number of registered cars and average mileage of cars, is there sufficient evidence that gasoline consumption increases by at least 2 million gallons for each dollar increase in average disposable income? Use a significance level of α = 0.05 to answer this question.
[5 points]
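The claim to be supported is $\beta_4 > 2$. This gives a one-sided test of $H_0: \beta_4 \le 2$ against $H_1: \beta_4 > 2$, using the DispInc row of Model 3. A sketch of the test statistic; the cutoff 1.721, the upper 5% point of t with 21 df, is a standard table value.

```python
b_dispinc, se_dispinc = 4.91, 2.27   # Model 3 coefficient and standard error for DispInc
beta_null = 2.0                      # hypothesised minimum effect (million gallons per dollar)
t_stat = (b_dispinc - beta_null) / se_dispinc
t_crit = 1.721                       # upper 5% point of t with 26 - 4 - 1 = 21 df
print(round(t_stat, 3), t_stat > t_crit)
```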
e) Would it be worthwhile to add the variable CPITrans to Model 3? Why or why not? [3 points]
f) According to Model 1, if the price of gasoline increases, consumption also increases. According to Model 2, however, the effect is the opposite. Which one is true? How do you explain this dichotomy? Explain your answer.
[5 points]
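One standard explanation for a sign flip like this is omitted-variable bias. In the matrix above, RetailPrice and RegCars are strongly correlated (0.862), and RegCars has a positive effect. For OLS fitted to the same data, the simple slope equals the multiple-regression slope plus the omitted variable's slope multiplied by the slope from regressing the omitted variable on the included one. The sketch checks this identity on hypothetical, noise-free toy data, not the gasoline series. The helper `ols` is illustrative.

```python
import numpy as np

def ols(y, *cols):
    """Return least-squares coefficients [intercept, slope_1, ...] (illustrative helper)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical toy data: x2 rises with x1, and x1's own effect on y is negative.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # stands in for RetailPrice
x2 = np.array([2.0, 5.0, 5.0, 9.0, 10.0, 13.0])   # stands in for RegCars
y = -1.0 * x1 + 2.0 * x2

b_short = ols(y, x1)[1]              # slope when x2 is omitted
_, b1_long, b2_long = ols(y, x1, x2)
delta = ols(x2, x1)[1]               # slope of the omitted x2 on x1

# Omitted-variable identity: short slope = long slope + (effect of x2) * (x2-on-x1 slope)
assert np.isclose(b_short, b1_long + b2_long * delta)
print(round(b_short, 3), round(b1_long, 3))   # short slope positive, long slope negative
```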

Question 3 (25 points)


Below is data on a sample of antique items sold. Each data item lists the price for which it
was sold (in $), the age of the antique piece (in years) and the number of bidders. What are
the factors that can be used to explain the price of an antique item?
auct_pr   age   num_bid
$946      113
$1,336    126   10
$744      115
$1,979    182   11
$1,522    150
$1,235    127   13
$1,483    159
$1,152    117   13
$1,545    175
$1,262    168
$845      127
$1,055    108   14
$1,253    132   10
$1,297    137
$1,147    137
$1,080    115   12
$1,550    182
$1,047    156
$1,792    179
$729      108
$854      143
$1,593    187
$1,175    111   15
$1,713    137   15
$1,356    194
$1,822    156   12
$1,884    162   11
$1,024    117   11
$2,131    170   14
$785      111
$1,092    153
$2,041    184   10

Provided below are the pairwise correlation matrix and some regression outputs.

Exhibit I: Correlation Matrix

Correlations
                                AUCT_PR    AGE        NUM_BID
AUCT_PR   Pearson Correlation   1          .730(**)   .395(*)
          Sig. (2-tailed)                  .000       .025
          N                     32         32         32
AGE       Pearson Correlation   .730(**)   1          -.254
          Sig. (2-tailed)       .000                  .161
          N                     32         32         32
NUM_BID   Pearson Correlation   .395(*)    -.254      1
          Sig. (2-tailed)       .025       .161
          N                     32         32         32
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Exhibit II: Model I: $\text{auct\_pr}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{num\_bid}_i + \varepsilon_i$

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .945(a)   .893       .885                133.13650
a Predictors: (Constant), NUM_BID, AGE
b Dependent Variable: AUCT_PR

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   4277159.703      2    2138579.852   120.651   .000(a)
    Residual     514034.515       29   17725.328
    Total        4791194.219      31
a Predictors: (Constant), NUM_BID, AGE
b Dependent Variable: AUCT_PR

Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta                        t        Sig.
1   (Constant)   -1336.722   173.356                                       -7.711   .000
    AGE          12.736      .902              .888                        14.114   .000
    NUM_BID      85.815      8.706             .620                        9.857    .000
a Dependent Variable: AUCT_PR

Scatterplot (Dependent Variable: AUCT_PR): Regression Standardized Residual vs. Regression Standardized Predicted Value


Exhibit III: Model II: $\text{auct\_pr}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{num\_bid}_i + \beta_3\,\text{age\_bid}_i + \varepsilon_i$, where age_bid = age × num_bid

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
2       .977(a)   .954       .949                88.36738
a Predictors: (Constant), AGE_BID, AGE, NUM_BID
b Dependent Variable: AUCT_PR

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   4572547.987      3    1524182.662   195.188   .000(a)
    Residual     218646.232       28   7808.794
    Total        4791194.219      31
a Predictors: (Constant), AGE_BID, AGE, NUM_BID
b Dependent Variable: AUCT_PR

Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1   (Constant)   322.754    293.325                                        1.100    .281
    AGE          .873       2.020              .061                        .432     .669
    NUM_BID      -93.410    29.708             -.675                       -3.144   .004
    AGE_BID      1.298      .211               1.370                       6.150    .000
a Dependent Variable: AUCT_PR

Scatterplot (Dependent Variable: AUCT_PR): Regression Standardized Residual vs. Regression Standardized Predicted Value


a. Comparing Model I with Model II, which one, in your opinion, is preferable? Give the pros and cons of each in stating your answer.
[6 points]
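One quantitative input to the comparison is adjusted R², which penalises the extra term. Another is the partial F for age_bid, which for a single added term should match the square of its t statistic (6.150) in Exhibit III. A short sketch using the sums of squares from the two ANOVA tables. It does not capture the other pros and cons, such as interpretability or the residual pattern.

```python
SST = 4791194.219    # Total sum of squares (the same in both models)
n = 32

def adjusted_r2(sse, k):
    """Adjusted R^2 for a model with k predictors and residual sum of squares sse."""
    return 1 - (sse / (n - k - 1)) / (SST / (n - 1))

sse_I, sse_II = 514034.515, 218646.232
adj_I = adjusted_r2(sse_I, k=2)     # Model I: age, num_bid
adj_II = adjusted_r2(sse_II, k=3)   # Model II: adds age_bid

# Partial F for the single added term age_bid; for one term it equals t^2.
f_age_bid = (sse_I - sse_II) / (sse_II / (n - 3 - 1))
print(round(adj_I, 3), round(adj_II, 3), round(f_age_bid, 2), round(6.150 ** 2, 2))
```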

b. In Model II, which of the three variables, age, num_bid and age_bid, has the greatest impact on auct_pr? State clearly why.
[2 points]

c. If an antique is 120 years old, what would be the precise impact of the number of bidders on the price, as described by Model II? If the antique were only 30 years old, would this model apply? Why or why not? [4 points]
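In Model II the effect of one more bidder depends on age: $\partial\,\text{auct\_pr}/\partial\,\text{num\_bid} = \beta_2 + \beta_3\,\text{age}$. The sketch evaluates this with the Model II estimates. The function name `bidder_effect` is hypothetical. The function refuses ages outside the range observed in the data table (108 to 194 years), because evaluating the model at 30 years would be an extrapolation.

```python
B_NUM_BID = -93.410          # Model II coefficient of num_bid
B_AGE_BID = 1.298            # Model II coefficient of age_bid = age * num_bid
AGE_RANGE = (108, 194)       # youngest and oldest antiques in the sample

def bidder_effect(age):
    """Change in predicted auct_pr ($) per additional bidder at a given age (Model II)."""
    lo, hi = AGE_RANGE
    if not lo <= age <= hi:
        raise ValueError(f"age {age} is outside the sample range {AGE_RANGE}")
    return B_NUM_BID + B_AGE_BID * age

print(round(bidder_effect(120), 2))   # dollars per extra bidder for a 120-year-old piece
```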

d. If you were to develop a simple linear regression model with auct_pr as the dependent variable and age as the independent variable, would the model be significant under the F test at a significance level of α = 0.01? Why or why not?
[3 points]
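With a single predictor, the F statistic can be recovered from the correlation alone: $F = r^2 (n-2)/(1-r^2)$. A sketch using r = .730 from Exhibit I. The critical value 7.56 for F(1, 30) at α = 0.01 is a standard table value.

```python
r = 0.730        # Pearson correlation of AUCT_PR with AGE (Exhibit I)
n = 32
r2 = r ** 2
F = r2 * (n - 2) / (1 - r2)   # F statistic of the simple regression auct_pr ~ age
F_crit = 7.56                 # F(0.01; 1, 30), from an F table
print(round(F, 2), F > F_crit)
```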

Question 4 (23 points)


The pairwise correlation matrix and the regression outputs provided below pertain to data collected on a sample of 80 countries. The variables on which data were collected are:
GNPCapita: GNP per capita
PopGrowth: average annual change in population, 1980-1990, $(P_{t+1} - P_t)/P_t$
Calorie: daily per capita calorie content of food used for domestic consumption
LifeExp: average life expectancy of a newborn given current mortality conditions
Fertility: average births per woman given current fertility rates
The sample mean and sample standard deviation of each variable are provided below. The pairwise correlation matrix and some regression outputs are provided as exhibits. Note that some values have been deleted by design.
GNPCapita:  $\bar{x}$ = 4119.86,   s = 6908.5
PopGrowth:  $\bar{x}$ = 0.0197,    s = 0.0119
Calorie:    $\bar{x}$ = 2654.075,  s = 534.19
LifeExp:    $\bar{x}$ = 63.45,     s = 10.807
Fertility:  $\bar{x}$ = 4.21,      s = 1.964

Exhibit I: Correlation Matrix
           GNPCapita  PopGrowth  Calorie  LifeExp  Fertility
GNPCapita  1.000
PopGrowth  -0.562     1.000
Calorie    0.668      -0.667     1.000
LifeExp    0.574      -0.662     0.724    1.000
Fertility  -0.600     0.829      -0.752   -0.899   1.000

Exhibit II
Model I: $\text{LifeExp}_i = \beta_0 + \beta_1\,\text{Fertility}_i + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error       4.762
Observations         80.000

ANOVA
             df    SS         MS    F    Significance F
Regression
Residual
Total              9227.800

             Coefficients   Standard Error   t Stat    P-value
Intercept    84.274         1.266            66.592    0.000
Fertility    -4.946         0.273            -18.137   0.000

Exhibit III
Model II: $\text{LifeExp}_i = \beta_0 + \beta_1\,\text{Fertility}_i + \beta_2\,\text{PopGrowth}_i + \beta_3\,\text{Calorie}_i + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.916
R Square             0.839
Adjusted R Square    0.832
Standard Error
Observations         80.000

ANOVA
             df    SS    MS         F         Significance F
Regression               2579.686   131.692   0.000
Residual
Total

             Coefficients   Standard Error   t Stat    P-value
Intercept    74.829         5.129            14.589    0.000
PopGrowth    259.221        75.398           3.438
Calorie      0.003          0.001            1.962     0.053
Fertility    -5.677         0.516            -10.995   0.000

Exhibit IV

Model III: $\text{GNPCapita}_i = \beta_0 + \beta_1\,\text{Fertility}_i + \beta_2\,\text{PopGrowth}_i + \beta_3\,\text{Calorie}_i + \beta_4\,\text{LifeExp}_i + \varepsilon_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.691
R Square             0.478
Adjusted R Square    0.450
Standard Error       5125.104
Observations         80.000

ANOVA
             df       SS               MS              F        Significance F
Regression   4.000    1800470510.893   450117627.723   17.136   0.000
Residual     75.000   1970001976.594   26266693.021
Total        79.000   3770472487.488

             Coefficients   Standard Error   t Stat   P-value
Intercept    -16075.810     11578.790        -1.388   0.169
PopGrowth    -105278.350    93853.829        -1.122   0.266
Calorie      6.011          1.690            3.557    0.001
Fertility    105.470        962.337          0.110    0.913
LifeExp      92.550         132.829          0.697    0.488

a. In Model I, how much of the total variation in life expectancy is explained by the variation in fertility? Show your calculations.
[5 points]
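The Model I output keeps only the standard error of estimate and the Total SS, so the residual SS follows from $SSE = s^2(n-2)$. A sketch of the calculation, cross-checked against the squared correlation -0.899 from Exhibit I (with one predictor, $R^2 = r^2$):

```python
s = 4.762          # standard error of estimate, Model I
n = 80
SST = 9227.800     # total SS, Model I

SSE = s ** 2 * (n - 2)
SSR = SST - SSE
r2 = SSR / SST
r2_from_corr = (-0.899) ** 2   # single predictor: R^2 equals the squared correlation
print(round(r2, 3), round(r2_from_corr, 3))
```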

b. Determine the missing F value and its corresponding Significance F for Model I. Show your calculations.
[3 points]
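For a single predictor, F = MSR/MSE equals the square of the slope's t statistic. Significance F therefore equals the P-value of that t test (0.000 for Fertility). A sketch that rebuilds F from the sums of squares and compares it with $t^2$:

```python
s, n, SST = 4.762, 80, 9227.800
SSE = s ** 2 * (n - 2)
MSE = SSE / (n - 2)       # equals s**2
MSR = (SST - SSE) / 1     # one regression df
F = MSR / MSE
t_fertility = -18.137
print(round(F, 1), round(t_fertility ** 2, 1))
```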

c. Consider Costa Rica, whose average fertility is 3.0. Using Model I, can we conclude that Costa Rica's average life expectancy is above 50 at a significance level of α = 0.05? Reason out your answer carefully, showing all calculations.
[6 points]
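One way to read the question is as a one-sided test of $H_0$: life expectancy ≤ 50 against $H_1$: life expectancy > 50 at Fertility = 3.0. Treating Costa Rica as a single new country uses the prediction standard error. Reading the question as being about the mean response for countries with fertility 3.0 drops the leading 1 under the square root. The sketch computes both versions. The cutoff 1.665 for t with 78 df is an approximate table value.

```python
import math

b0, b1 = 84.274, -4.946    # Model I coefficients
s, n = 4.762, 80           # standard error of estimate, sample size
x_bar, s_x = 4.21, 1.964   # Fertility mean and standard deviation
x0 = 3.0                   # Costa Rica's fertility
t_crit = 1.665             # upper 5% point of t with 78 df (approximate table value)

y_hat = b0 + b1 * x0
leverage = 1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s_x ** 2)
se_pred = s * math.sqrt(1 + leverage)   # single new country
se_mean = s * math.sqrt(leverage)       # mean response at Fertility = 3.0

t_pred = (y_hat - 50) / se_pred
t_mean = (y_hat - 50) / se_mean
print(round(y_hat, 3), round(t_pred, 2), round(t_mean, 1))
```

Under either reading, the statistic exceeds 1.665.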

d. Is Model II's predictive power significantly better than that of Model I? Perform an appropriate statistical procedure to answer this question. Show your steps.
[5 points]
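A partial F test of Model I (reduced) against Model II (full) fits this question. Model I's residual SS comes from its standard error. Model II's residual MS comes from its reported MS Regression divided by F. The critical value 3.12 for F(2, 76) at α = 0.05 is an approximate table value.

```python
s1, n = 4.762, 80
sse_I = s1 ** 2 * (n - 2)         # Model I residual SS, df = 78

msr_II, F_II = 2579.686, 131.692  # Model II: MS Regression and F
mse_II = msr_II / F_II            # Model II residual MS, df = 76
sse_II = mse_II * (n - 3 - 1)

q = 2                             # predictors added: PopGrowth and Calorie
F_partial = ((sse_I - sse_II) / q) / mse_II
F_crit = 3.12                     # approx. F(0.05; 2, 76), from an F table
print(round(F_partial, 2), F_partial > F_crit)
```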


e. Using the information in the outputs above, determine the VIF (variance inflation factor) associated with the variable LifeExp in Model III. Show your computations.
[5 points]
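The formula is $VIF_j = 1/(1 - R_j^2)$. Here $R_j^2$ comes from regressing LifeExp on the other predictors in Model III (Fertility, PopGrowth and Calorie). That auxiliary regression is exactly Model II, so its R Square of 0.839 can be reused. A minimal sketch:

```python
r2_aux = 0.839          # R^2 of LifeExp ~ Fertility + PopGrowth + Calorie (Model II)
tolerance = 1 - r2_aux
vif_lifeexp = 1 / tolerance
print(round(vif_lifeexp, 2))
```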

Question 5 (8 points)
Highly publicized salaries of corporate chief executive officers (CEOs) in the United States have generated sustained interest in understanding the factors related to CEO compensation in general. Data on the annual compensation of the CEOs of 167 financial companies were culled from a larger dataset that appeared in Forbes magazine. Each year, Forbes publishes the compensation packages of the top 800 CEOs in the US, including those of financial companies.
The variable of interest is total compensation, defined as the sum of salary plus any
bonuses, including stock options. How does one explain the wide variation in total
compensation of CEOs in the same industry? Some of the variables considered that might
help explain the variation in salary are:
MBA?: is equal to 1 if the CEO has an MBA, 0 otherwise
MasterPhD?: is equal to 1 if the CEO has a masters or PhD degree, 0 otherwise.
YearsFirm(Yrs): Total number of years the CEO worked for the company for which he/she is
currently CEO
YearsCEO(Yrs): Total number of years the CEO has served the company as a CEO
StockOwned(%): % of company stock owned by the CEO
Sales(millions of $): Annual sales of the company in millions of dollars
Profits(millions of $): Annual profits of the company in millions of dollars
A regression model was built in SPSS using the Forward method, which was allowed to choose from any of the variables listed above. The output obtained at the first step is as follows:
Table 1: Model Summary(b)
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1                      .115                4406976.255                  2.031
a. Predictors: (Constant), Profits(in 000000 of $)
b. Dependent Variable: TotalComp($)

Table 2: Coefficients(a)
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B             Std. Error      Beta     t       Sig.   Correlation   Partial Correlation
1   (Constant)                  1182515.538   404938.707               2.920   .004
    Profits(in 000000 of $)     4090.826      859.264         .347     4.761   .000   .347          .347
a. Dependent Variable: TotalComp($)

Table 3: Excluded Variables(b)
                                                                       Collinearity Statistics
Model                        Beta In    Sig.   Partial Correlation   Tolerance
1   MBA?                     -.016(a)          -.018                 .999
    MasterPhD?               -.065             -.069                 .992
    YearsFirm(Yrs)           -.087(a)          -.092                 .989
    YearsCEO(Yrs)            .024              .026                  .982
    StockOwned(%)            .021(a)           .022                  .981
    Sales(millions of $)     -.157             -.070                 .174
a. Predictors in the Model: (Constant), Profits(in 000000 of $)
b. Dependent Variable: TotalComp($)

Using the information given above, answer the following questions:


a) The correlation and partial correlation given for Profits in Table 2 are the same. Explain why. (1 point)
b) What is the proportion of explained variation relative to the total variation for the model above? (2 points)
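With a single predictor, R² can be recovered from Table 1 and Table 2 in three consistent ways: from Adjusted R Square, from the t statistic, and from the standardized Beta (R = |Beta| when there is one predictor). A sketch of all three:

```python
n, k = 167, 1
adj_r2 = 0.115         # Table 1
t_profits = 4.761      # Table 2
beta_profits = 0.347   # Table 2, standardized coefficient

r2_from_adj = 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)
r2_from_t = t_profits ** 2 / (t_profits ** 2 + (n - k - 1))
r2_from_beta = beta_profits ** 2     # single predictor: R = |Beta|
print(round(r2_from_adj, 3), round(r2_from_t, 3), round(r2_from_beta, 3))
```

The small differences come from rounding in the printed output.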
c) Which variable should be chosen as the next candidate to enter the regression equation? Why? (1 point)
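At each step, the forward method considers the excluded variable with the largest absolute partial correlation. This is equivalent to the largest |t| and the smallest Sig. That variable enters only if its Sig. meets the entry criterion. The Sig. values are blank in Table 3, so the sketch below only ranks the candidates.

```python
partial = {
    "MBA?": -0.018,
    "MasterPhD?": -0.069,
    "YearsFirm(Yrs)": -0.092,
    "YearsCEO(Yrs)": 0.026,
    "StockOwned(%)": 0.022,
    "Sales(millions of $)": -0.070,
}
next_candidate = max(partial, key=lambda name: abs(partial[name]))
print(next_candidate)
```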
d) What does the collinearity statistic for Sales(millions of $) given in Table 3 mean? State clearly the steps needed to calculate it.
(2 points)
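Tolerance is $1 - R^2$ from regressing the candidate (Sales) on the predictors already in the model. At this step, the only predictor in the model is Profits. A sketch of what 0.174 implies. The sign of the Sales-Profits correlation cannot be recovered from the tolerance, so only its magnitude is computed.

```python
import math

tolerance_sales = 0.174                            # Table 3
r2_sales_on_profits = 1 - tolerance_sales          # R^2 of Sales ~ Profits
r_sales_profits = math.sqrt(r2_sales_on_profits)   # magnitude only
vif_sales = 1 / tolerance_sales                    # the matching variance inflation factor
print(round(r_sales_profits, 3), round(vif_sales, 2))
```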
e) Are successive observations in the data used for the regression independent or not? Explain.
(1 point)
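The Durbin-Watson statistic is the sum of squared successive residual differences divided by the residual sum of squares. Values near 2 indicate little first-order autocorrelation, and roughly $\rho \approx 1 - DW/2$. The statistic is only meaningful when the observations have a natural order, which a cross-section of companies may not have. A minimal sketch; the function name `durbin_watson` is illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Rough first-order autocorrelation implied by the reported value in Table 1
dw_reported = 2.031
rho_approx = 1 - dw_reported / 2
print(round(rho_approx, 4))
```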
f) What is the impact on the compensation of CEOs who do not have an MBA? (1 point)
