
Statistics Texas MBA Practice Problems

1
1.1
Suppose we are modeling house price as depending on house size, the number of bedrooms in the house, and the number of bathrooms in the house. Price is measured in thousands of dollars and size is measured in thousands of square feet. Suppose our model is: $P = 20 + 50\,\text{size} + 10\,\text{nbed} + 15\,\text{nbath} + \varepsilon$, $\varepsilon \sim N(0, 10^2)$.

(a)

Suppose you know that a house has size = 1.6, nbed = 3, and nbath = 2. What is the conditional distribution of its price given the values for size, nbed, and nbath? (Hint: it is normal with mean = ?? and variance = ??)

20 + 50*1.6 + 10*3 + 15*2 = 160, so $P = 160 + \varepsilon$ and therefore $P \sim N(160, 10^2)$.

(b)

Given the values for the explanatory variables from part (a), give the 95% predictive interval for the price of the house.

The interval is the mean plus or minus two standard deviations: 160 +/- 2(10) = 160 +/- 20.

(c)

Suppose you know that a house has size = 2.6, nbed = 4, and nbath = 3. Give the 95% predictive interval for the price of the house.

20 + 50*2.6 + 10*4 + 15*3 = 235, so $P = 235 + \varepsilon$ and the interval is 235 +/- 20.

(d)

In our model the slope for the variable nbath is 15. What are the units of this number?

Thousands of dollars per bathroom.

(e)

What are the units of the intercept 20?

Thousands of dollars (same as y).

(f)

What are the units of the error standard deviation 10?

Thousands of dollars (same as y).
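The calculations in parts (a) through (c) can be sketched in a few lines of Python. This is only an illustration: the coefficients and the error standard deviation come from the model stated in the problem, while the function name `predictive_interval` and the "plus or minus two standard deviations" rule for 95% coverage are assumptions made for the sketch.

```python
# Sketch: 95% predictive interval when the model parameters are known.
# Coefficients and sigma are taken from the model in problem 1.1;
# the function name is hypothetical.

def predictive_interval(size, nbed, nbath, sigma=10.0):
    """Return (mean, lower, upper) for price, in thousands of dollars."""
    mean = 20 + 50 * size + 10 * nbed + 15 * nbath
    half_width = 2 * sigma  # roughly 95% of a normal lies within 2 sd
    return mean, mean - half_width, mean + half_width

# Part (a)/(b): size = 1.6, nbed = 3, nbath = 2
print(predictive_interval(1.6, 3, 2))

# Part (c): size = 2.6, nbed = 4, nbath = 3
print(predictive_interval(2.6, 4, 3))
```

Because the error variance does not depend on the explanatory variables, the half-width of the interval (20) is the same for both houses; only the center moves.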

2 Estimates and Plug-in Prediction
2.1
(a)

Get the MidCity.txt data from the webpage. In whatever software you want, reproduce the multiple regression results for the regression of price on size, number of bathrooms, and number of bedrooms given in the notes. To get the same numbers you will have to transform the price and size variables by dividing by 1000, so that we are working with thousands of square feet and thousands of dollars.

Remember, if you do this in Excel with the standard add-in (/tools/Data Analysis/Regression), all the columns for the three x variables (size, nbed, and nbath) must be beside each other. I found it easiest to just create 4 columns (size/1000, nbed, nbath, price/1000) all beside each other. So, the first three columns are my x's and the fourth column is my y. Then, when Excel asks you for the x's, you give it the range from the first value of the first x to the last value of the last x. Remember to split the screen so you can see all your data! Another useful thing to remember for running the regression in Excel is that if you have variable labels in the top row (you should), then you can use these labels in your regression output by (i) including the labels in the variable ranges (e.g. y is d1:d129 instead of d2:d129) and (ii) checking the Labels box right below Input X range (at least in my version of Excel).

Using plug-in prediction, what is your 95% predictive interval for the price of a house which has size = 1.6, nbed = 3, and nbath = 2?

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -5.641      17.200   -0.328  0.743504
size           35.643      10.667    3.341  0.001102 **
nbed           10.460       2.912    3.592  0.000472 ***
nbath          13.546       4.219    3.211  0.001685 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.36 on 124 degrees of freedom
Multiple R-squared: 0.4396, Adjusted R-squared: 0.426
F-statistic: 32.42 on 3 and 124 DF, p-value: 1.535e-15

The plug-in estimate of $\sigma$ is the residual standard error, 20.36.

-5.641 + 35.643*1.6 + 10.46*3 + 13.546*2 = 109.8598, so the interval is 109.9 +/- 40.72 (that is, plus or minus 2 times 20.36).

(b)

Using plug-in prediction, what is your 95% predictive interval for the price of a house which has size = 2.6, nbed = 3, and nbath =2?

-5.641 + 35.643*2.6 + 10.46*3 + 13.546*2 = 145.5028, so the interval is 145.5 +/- 40.72.
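As a cross-check on parts (a) and (b), here is a minimal Python sketch of plug-in prediction. The coefficient estimates and the residual standard error are copied from the regression output above; the names `COEF`, `S`, and `plug_in_interval` are hypothetical, and the interval uses the same "fit plus or minus two standard errors" approximation as the worked answers.

```python
# Sketch: plug-in 95% predictive interval for the housing regression.
# Estimates are copied from the regression output in 2.1(a);
# COEF, S and plug_in_interval are illustrative names.

COEF = {"intercept": -5.641, "size": 35.643, "nbed": 10.460, "nbath": 13.546}
S = 20.36  # residual standard error, plugged in for sigma

def plug_in_interval(size, nbed, nbath):
    """Return (fit, lower, upper) in thousands of dollars."""
    fit = (COEF["intercept"]
           + COEF["size"] * size
           + COEF["nbed"] * nbed
           + COEF["nbath"] * nbath)
    return fit, fit - 2 * S, fit + 2 * S

for size in (1.6, 2.6):
    fit, lo, hi = plug_in_interval(size, nbed=3, nbath=2)
    print(f"size={size}: {fit:.4f} +/- {2 * S:.2f} -> [{lo:.2f}, {hi:.2f}]")
```

Plug-in prediction treats the estimated coefficients and the estimated $\sigma$ as if they were the true values, so the half-width is the same 40.72 for both houses.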

2.2
For this problem use the data in the file Profits.txt. There are 18 observations. Each observation corresponds to a project developed by a firm. y = Profit: profit on the project in thousands of dollars. x1 = RD: expenditure on research and development for the project in thousands of dollars. x2 = Risk: a measure of risk assigned to the project at the outset.

We want to see how profit on a project relates to research and development expenditure and risk.

(a)

Plot profit vs. each of the two x variables. That is, do two plots: y vs. x1 and y vs. x2.

You can't really understand the full three-dimensional relationship from these two plots, but it is still a good idea to look at them. Does it seem like y is related to the x's?

[Figure: two scatterplots with pf$PROFIT on the vertical axis (roughly 100 to 500), one against pf$RISK and one against pf$RD (roughly 60 to 140).]

There certainly seems to be a relationship.

(b)

Suppose a project has risk = 7 and research and development = 76. Give the 95% plug-in predictive interval for the profit on the project.

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -453.1763     23.5061  -19.279  5.37e-12 ***
RISK           29.3090      3.6686    7.989  8.76e-07 ***
RD              4.5100      0.1538   29.333  1.16e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.34 on 15 degrees of freedom
Multiple R-squared: 0.9915, Adjusted R-squared: 0.9904
F-statistic: 879.1 on 2 and 15 DF, p-value: 2.852e-16

-453.1763 + 29.31*7 + 4.51*76 = 94.7537 and the estimate of $\sigma$ is 14.34, so the interval is 94.7537 +/- 28.68.

(c)

Suppose all you knew was risk = 7. Run the simple linear regression of profit on risk and get the 95% predictive interval for profit.

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -489.53      173.63   -2.819  0.012339 *
RISK            90.45       22.33    4.051  0.000928 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 106.1 on 16 degrees of freedom
Multiple R-squared: 0.5063, Adjusted R-squared: 0.4754
F-statistic: 16.41 on 1 and 16 DF, p-value: 0.000928

-489.53 + 90.45*7 = 143.62, so the interval is 143.62 +/- 212.2.

(d)

How does the size of your interval in (c) compare with the size of your interval in (b)? What does this tell us about our variables?

The interval is much bigger with only RISK in the model. This just means our estimate of $\sigma$ is much smaller with RISK and RD (14.34) than with just RISK (106.1). The variable RD adds quite a bit to our predictive ability.
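The comparison between (b) and (c) can be made concrete with a short Python sketch. All numbers are copied from the two regression outputs for this problem (with the slopes rounded as in the worked answer to part (b)); the variable names and the `interval` helper are illustrative assumptions.

```python
# Sketch comparing the plug-in intervals from 2.2(b) and 2.2(c).
# Numbers come from the two regression outputs; names are hypothetical.

RISK, RD = 7, 76

# (b) Multiple regression on RISK and RD (slopes rounded as in the answer).
fit_both = -453.1763 + 29.31 * RISK + 4.51 * RD
s_both = 14.34  # residual standard error with both variables

# (c) Simple regression on RISK only.
fit_risk = -489.53 + 90.45 * RISK
s_risk = 106.1  # residual standard error with RISK alone

def interval(fit, s):
    """Approximate 95% plug-in interval: fit +/- 2 * s."""
    return fit - 2 * s, fit + 2 * s

lo_b, hi_b = interval(fit_both, s_both)
lo_c, hi_c = interval(fit_risk, s_risk)

# How many times wider is the RISK-only interval?
width_ratio = (hi_c - lo_c) / (hi_b - lo_b)

print(f"(b) {fit_both:.4f} +/- {2 * s_both:.2f}")
print(f"(c) {fit_risk:.2f} +/- {2 * s_risk:.1f}")
print(f"width ratio (c)/(b): {width_ratio:.1f}")
```

The width ratio is simply the ratio of the two residual standard errors, which is why dropping RD makes the interval several times wider.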

3 Confidence Intervals and Hypothesis Tests
3.1

In the regression of house price on size, nbath, and nbed, what is the 95% confidence interval for the true slope for size? Is it big or small?

The interval is the estimate plus or minus two standard errors: 35.643 +/- 2(10.667) = 35.643 +/- 21.3. It is big.
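The interval above can be reproduced with a small Python sketch. The estimate and standard error are the values reported for size in the housing regression output; the helper name `conf_interval` and the multiplier of 2 (the rule of thumb used in the answer, rather than an exact t quantile) are assumptions.

```python
# Sketch: approximate 95% confidence interval for a regression slope.
# Uses estimate +/- 2 standard errors; conf_interval is a hypothetical name.

def conf_interval(estimate, std_error, multiplier=2.0):
    """Return (lower, upper) for the approximate confidence interval."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width

# Slope for size: estimate 35.643, standard error 10.667.
lower, upper = conf_interval(35.643, 10.667)
print(f"[{lower:.1f}, {upper:.1f}] thousand dollars per thousand square feet")
```

The interval runs from roughly 14 to 57, so the data are consistent with a wide range of values for the effect of size on price.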

3.2
Typically people will say that a coefficient is not statistically significant if the t-value for testing whether the coefficient is equal to 0 is less than 2 in absolute value. Otherwise, we say it is statistically significant. Remember, however, that you are not allowed to accept a null hypothesis, only fail to reject it. Are any of the coefficients for the x variables in the zagat regression statistically insignificant? Are any of the coefficients in the housing regression of price on size, nbed, and nbath statistically insignificant?

All the t's are bigger than 2 (in absolute value), so all the variables are significant.
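For the housing regression, the rule of thumb can be checked directly from the reported estimates and standard errors. The sketch below is illustrative only: the dictionary `housing` and the function `t_values` are hypothetical names, and the zagat regression is left out because its output does not appear in this document.

```python
# Sketch: t-value = estimate / standard error, flagged against |t| > 2.
# Estimates and standard errors are copied from the housing regression output.

housing = {
    "size": (35.643, 10.667),
    "nbed": (10.460, 2.912),
    "nbath": (13.546, 4.219),
}

def t_values(coefs):
    """Map each variable name to its t-value for testing slope = 0."""
    return {name: est / se for name, (est, se) in coefs.items()}

t = t_values(housing)
significant = {name: abs(value) > 2 for name, value in t.items()}

for name in housing:
    print(f"{name}: t = {t[name]:.3f}, significant = {significant[name]}")
```

The computed ratios agree with the t value column of the regression output, up to rounding.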

3.3
Consider the following regression model (where we assume knowledge of all parameters): $Y = 3 + 2X + \varepsilon$, where $\varepsilon \sim N(0, 1)$.

Be precise and answer the following questions:

(i) What is the mean of Y when X = 1?
When X = 1, $Y = 3 + 2 \cdot 1 + \varepsilon$ and therefore $Y \sim N(5, 1)$. So the mean of Y when X = 1 is 5.

(ii) What is the variance of Y when X = 3?
When X = 3, $Y = 3 + 2 \cdot 3 + \varepsilon$ and therefore $Y \sim N(9, 1)$. So the variance of Y when X = 3 is 1.

(iii) Compute $\Pr(3 < Y < 7)$ when X = 1.
From (i) we know that when X = 1, $Y \sim N(5, 1)$. The interval 3 < Y < 7 therefore covers everything within 2 standard deviations of the mean of Y, hence $\Pr(3 < Y < 7) \approx 0.95$.

(iv) Compute $\Pr(Y > 12)$ when X = 3.
From (ii) we know that when X = 3, $Y \sim N(9, 1)$. 12 is 3 standard deviations greater than the mean. About 99.7% of a normal distribution lies within 3 standard deviations of its mean, so $\Pr(Y > 12) \approx 0.003 / 2 = 0.0015$ (the exact normal tail probability is about 0.0013).

(v) Construct a 68% prediction interval for Y when X = 0.
When X = 0, $Y \sim N(3, 1)$. The 68% prediction interval is $3 \pm 1 \times 1 = [2, 4]$.
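The rule-of-thumb answers in (iii) through (v) can be compared with exact normal probabilities using Python's standard library. This is a sketch under the model stated in the problem; the helper name `y_dist` is an assumption, and `statistics.NormalDist` is used in place of normal tables.

```python
# Sketch: exact normal calculations for Y = 3 + 2X + eps, eps ~ N(0, 1).
# y_dist is a hypothetical helper name.
from statistics import NormalDist

def y_dist(x, sigma=1.0):
    """Conditional distribution of Y given X = x."""
    return NormalDist(mu=3 + 2 * x, sigma=sigma)

# (iii) Pr(3 < Y < 7) when X = 1: within 2 sd of the mean 5.
d1 = y_dist(1)
p_iii = d1.cdf(7) - d1.cdf(3)

# (iv) Pr(Y > 12) when X = 3: more than 3 sd above the mean 9.
d3 = y_dist(3)
p_iv = 1 - d3.cdf(12)

# (v) 68% prediction interval when X = 0: mean +/- 1 sd.
d0 = y_dist(0)
interval_v = (d0.mean - d0.stdev, d0.mean + d0.stdev)

print(f"(iii) {p_iii:.4f}")
print(f"(iv)  {p_iv:.5f}")
print(f"(v)   {interval_v}")
```

The exact two-standard-deviation probability is a little above 0.95, and the three-standard-deviation upper tail is a little above 0.001, which is the level of accuracy the 68/95/99.7 rule is meant to give.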
