• So far, we’ve only been able to examine the relationship between two
variables.
• In many instances, we believe that more than one independent variable is
correlated with the dependent variable.
• Multiple linear regression provides a tool that allows us to examine the
relationship between two or more regressors and a response variable.
• This technique is especially useful when trying to account for potential
confounding factors in observational studies.
General form of the model
• The model is very similar to the simple linear model from before, with the
addition of other regressor(s).
• As before, we are assuming that the regressors relate to the response
variable linearly.
• We assume the relationship is of the form:
E(Y) = β0 + β1 X1 + β2 X2 + ... + βk Xk
Consider the case of 2 regressors, which is much easier to visualize than more
complicated cases.
• In the case of 2 regressors we are fitting a plane through our data using the
same least squares strategy we had before.
• The coefficient of each independent variable tells us the estimated change in
the response associated with a one-unit increase in that variable, holding the
other independent variable constant.
• If both regressors change, then the estimated change in the response
variable (∆Y) is given by: ∆Y = β̂1 ∆X1 + β̂2 ∆X2.
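As a minimal sketch of the two-regressor case (using made-up coefficients and data, not anything from the text), we can fit the plane by least squares with NumPy and check the ∆Y rule numerically:

```python
import numpy as np

# Hypothetical data generated from a known plane: y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
y = 1 + 2 * x1 - 3 * x2  # no noise, so least squares recovers the betas exactly

# Design matrix: a column of ones for the intercept, then the two regressors
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimated change in Y when x1 increases by 2 units and x2 by 1 unit:
delta_y = b1 * 2 + b2 * 1
print(round(b0, 4), round(b1, 4), round(b2, 4), round(delta_y, 4))
# → 1.0 2.0 -3.0 1.0
```

With noise-free data the fitted plane matches the true one, so ∆Y = 2(2) + (−3)(1) = 1.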
Example with two regressors
Imagine that we have a random sample of families. For each family we have
information about the annual income, annual savings, and the number of
children. We are interested in how the number of children and the level of income
relate to the amount of money saved.
• Our response variable is the annual savings.
• Our regressors are number of children and annual income.
• Our fitted model will take the form:
ŷ = β̂0 + β̂1 (income) + β̂2 (children)

[Figure: savings (in thousands of dollars) plotted against income, with each
point labeled 1, 2, or 3 by number of children and a fitted line for each group
(legend: One child, Two children, Three children).]
• In the last example, the coefficient of income remains the same, regardless of
the number of kids in the family.
• What if you think that there’s an interaction between income and children?
(That is, you think the effect is not strictly additive.)
• You might choose to fit the model with an interaction effect, in which case
you are modeling:
E(savings) = β0 + β1 (income) + β2 (children) + β3 (income × children)
• This allows the coefficient associated with income to change based on the
number of children.
• This sort of model is still linear, because the unknowns (the βs) are linear in
their relationship to the knowns (income, children, income*children).
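A short sketch of this point (with hypothetical coefficients, not figures from the text): the interaction model is fit with ordinary least squares simply by adding a product column to the design matrix, and the income slope then varies with the number of children.

```python
import numpy as np

# Hypothetical data from an interaction model:
# savings = 0.5 + 0.2*income + 1.0*children - 0.05*income*children
rng = np.random.default_rng(1)
income = rng.uniform(15, 35, 60)               # thousands of dollars
children = rng.integers(1, 4, 60).astype(float)
y = 0.5 + 0.2 * income + 1.0 * children - 0.05 * income * children

# Still linear in the betas: just append the income*children column
X = np.column_stack([np.ones_like(income), income, children, income * children])
b = np.linalg.lstsq(X, y, rcond=None)[0]

def income_slope(kids):
    """Slope of savings vs. income for a family with `kids` children."""
    return b[1] + b[3] * kids

print([round(float(income_slope(k)), 3) for k in (1, 2, 3)])
# → [0.15, 0.1, 0.05]
```

Each extra child reduces the income slope by β̂3 = −0.05 in this made-up example, which is exactly the non-parallel-lines picture.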
Fitted lines with interaction term
[Figure: the same scatter plot of savings (in thousands of dollars) against
income, now with the interaction-model fitted lines, whose slopes differ by
number of children (legend: One child, Two children, Three children).]
Imagine that we have a data set for a sample of families, including annual
income, annual savings, and whether the family has a single breadwinner
(“1”) or not (“0”).
Fit the model: mean savings = β0 + β1 ∗ income + β2 ∗ oneearn.
Assume we obtain the fitted regression equation:
How does being a one breadwinner family affect the estimated average savings?
Interpretation of dummy variables (cont.)
Re-examine how being a one breadwinner family affects the estimated average
savings.
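A small sketch of the interpretation (the coefficient values below are hypothetical placeholders, not fitted numbers from the text): because oneearn only takes the values 0 and 1, its coefficient shifts the whole regression line up or down without changing the income slope.

```python
# Hypothetical fitted coefficients (made up for illustration):
b0, b1, b2 = -1.0, 0.15, -0.8   # intercept, income slope, one-earner dummy

def mean_savings(income, oneearn):
    """Fitted mean savings = b0 + b1*income + b2*oneearn."""
    return b0 + b1 * income + b2 * oneearn

# At any fixed income level, the dummy shifts the estimate by exactly b2:
diff = mean_savings(25, 1) - mean_savings(25, 0)
print(round(diff, 6))
# → -0.8
```

So with these made-up numbers, one-breadwinner families are estimated to save 0.8 (thousand dollars) less per year than two-earner families at the same income.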
Inference concerning mult. regression coefficients
• The estimates β̂i and their standard errors can be found in the output from
a statistical package like S-Plus. The fitted regression equation is also
sometimes presented with the standard errors listed under each estimate in
parentheses:
ŷ = β̂0 + β̂1 x1 + β̂2 x2
(SEβ̂0) (SEβ̂1) (SEβ̂2)
Adjusted/corrected R2
Adjusted R2 = ((n − 1)R2 − k) / (n − k − 1)
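This formula is algebraically identical to the other common form, 1 − (1 − R2)(n − 1)/(n − k − 1); a quick sketch verifying that, with an arbitrary example value of R2 = 0.75, n = 93 observations, and k = 4 regressors:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = ((n - 1)*R^2 - k) / (n - k - 1)."""
    return ((n - 1) * r2 - k) / (n - k - 1)

def adjusted_r2_alt(r2, n, k):
    """Equivalent form: 1 - (1 - R^2)*(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example (arbitrary numbers): R^2 = 0.75, n = 93, k = 4
print(round(adjusted_r2(0.75, 93, 4), 4))
# → 0.7386
```

Note the adjusted value is always below R2 whenever k > 0, which is the point: it penalizes adding regressors.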
Example: Hiring salaries
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 8.7161 0.1116 78.1167 0.0000
age 0.0002 0.0001 2.7001 0.0083
educatn 0.0170 0.0045 3.8012 0.0003
seniorty -0.0041 0.0009 -4.3281 0.0000
gender -0.1439 0.0216 -6.6626 0.0000
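As a quick illustration of reading this output (using the educatn row, and the rough large-sample rule that a 95% confidence interval is the estimate plus or minus about two standard errors):

```python
# educatn row from the output above: estimate and standard error
est, se = 0.0170, 0.0045

# Rough 95% confidence interval: estimate ± 2 * SE
lo, hi = est - 2 * se, est + 2 * se
print(round(lo, 4), round(hi, 4))
# → 0.008 0.026
```

The interval excludes zero, consistent with the small p-value (0.0003) reported for educatn; the t value column is just each estimate divided by its standard error.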