
Statistical linear regression models

Brian Caffo, Jeff Leek, Roger Peng


Johns Hopkins Bloomberg School of Public Health

Basic regression model with additive Gaussian errors
Least squares is an estimation tool; how do we do inference?
Consider developing a probabilistic model for linear regression

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

Here the $\epsilon_i$ are assumed iid $N(0, \sigma^2)$.
Note, $E[Y_i | X_i = x_i] = \mu_i = \beta_0 + \beta_1 x_i$
Note, $Var(Y_i | X_i = x_i) = \sigma^2$.
Likelihood equivalent model specification is that the $Y_i$ are independent $N(\mu_i, \sigma^2)$.
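
To make the model concrete, the following R sketch simulates data from it and fits a line. The parameter values, the sample size, and the uniform design for the predictor are assumptions chosen only for illustration; they do not come from the slides.

```r
# Simulate from Y_i = beta0 + beta1 * X_i + epsilon_i with iid N(0, sigma^2) errors.
# All numeric values below are illustrative assumptions.
set.seed(1)
n     <- 200
beta0 <- 1
beta1 <- 2
sigma <- 0.5

x   <- runif(n, min = 0, max = 10)      # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)   # iid Gaussian errors
y   <- beta0 + beta1 * x + eps          # responses generated by the model

# Likelihood equivalent specification: Y_i independent N(mu_i, sigma^2)
mu <- beta0 + beta1 * x
y2 <- rnorm(n, mean = mu, sd = sigma)

fit <- lm(y ~ x)
coef(fit)            # estimates should be close to beta0 and beta1
summary(fit)$sigma   # residual standard error, an estimate of sigma
```

Both y and y2 are draws from the same distribution; they differ only in how the simulation is written down.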


Likelihood
$$
\mathcal{L}(\beta, \sigma) = \prod_{i=1}^n \left\{ (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y_i - \mu_i)^2 \right) \right\}
$$

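so that twice the negative log (base e) likelihood is

$$
-2 \log\{\mathcal{L}(\beta, \sigma)\} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu_i)^2 + n \log(\sigma^2) + n \log(2\pi)
$$

where the last term is a constant that does not depend on $\beta$ or $\sigma$.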
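
The step from the likelihood to the expression above can be spelled out by taking logs term by term. This is only a sketch of the intermediate algebra; the additive constant $n\log(2\pi)$ involves neither $\beta$ nor $\sigma$ and is often dropped.

```latex
\begin{align*}
\log \mathcal{L}(\beta, \sigma)
  &= \sum_{i=1}^n \left\{ -\frac{1}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}(y_i - \mu_i)^2 \right\} \\
-2\log \mathcal{L}(\beta, \sigma)
  &= \frac{1}{\sigma^2}\sum_{i=1}^n (y_i - \mu_i)^2
     + n\log(\sigma^2) + n\log(2\pi)
\end{align*}
```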

Discussion
Maximizing the likelihood is the same as minimizing $-2$ log likelihood
The least squares estimate for $\mu_i = \beta_0 + \beta_1 x_i$ is exactly the maximum likelihood estimate (regardless of $\sigma$)
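
The equivalence can be checked numerically. The sketch below minimizes twice the negative log likelihood with optim and compares the result to lm. The simulated data, the BFGS method, and the log-scale parametrization of sigma are all choices made for this illustration, not part of the slides.

```r
# Simulated data (illustrative assumption)
set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Twice the negative log likelihood; par = (beta0, beta1, log(sigma))
neg2loglik <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])   # optimize on the log scale so sigma stays positive
  -2 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

mle <- optim(c(0, 0, 0), neg2loglik, method = "BFGS")
ls  <- coef(lm(y ~ x))

# The two rows should agree up to the optimizer's numerical tolerance
rbind(ml = mle$par[1:2], ls = ls)
```

Because the sum of squares term is the only part of the criterion that involves the coefficients, the minimizing line is the same whatever value sigma takes.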


Recap
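Model $Y_i = \mu_i + \epsilon_i = \beta_0 + \beta_1 X_i + \epsilon_i$ where $\epsilon_i$ are iid $N(0, \sigma^2)$
ML estimates of $\beta_0$ and $\beta_1$ are the least squares estimates

$$\hat\beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)} ~~~~ \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

$E[Y | X = x] = \beta_0 + \beta_1 x$
$Var(Y | X = x) = \sigma^2$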
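
As a quick check of the closed-form estimates, the sketch below computes them by hand and compares them with lm. It uses R's built-in cars data set (stopping distance against speed) purely as a stand-in; that data set is not used in the slides.

```r
# Built-in data: stopping distance (dist) against speed for 50 cars
x <- cars$speed
y <- cars$dist

# Closed-form least squares (and ML) estimates
beta1_hat <- cor(y, x) * sd(y) / sd(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))   # the same two numbers
```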


Interpreting regression coefficients, the intercept


$\beta_0$ is the expected value of the response when the predictor is 0

$$E[Y | X = 0] = \beta_0 + \beta_1 \times 0 = \beta_0$$

Note, this isn't always of interest, for example when $X = 0$ is impossible or far outside of the range of data. (X is blood pressure, or height etc.)
Consider that

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i = \beta_0 + a \beta_1 + \beta_1 (X_i - a) + \epsilon_i = \tilde\beta_0 + \beta_1 (X_i - a) + \epsilon_i$$

So, shifting your X values by value $a$ changes the intercept, but not the slope.
Often $a$ is set to $\bar X$ so that the intercept is interpreted as the expected response at the average X value.
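
The effect of shifting the predictor can be sketched in a few lines of R. The built-in cars data set is used here only as a convenient stand-in, and the variable names are hypothetical.

```r
x <- cars$speed
y <- cars$dist
a <- mean(x)   # shift by the average predictor value

fit_raw     <- lm(y ~ x)
fit_shifted <- lm(y ~ I(x - a))

coef(fit_raw)
coef(fit_shifted)   # same slope, different intercept

# The new intercept equals beta0 + a * beta1; with a = mean(x) this is mean(y)
unname(coef(fit_raw)[1] + a * coef(fit_raw)[2])
mean(y)
```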


Interpreting regression coefficients, the slope


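$\beta_1$ is the expected change in response for a 1 unit change in the predictor

$$E[Y | X = x + 1] - E[Y | X = x] = \beta_0 + \beta_1 (x + 1) - (\beta_0 + \beta_1 x) = \beta_1$$

Consider the impact of changing the units of $X$.

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i = \beta_0 + \frac{\beta_1}{a} (X_i a) + \epsilon_i = \beta_0 + \tilde\beta_1 (X_i a) + \epsilon_i$$

Therefore, multiplication of $X$ by a factor $a$ results in dividing the coefficient by a factor of $a$.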
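Example: $X$ is height in m and $Y$ is weight in kg. Then $\beta_1$ is in kg/m. Converting $X$ to cm implies multiplying $X$ by 100 cm/m. To get $\beta_1$ in the right units, we have to divide it by 100 cm/m.

$$
X \, m \times \frac{100 \, cm}{m} = (100 X) \, cm
~~\mbox{and}~~
\beta_1 \frac{kg}{m} \times \frac{1 \, m}{100 \, cm} = \left(\frac{\beta_1}{100}\right) \frac{kg}{cm}
$$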
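
The height and weight example can be mimicked with simulated data. In the sketch below the heights, weights, and the coefficients used to generate them are invented for illustration only.

```r
# Simulated heights (m) and weights (kg); all values are illustrative assumptions
set.seed(3)
height_m  <- rnorm(50, mean = 1.7, sd = 0.1)
weight_kg <- -60 + 75 * height_m + rnorm(50, sd = 5)

height_cm <- 100 * height_m   # change of units: m to cm

slope_m  <- coef(lm(weight_kg ~ height_m))[2]    # kg per m
slope_cm <- coef(lm(weight_kg ~ height_cm))[2]   # kg per cm

c(kg_per_m = unname(slope_m), kg_per_cm = unname(slope_cm))
unname(slope_m / slope_cm)   # ratio is 100, up to floating point error
```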


Using regression coefficients for prediction


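If we would like to guess the outcome at a particular value of the predictor, say $X$, the regression model guesses

$$\hat\beta_0 + \hat\beta_1 X$$

Note that at the observed value of $X$s, we obtain the predictions

$$\hat\mu_i = \hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$$

Remember that least squares minimizes

$$\sum_{i=1}^n (Y_i - \mu_i)^2$$

for $\mu_i$ expressed as points on a line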
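
Both statements can be checked directly. The sketch below, again using the built-in cars data set as a stand-in that is not part of the slides, computes the predictions at the observed values by hand and confirms that moving the line away from the least squares fit increases the sum of squares. The helper sse is a hypothetical name introduced here.

```r
x <- cars$speed
y <- cars$dist
fit <- lm(y ~ x)

# Predictions at the observed x values, computed by hand
mu_hat <- coef(fit)[1] + coef(fit)[2] * x
all.equal(unname(mu_hat), unname(fitted(fit)))

# Sum of squared vertical distances from the points to the line b0 + b1 * x
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

sse_ls <- sse(coef(fit)[1], coef(fit)[2])

# Nudging either coefficient away from the least squares solution
# increases the criterion
sse_ls < sse(coef(fit)[1] + 1, coef(fit)[2])
sse_ls < sse(coef(fit)[1], coef(fit)[2] + 0.1)
```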


Example
diamond data set from UsingR
Data is diamond prices (Singapore dollars) and diamond weight in carats (standard measure of diamond mass, 0.2 g). To get the data use library(UsingR); data(diamond)
Plotting the fitted regression line and data
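library(UsingR)  # provides the diamond data set
data(diamond)
# Scatterplot of price against mass
plot(diamond$carat, diamond$price,
     xlab = "Mass (carats)",
     ylab = "Price (SIN $)",
     bg = "lightblue",
     col = "black", cex = 1.1, pch = 21, frame = FALSE)
# Overlay the fitted least squares line
abline(lm(price ~ carat, data = diamond), lwd = 2)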


The plot
[Figure: diamond price (SIN $) against mass (carats) with the fitted regression line]


Fitting the linear regression model


fit <- lm(price ~ carat, data = diamond)
coef(fit)

(Intercept)       carat 
     -259.6      3721.0 

We estimate an expected 3721.02 (SIN) dollar increase in price for every carat increase in mass of diamond.
The intercept -259.63 is the expected price of a 0 carat diamond.


Getting a more interpretable intercept


fit2 <- lm(price ~ I(carat - mean(carat)), data = diamond)
coef(fit2)

           (Intercept) I(carat - mean(carat)) 
                 500.1                 3721.0 

Thus $500.1 is the expected price for the average sized diamond of the data (0.2042 carats).


Changing scale
A one carat increase in a diamond is pretty big, what about changing units to 1/10th of a carat?
We can do this by just dividing the coefficient by 10.
We expect a 372.102 (SIN) dollar change in price for every 1/10th of a carat increase in mass of diamond.
Showing that it's the same if we rescale the Xs and refit
fit3 <- lm(price ~ I(carat * 10), data = diamond)
coef(fit3)

  (Intercept) I(carat * 10) 
       -259.6         372.1 


Predicting the price of a diamond


newx <- c(0.16, 0.27, 0.34)
coef(fit)[1] + coef(fit)[2] * newx

[1] 335.7 745.1 1005.5

predict(fit, newdata = data.frame(carat = newx))

     1      2      3 
 335.7  745.1 1005.5 


Predicted values at the observed Xs (red) and at the new Xs (lines)
