01 05 PDF

Statistical linear regression models
Brian Caffo, Jeff Leek, Roger Peng

Johns Hopkins Bloomberg School of Public Health
Basic regression model with additive Gaussian errors
Least squares is an estimation tool; how do we do inference?
Consider developing a probabilistic model for linear regression
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
Here the $\epsilon_i$ are assumed iid $N(0, \sigma^2)$.
Note, $E[Y_i \mid X_i = x_i] = \mu_i = \beta_0 + \beta_1 x_i$
Note, $Var(Y_i \mid X_i = x_i) = \sigma^2$.
The likelihood-equivalent model specification is that the $Y_i$ are independent $N(\mu_i, \sigma^2)$.
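To make the model concrete, here is a minimal simulation sketch in R. The parameter values, sample size, and variable names are assumptions chosen for illustration; they do not come from the slides.

```r
# Simulate from Y_i = beta0 + beta1 * X_i + eps_i with eps_i iid N(0, sigma^2).
# All numeric values below are illustrative assumptions.
set.seed(1)
n <- 200
beta0 <- 1
beta1 <- 2
sigma <- 0.5
x <- runif(n, 0, 10)

# Specification 1: linear predictor plus additive Gaussian error
eps <- rnorm(n, mean = 0, sd = sigma)
y <- beta0 + beta1 * x + eps

# Specification 2 (equivalent in distribution): Y_i independent N(mu_i, sigma^2)
mu <- beta0 + beta1 * x
y_alt <- rnorm(n, mean = mu, sd = sigma)

# Least squares recovers values close to the assumed beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Both `y` and `y_alt` are draws from the same model; fitting either one should give estimates near the assumed coefficients.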
Likelihood
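$$\mathcal{L}(\beta, \sigma) = \prod_{i=1}^n \left\{ (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y_i - \mu_i)^2 \right) \right\}$$
so that twice the negative log (base e) likelihood is
$$-2\log\{\mathcal{L}(\beta, \sigma)\} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu_i)^2 + n\log(\sigma^2) + n\log(2\pi)$$
where the final term is a constant that does not depend on the parameters.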
Discussion
Maximizing the likelihood is the same as minimizing $-2$ log likelihood.
The least squares estimate for $\mu_i = \beta_0 + \beta_1 x_i$ is exactly the maximum likelihood estimate (regardless of $\sigma$).
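The equivalence can be checked numerically. The sketch below minimizes twice the negative log likelihood with a general-purpose optimizer and compares the result with lm. The simulated data, the log-sigma parameterization, and the function name neg2loglik are assumptions made for this illustration.

```r
# Simulated data; the true values here are illustrative assumptions
set.seed(2)
n <- 100
x <- rnorm(n)
y <- 3 - 1.5 * x + rnorm(n, sd = 2)

# Twice the negative Gaussian log likelihood.
# par = (beta0, beta1, log(sigma)); the log scale keeps sigma positive.
neg2loglik <- function(par) {
  mu <- par[1] + par[2] * x
  sigma2 <- exp(2 * par[3])
  sum((y - mu)^2) / sigma2 + n * log(sigma2) + n * log(2 * pi)
}

# Numerical maximum likelihood versus least squares
ml <- optim(c(0, 0, 0), neg2loglik, method = "BFGS")
ls_fit <- lm(y ~ x)

rbind(ML = ml$par[1:2], LS = coef(ls_fit))
```

The two rows should agree up to the optimizer's tolerance, whatever value of sigma the optimizer settles on.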
Recap
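Model $Y_i = \mu_i + \epsilon_i = \beta_0 + \beta_1 X_i + \epsilon_i$ where $\epsilon_i$ are iid $N(0, \sigma^2)$
ML estimates of $\beta_0$ and $\beta_1$ are the least squares estimates
$$\hat\beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)} \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
$E[Y \mid X = x] = \beta_0 + \beta_1 x$
$Var(Y \mid X = x) = \sigma^2$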
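As a quick check of the closed-form estimates, the following sketch computes them directly and compares them with lm. The simulated data and variable names are assumptions for illustration only.

```r
# Simulated data; values are illustrative assumptions
set.seed(3)
x <- rnorm(50, mean = 10, sd = 2)
y <- 4 + 0.7 * x + rnorm(50)

# Closed-form least squares (and ML) estimates
beta1_hat <- cor(y, x) * sd(y) / sd(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with the coefficients reported by lm
c(beta0_hat, beta1_hat)
coef(lm(y ~ x))
```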
Interpreting regression coefficients, the intercept

$\beta_0$ is the expected value of the response when the predictor is 0
$$E[Y \mid X = 0] = \beta_0 + \beta_1 \times 0 = \beta_0$$
Note, this isn't always of interest, for example when $X = 0$ is impossible or far outside of the range of data. ($X$ is blood pressure, or height etc.)
Consider that
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i = \beta_0 + a \beta_1 + \beta_1 (X_i - a) + \epsilon_i = \tilde\beta_0 + \beta_1 (X_i - a) + \epsilon_i$$
So, shifting your $X$ values by value $a$ changes the intercept, but not the slope.
Often $a$ is set to $\bar X$ so that the intercept is interpreted as the expected response at the average $X$ value.
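The shift identity can be verified numerically. In this sketch the data are simulated and the shift a = 120 is an arbitrary assumption; any value of a would do.

```r
# Simulated predictor that never comes near 0 (illustrative assumption)
set.seed(4)
x <- runif(80, 100, 180)
y <- 20 + 0.5 * x + rnorm(80, sd = 3)
a <- 120  # arbitrary shift chosen for illustration

fit_raw <- lm(y ~ x)
fit_shift <- lm(y ~ I(x - a))
b <- coef(fit_raw)

# The slope is unchanged; the new intercept equals beta0_hat + a * beta1_hat
coef(fit_shift)
b[1] + a * b[2]
```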
Interpreting regression coefficients, the slope

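$\beta_1$ is the expected change in response for a 1 unit change in the predictor
$$E[Y \mid X = x + 1] - E[Y \mid X = x] = \beta_0 + \beta_1 (x + 1) - (\beta_0 + \beta_1 x) = \beta_1$$
Consider the impact of changing the units of $X$.
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i = \beta_0 + \frac{\beta_1}{a} (X_i a) + \epsilon_i = \beta_0 + \tilde\beta_1 (X_i a) + \epsilon_i$$
Therefore, multiplication of $X$ by a factor $a$ results in dividing the coefficient by a factor of $a$.
Example: $X$ is height in m and $Y$ is weight in kg. Then $\beta_1$ is kg/m. Converting $X$ to cm implies multiplying $X$ by 100 cm/m. To get $\beta_1$ in the right units, we have to divide by 100 cm/m.
$$X \text{ m} \times \frac{100 \text{ cm}}{\text{m}} = (100 X) \text{ cm} \quad \text{and} \quad \beta_1 \frac{\text{kg}}{\text{m}} \times \frac{1 \text{ m}}{100 \text{ cm}} = \left( \frac{\beta_1}{100} \right) \frac{\text{kg}}{\text{cm}}$$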
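The height and weight example can be mimicked with simulated data. The numbers below are made up for illustration and are not real measurements.

```r
# Simulated heights (m) and weights (kg); values are illustrative assumptions
set.seed(5)
height_m <- rnorm(60, mean = 1.7, sd = 0.1)
weight_kg <- -60 + 75 * height_m + rnorm(60, sd = 5)
height_cm <- 100 * height_m

fit_m <- lm(weight_kg ~ height_m)    # slope in kg per m
fit_cm <- lm(weight_kg ~ height_cm)  # slope in kg per cm

# Multiplying X by 100 divides the slope by 100; the intercept is unchanged
coef(fit_m)[2] / 100
coef(fit_cm)[2]
```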
Using regression coefficients for prediction

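If we would like to guess the outcome at a particular value of the predictor, say $X$, the regression model guesses
$$\hat\beta_0 + \hat\beta_1 X$$
Note that at the observed values of $X$, we obtain the predictions
$$\hat\mu_i = \hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$$
Remember that least squares minimizes
$$\sum_{i=1}^n (Y_i - \mu_i)^2$$
for $\mu_i$ expressed as points on a line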
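A small numerical sketch of both points: the fitted values are the line evaluated at the observed Xs, and moving the line away from the least squares coefficients increases the sum of squared errors. The data and the helper function sse are assumptions for illustration.

```r
# Simulated data; values are illustrative assumptions
set.seed(6)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
ls_fit <- lm(y ~ x)
b <- coef(ls_fit)

# Predictions at the observed Xs equal the fitted values from lm
mu_hat <- b[1] + b[2] * x
all.equal(unname(mu_hat), unname(fitted(ls_fit)))

# Sum of squared errors for an arbitrary line (hypothetical helper)
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Perturbing the coefficients in any direction increases the criterion
sse_ls <- sse(b[1], b[2])
sse_other <- c(sse(b[1] + 0.1, b[2]),
               sse(b[1], b[2] - 0.1),
               sse(b[1] - 0.05, b[2] + 0.05))
sse_ls < sse_other
```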
Example
The diamond data set from UsingR
Data is diamond prices (Singapore dollars) and diamond weight in carats (standard measure of diamond mass, 0.2 g). To get the data use library(UsingR); data(diamond)
Plotting the fitted regression line and data
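library(UsingR)  # provides the diamond data set
data(diamond)
# Scatterplot of price against mass
plot(diamond$carat, diamond$price,
     xlab = "Mass (carats)",
     ylab = "Price (SIN $)",
     bg = "lightblue",
     col = "black", cex = 1.1, pch = 21, frame = FALSE)
# Overlay the least squares line
abline(lm(price ~ carat, data = diamond), lwd = 2)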
The plot
[Figure: diamond price (SIN $) against mass (carats) with the fitted regression line]
Fitting the linear regression model

fit <- lm(price ~ carat, data = diamond)
coef(fit)
(Intercept)       carat
     -259.6      3721.0
We estimate an expected 3721.02 (SIN) dollar increase in price for every carat increase in mass of diamond.
The intercept -259.63 is the expected price of a 0 carat diamond.
Getting a more interpretable intercept

fit2 <- lm(price ~ I(carat - mean(carat)), data = diamond)
coef(fit2)
(Intercept) I(carat - mean(carat))
      500.1                 3721.0
Thus $500.1 is the expected price for the average sized diamond of the data (0.2042 carats).
Changing scale
A one carat increase in a diamond is pretty big; what about changing units to 1/10th of a carat?
We can do this by just dividing the coefficient by 10.
We expect a 372.102 (SIN) dollar change in price for every 1/10th of a carat increase in mass of diamond.
Showing that it's the same if we rescale the Xs and refit
fit3 <- lm(price ~ I(carat * 10), data = diamond)
coef(fit3)
(Intercept) I(carat * 10)
     -259.6         372.1
Predicting the price of a diamond

newx <- c(0.16, 0.27, 0.34)
coef(fit)[1] + coef(fit)[2] * newx
[1] 335.7 745.1 1005.5
predict(fit, newdata = data.frame(carat = newx))
     1      2      3
 335.7  745.1 1005.5
Predicted values at the observed Xs (red) and at the new Xs (lines)