
ACTL2002/ACTL5101 Probability and Statistics: Week 11

ACTL2002/ACTL5101 Probability and Statistics


© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au


Last ten weeks

- Introduction to probability;
- Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;
- Special univariate (parametric) distributions (discrete & continuous);
- Joint distributions;
- Convergence, with applications: LLN & CLT;
- Estimators (MME, MLE, and Bayesian);
- Evaluation of estimators;
- Interval estimation.

Final two weeks

Simple linear regression:
- Idea;
- Estimation using LSE (& BLUE estimator & relation to MLE);
- Partition of variability of the variable;
- Testing: i) slope; ii) intercept; iii) regression line; iv) correlation coefficient.

Multiple linear regression:
- Matrix notation;
- LSE estimates;
- Tests;
- R-squared and adjusted R-squared.

Multiple Linear regression

Matrix notation
- Linear Algebra and Matrix Approach
- The Model in Matrix Form
- Linear models

Statistical Properties of the Least Squares Estimates
- Statistical Properties of the Least Squares Estimates
- CI and Tests for Individual Regression Parameters
- CI and Tests for functions of Regression Parameters

Example: Multiple Linear Regression
- Example regression output
- Exercise: Multiple Linear Regression
- Example: Multiple Linear Regression

Appendix
- Simple linear regression in matrix form
Matrix notation: Linear Algebra and Matrix Approach

In general we consider the multiple regression problem:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{p-1} x_{p-1}$$
and data points:
$$\begin{array}{ccccc} y_1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\ y_2 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_n & x_{n1} & x_{n2} & \ldots & x_{n,p-1} \end{array}$$

Multiple Regression: Linear Algebra and Matrix Approach

Observations $y_i$ are written in a vector $y$.
Regression coefficients form the $p \times 1$ vector $\beta = [\beta_0, \beta_1, \ldots, \beta_{p-1}]^\top$, where $\top$ indicates transpose (so $\beta$ is a column vector).
The matrix $X$ (size $n \times p$) is:
$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\ 1 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \ldots & x_{n,p-1} \end{bmatrix}.$$
Predicted values are:
$$\hat{y} = X\beta.$$

Multiple Regression: Linear Algebra and Matrix Approach

The least squares problem is to select $\beta$ to minimize:
$$S(\beta) = \left(y - X\beta\right)^\top \left(y - X\beta\right).$$
Proof: see next slides.
Differentiating with respect to each of the $\beta$'s, the normal equations become:
$$X^\top X \hat{\beta} = X^\top y.$$
If $X^\top X$ is non-singular, then the parameter estimates are:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.$$
The residuals are:
$$\hat{\varepsilon} = y - \hat{y} = y - X\hat{\beta}.$$

The least squares problem is to find the vector $\beta$ that minimizes:
$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \ldots - \beta_{p-1} x_{i,p-1}\right)^2 = \left(y - X\beta\right)^\top \left(y - X\beta\right).$$
Derivation of the least squares estimator:
$$0 = \frac{\partial}{\partial \beta}\left(y - X\beta\right)^\top \left(y - X\beta\right) = \frac{\partial}{\partial \beta}\left(y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\right) = -2X^\top y + X^\top X\beta + \left(X^\top X\right)^\top \beta = -2X^\top y + 2X^\top X\beta,$$
so that
$$X^\top y = X^\top X \beta \quad \Rightarrow \quad \hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.$$

The Least Squares Estimates

Differentiating this matrix expression w.r.t. $\beta$ and setting it equal to zero leads to:
$$X^\top X \beta = X^\top Y,$$
i.e., the normal equations. If $\left(X^\top X\right)^{-1}$ exists, the solution is:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top Y.$$
The corresponding vector of fitted (or predicted) values of $y$ is:
$$\hat{Y} = X\hat{\beta}$$
and the vector of residuals:
$$\hat{\varepsilon} = Y - \hat{Y} = Y - X\hat{\beta}$$
gives the differences between the observed and fitted values.
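As a quick numerical sketch (not part of the original slides; the data below are made up for illustration), the normal-equation solution and NumPy's built-in least squares solver give the same estimates, and the residuals are orthogonal to the columns of $X$:

```python
import numpy as np

# Hypothetical data: n = 5 observations, p = 3 parameters (intercept + 2 covariates)
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 4.0, 3.0],
              [1.0, 6.0, 2.0],
              [1.0, 8.0, 5.0],
              [1.0, 10.0, 4.0]])
y = np.array([3.1, 5.9, 7.2, 9.8, 12.1])

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least squares problem (numerically more stable)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat   # fitted values
resid = y - y_fit      # residuals

print(np.allclose(beta_hat, beta_lstsq))  # True
print(np.allclose(X.T @ resid, 0))        # True: normal equations X'(y - Xb) = 0
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming $(X^\top X)^{-1}$ explicitly, since $X^\top X$ can be ill-conditioned.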


The Model in Matrix Form

Consider the regression model of the form:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Fitted to data, the model becomes:
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n.$$
Define the vectors:
$$\underset{[n \times 1]}{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \underset{[p \times 1]}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \qquad \text{and} \qquad \underset{[n \times 1]}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

Together with the matrix:
$$\underset{[n \times p]}{X} = \begin{bmatrix} 1 & x_{11} & \ldots & x_{1,p-1} \\ 1 & x_{21} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \ldots & x_{n,p-1} \end{bmatrix},$$
write the model in matrix form as follows:
$$\underset{[n \times 1]}{Y} = \underset{[n \times p]}{X}\,\underset{[p \times 1]}{\beta} + \underset{[n \times 1]}{\varepsilon}.$$
The fitted value is:
$$\underset{[n \times 1]}{\hat{Y}} = \underset{[n \times p]}{X}\,\underset{[p \times 1]}{\hat{\beta}}.$$


Linear models
Introduction

To apply linear regression properly:
- Effects of the covariates (explanatory variables) must be additive;
- Homoskedastic (constant) variance (otherwise use an AutoRegressive Conditional Heteroskedasticity (ARCH) model, from Robert Engle; 2003 Nobel prize in Economics);
- Errors must be independent of the explanatory variables with mean zero (weak assumptions);
- Errors must be normally distributed, and hence symmetric (only needed in case of testing, i.e., strong assumptions).

Linear models in general

A linear model involves a response variable datum, $y_i$, treated as an observation on a random variable $(Y_i \mid X = x)$, where $\mathrm{E}[Y_i \mid X = x] = \mu_i$, the $\varepsilon_i$'s are zero-mean random variables independent of $X$, and the $\beta_j$'s are model parameters, the values of which are unknown and need to be estimated using data.
The following are examples of linear models:
- Affine form: $\mu_i = \beta_0 + x_i \beta_1$;
- Polynomial (cubic) form: $\mu_i = \beta_0 + x_i \beta_1 + x_i^2 \beta_2 + x_i^3 \beta_3$;
- Affine form with interaction terms: $\mu_i = \beta_0 + x_i \beta_1 + z_i \beta_2 + (x_i z_i) \beta_3$.

For all linear forms we have: $Y_i = \mu_i + \varepsilon_i$.

Linear models

The first model can be re-written in matrix-vector form as:
$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.$$
So the model has general form $\mu = X\beta$, i.e., the expected value vector $\mu$ is given by a model matrix (or design matrix), $X$, multiplied by a parameter vector, $\beta$.
All linear models can be written in this general form.

Linear models

The second model (the cubic) given above can be written in matrix-vector form as:
$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}.$$

Models in which data are divided into different groups, each of which is assumed to have a different mean, are less obviously of the form $\mu = X\beta$, but they can be written in this form using dummy variables.
Consider the model:
$$y_i = \beta_j + \varepsilon_i \quad \text{if observation } i \text{ is in group } j,$$
and suppose there are three groups, each with two data points. Then the model can be re-written:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon.$$
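This dummy-variable construction can be sketched numerically (the $y$ values below are made up; with a per-group indicator matrix, LSE simply recovers the group means):

```python
import numpy as np

# Three groups, two observations each, as in the slide's example
groups = np.array([0, 0, 1, 1, 2, 2])

# Dummy-variable design matrix: X[i, j] = 1 iff observation i is in group j
X = (groups[:, None] == np.arange(3)).astype(float)

y = np.array([1.0, 3.0, 10.0, 12.0, 20.0, 24.0])

# LSE for this model recovers the group means
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [ 2. 11. 22.], the three group means
```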

Marginal effects

Assume that we have the multiple regression model of the form:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Assume that $x_k$ is a continuous variable, so that if we increase it by one unit while holding the values of the other variables fixed, the value of $y$ becomes:
$$y^{\text{new}} = \beta_0 + \beta_1 x_1 + \ldots + \beta_k (x_k + 1) + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Since $\mathrm{E}[\varepsilon] = 0$, the marginal effect of $x_k$ is:
$$\beta_k = \mathrm{E}\left[y^{\text{new}}\right] - \mathrm{E}\left[y\right],$$
i.e., the expected increase (or decrease) in the value of $y$ whenever you increase the value of $x_k$ by one unit.


Statistical Properties of the Least Squares Estimates
Assumptions

The residual terms $\varepsilon_i$ satisfy the following:
$$\mathrm{E}[\varepsilon_i \mid X = x] = 0, \quad \text{for } i = 1, 2, \ldots, n;$$
$$\mathrm{Var}(\varepsilon_i \mid X = x) = \sigma^2, \quad \text{for } i = 1, 2, \ldots, n;$$
$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X = x) = 0, \quad \text{for all } i \neq j.$$
In words, the residuals have zero mean, common variance, are uncorrelated with the explanatory variables, and are independent of the other residuals.
In matrix form, we have:
$$\mathrm{E}[\varepsilon] = 0; \qquad \mathrm{Cov}(\varepsilon) = \sigma^2 I_n,$$
where $I_n$ is the $n \times n$ identity matrix (ones on the diagonal, zeros on the off-diagonal elements).

Statistical Properties of the Least Squares Estimates

The following properties of the least squares estimates can be verified:
1. The least squares estimates are unbiased: $\mathrm{E}\left[\hat{\beta}\right] = \beta$.
2. The variance-covariance matrix of the least squares estimates is: $\mathrm{Var}\left(\hat{\beta}\right) = \sigma^2 \left(X^\top X\right)^{-1}$.
3. An unbiased estimate of $\sigma^2$ is:
$$s^2 = \frac{1}{n-p}\left(y - \hat{y}\right)^\top \left(y - \hat{y}\right).$$
Note that:
$$\frac{(n-p)\,S^2}{\sigma^2} \sim \chi^2(n-p),$$
and $\hat{\beta}$ and $S^2$ are independent.
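A simulation sketch of these properties (all numbers below are assumed for illustration, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
sigma = 2.0

# Simulated data with a known (assumed) beta
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + sigma * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Unbiased estimate of sigma^2 and the estimated covariance matrix of beta_hat
s2 = resid @ resid / (n - p)
cov_beta = s2 * XtX_inv

print(s2)                          # should be near sigma^2 = 4 for this sample size
print(np.sqrt(np.diag(cov_beta)))  # standard errors of each coefficient
```

Repeating the simulation many times and averaging `beta_hat` and `s2` would illustrate the unbiasedness claims empirically.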

Statistical Properties of the Least Squares Estimates

4. Each component $\hat{\beta}_k$ is normally distributed with mean:
$$\mathrm{E}\left[\hat{\beta}_k\right] = \beta_k,$$
and variance:
$$\mathrm{Var}\left(\hat{\beta}_k\right) = \sigma^2 c_{kk},$$
where $c_{kk}$ is the $(k+1)$-th diagonal entry of the matrix $C = \left(X^\top X\right)^{-1}$ (indices run from $k = 0$, so the first diagonal entry, $c_{00}$, corresponds to the constant), and the covariance between $\hat{\beta}_k$ and $\hat{\beta}_l$ is:
$$\mathrm{Cov}\left(\hat{\beta}_k, \hat{\beta}_l\right) = \sigma^2 c_{kl}.$$


CI and Tests for Individual Regression Parameters

The standard error of $\hat{\beta}_k$ is estimated using:
$$\mathrm{se}\left(\hat{\beta}_k\right) = s\sqrt{c_{kk}}.$$
Under the normality (strong) assumption, we have:
$$\frac{\hat{\beta}_k - \beta_k}{\mathrm{se}\left(\hat{\beta}_k\right)} \sim t(n-p).$$
A $100(1-\alpha)\%$ confidence interval for $\beta_k$ is given by:
$$\hat{\beta}_k \pm t_{1-\alpha/2,\,n-p}\,\mathrm{se}\left(\hat{\beta}_k\right).$$
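As a small worked sketch, using the tabulated value $t_{0.975}(17) = 2.110$ and the same numbers that appear in the exercise later in these slides:

```python
# Coefficient estimate and standard error (numbers from the slides' exercise),
# with t_{0.975}(17) = 2.110 taken from statistical tables (F&T)
beta_hat_k, se_k = 0.93, 0.276
t_crit = 2.110

lo = beta_hat_k - t_crit * se_k
hi = beta_hat_k + t_crit * se_k
print(round(lo, 2), round(hi, 2))  # 0.35 1.51
```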

CI and Tests for Individual Regression Parameters

In testing the null hypothesis $H_0: \beta_k = \beta_{k0}$ for some fixed constant $\beta_{k0}$, we use the test statistic:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)},$$
which under the null hypothesis has a $t$-distribution with $n-p$ degrees of freedom. The most common test is for the significance of the presence of the variable $x_k$, in which case the test statistic simply becomes:
$$T = \frac{\hat{\beta}_k}{\mathrm{se}\left(\hat{\beta}_k\right)},$$
because we test $H_0: \beta_k = 0$ against $H_1: \beta_k \neq 0$ when we test for the significance/importance of the variable.
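For instance, the T column of the football-salary regression output shown later in these slides is just the coefficient divided by its standard error:

```python
# Selected (Coef, SE Coef) pairs from the regression output later in these slides
coefs = {
    "DRAFT":   (-19139, 3674),
    "YRSEXP":  (21301, 6370),
    "CITYPOP": (-0.000699, 0.003176),
}
for name, (b, se) in coefs.items():
    t_stat = b / se  # T = beta_hat_k / se(beta_hat_k)
    print(name, round(t_stat, 2))
# DRAFT -5.21, YRSEXP 3.34, CITYPOP -0.22, matching the output's T column
```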

CI and Tests for Individual Regression Parameters

However, we can always have more general tests for the regression coefficients, as demonstrated in the three cases below:
1. Test the null hypothesis:
$$H_0: \beta_k = \beta_{k0}$$
against the alternative:
$$H_1: \beta_k \neq \beta_{k0}.$$
Use the decision rule (using the generalized LRT, week 7):
Reject $H_0$ if:
$$|T| = \left|\frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)}\right| > t_{1-\alpha/2,\,n-p}.$$

CI and Tests for Individual Regression Parameters

2. Test the hypothesis:
$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k > \beta_{k0}.$$
Use the decision rule (using UMP, week 7):
Reject $H_0$ if:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)} > t_{1-\alpha,\,n-p}.$$
3. Test the hypothesis:
$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k < \beta_{k0}.$$
Use the decision rule (using UMP, week 7):
Reject $H_0$ if:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)} < -t_{1-\alpha,\,n-p}.$$


CI and Tests for functions of Regression Parameters

Let $D$ be a matrix (size $m \times p$) of $m$ linear combinations of the explanatory variables.
Then we have that:
$$\mathrm{E}\left[D\hat{\beta}\right] = D\beta$$
$$\mathrm{Var}\left(D\hat{\beta}\right) = D\,\mathrm{Var}\left(\hat{\beta}\right) D^\top = \sigma^2 D \left(X^\top X\right)^{-1} D^\top.$$
Under the normality (strong) assumption, we have:
$$\frac{D\left(\hat{\beta} - \beta\right)}{\underbrace{\sqrt{s^2\, D \left(X^\top X\right)^{-1} D^\top}}_{=\,\mathrm{se}\left(D\hat{\beta}\right)}} \sim t(n-p).$$
A $100(1-\alpha)\%$ confidence interval for $D\beta$ is given by:
$$D\hat{\beta} \pm t_{1-\alpha/2,\,n-p}\,\mathrm{se}\left(D\hat{\beta}\right).$$
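A sketch of the variance computation for a linear combination, using the numbers of the exercise later in these slides ($D = [0, 1, -1]$, $s^2 \approx 0.69$):

```python
import numpy as np

# (X'X)^{-1} and s^2 from the slides' exercise; D picks out beta_1 - beta_2
XtX_inv = np.array([[0.19, 0.08, 0.04],
                    [0.08, 0.11, -0.03],
                    [0.04, -0.03, 0.05]])
s2 = 0.69
D = np.array([0.0, 1.0, -1.0])

var_D = s2 * D @ XtX_inv @ D  # s^2 * D (X'X)^{-1} D'
se_D = np.sqrt(var_D)
print(round(var_D, 3), round(se_D, 3))  # 0.152 0.39
```

(The slides round the same quantity to 0.151, having kept fewer decimals in the intermediate steps.)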

Adjusted R-Squared

The coefficient of determination is:
$$R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.$$
In the simple linear regression model, the R-squared provides a descriptive measure of the success of the regressor variables in explaining the variation in the dependent variable.
The R-squared will always increase when adding additional regressor variables, even if the regressor variables added do not strongly influence the dependent variable.
An alternative is to correct it for the number of regressor variables present. Thus, we define the adjusted R-squared:
$$R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)} = 1 - \frac{s^2}{\mathrm{MST}} = 1 - \frac{n-1}{n-p}\left(1 - R^2\right).$$
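The two expressions for $R_a^2$ agree, as a quick check with the numbers from the exercise later in these slides (SSE = 11.67, SST = 53.82, n = 20, p = 3) shows:

```python
# Sums of squares and dimensions taken from the slides' exercise
SSE, SST, n, p = 11.67, 53.82, 20, 3

R2 = 1 - SSE / SST
Ra2_direct = 1 - (SSE / (n - p)) / (SST / (n - 1))
Ra2_via_R2 = 1 - (n - 1) / (n - p) * (1 - R2)

print(round(R2, 3), round(Ra2_direct, 3))  # 0.783 0.758
```

Note that $R_a^2 < R^2$ here, since the adjustment penalises the two regressors.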

Can we test whether the regression explains anything significant? E.g., can we jointly test whether $\beta = [\beta_1, \ldots, \beta_{p-1}]^\top = 0$ (note: excluding $\beta_0$)?
Use the F-statistic:
$$F = \frac{|\tilde{X}\hat{\beta}|^2/(p-1)}{|\hat{\varepsilon}|^2/(n-p)} = \frac{\mathrm{SSM}/(p-1)}{\mathrm{SSE}/(n-p)} \sim F_{p-1,\,n-p}.$$
Under the strong assumptions, $|\tilde{X}\hat{\beta}|^2/\sigma^2 \sim \chi^2_{p-1}$ and $|\hat{\varepsilon}|^2/\sigma^2 \sim \chi^2_{n-p}$ are chi-squared distributed (note: $\tilde{X}$ is the matrix $X$ without the constant).
Interpretation: if the regression model explains a large proportion of the variability in $y$, then $|\tilde{X}\hat{\beta}|^2$ should be large and $|\hat{\varepsilon}|^2$ should be small.
Hence, test $H_0: \beta = 0$ v.s. $H_1$: at least one $\beta_k \neq 0$.
Reject $H_0$ if $F > F_{p-1,\,n-p}(1-\alpha)$.
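A minimal sketch of the F statistic from the sums of squares, again with the numbers of the exercise later in these slides:

```python
# Joint-significance F statistic; SST, SSE, n, p from the slides' exercise
SST, SSE, n, p = 53.82, 11.67, 20, 3
SSM = SST - SSE

F = (SSM / (p - 1)) / (SSE / (n - p))
print(round(F, 2))  # 30.7
```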

ANOVA table and sum of squares:
- SST is the total variability in the absence of knowledge of the variables $X_1, \ldots, X_{p-1}$;
- SSE is the total variability remaining after introducing the effect of $X_1, \ldots, X_{p-1}$;
- SSM is the total variability explained because of knowledge of $X_1, \ldots, X_{p-1}$.

This partitioning of the variability is used in ANOVA tables:

Source       Sum of squares                                Degrees of freedom   Mean square        F          p-value
Regression   SSM $= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$   DFM $= p-1$          MSM $=$ SSM/DFM    MSM/MSE    $1 - F_{\mathrm{DFM},\mathrm{DFE}}(F)$
Error        SSE $= \sum_{i=1}^n (y_i - \hat{y}_i)^2$       DFE $= n-p$          MSE $=$ SSE/DFE
Total        SST $= \sum_{i=1}^n (y_i - \bar{y})^2$         DFT $= n-1$          MST $=$ SST/DFT


Example regression output (= summary)

Error variance and standard deviation:
$$s^2 = \mathrm{MSE} = \frac{\sum_{i=1}^n \hat{\varepsilon}_i^2}{n-p}, \qquad s = \sqrt{s^2}.$$
Confidence intervals:
$$\text{CI for } \sigma^2: \left(\frac{\mathrm{SSE}}{\chi^2_{1-\alpha/2}(n-p)},\ \frac{\mathrm{SSE}}{\chi^2_{\alpha/2}(n-p)}\right), \qquad \text{CI for } \sigma: \left(\sqrt{\frac{\mathrm{SSE}}{\chi^2_{1-\alpha/2}(n-p)}},\ \sqrt{\frac{\mathrm{SSE}}{\chi^2_{\alpha/2}(n-p)}}\right).$$

ANOVA:

Source       Sum of squares                                Degrees of freedom   Mean square        F          p-value
Regression   SSM $= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$   DFM $= p-1$          MSM $=$ SSM/DFM    MSM/MSE    $1 - F_{\mathrm{DFM},\mathrm{DFE}}(F)$
Error        SSE $= \sum_{i=1}^n (y_i - \hat{y}_i)^2$       DFE $= n-p$          MSE $=$ SSE/DFE
Total        SST $= \sum_{i=1}^n (y_i - \bar{y})^2$         DFT $= n-1$          MST $=$ SST/DFT

Example regression output (cont.) (= summary)

R-squared and adjusted R-squared:
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad R = \sqrt{R^2}, \qquad R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)}, \qquad R_a = \sqrt{R_a^2}.$$

Coefficients:
- Estimate: $\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y$;
- Standard error: $\mathrm{se}\left(\hat{\beta}_k\right) = \sqrt{\mathrm{Cov}\left(\hat{\beta}\right)_{kk}}$;
- t-statistic: $t = \hat{\beta}_k / \mathrm{se}\left(\hat{\beta}_k\right)$;
- p-value (two-sided): $2\left(1 - F_{t(n-p)}(|t|)\right)$;
- CI$(\beta_k)$: $\hat{\beta}_k - t_{1-\alpha/2}(n-p)\,\mathrm{se}\left(\hat{\beta}_k\right)$ to $\hat{\beta}_k + t_{1-\alpha/2}(n-p)\,\mathrm{se}\left(\hat{\beta}_k\right)$.

Covariance matrix:
$$\mathrm{Cov}\left(\hat{\beta}\right) = s^2 \left(X^\top X\right)^{-1}.$$


Exercise regression

Given is the following linear regression:
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.$$
For our sample with 20 observations we have $\sum_{i=1}^{20}(y_i - \bar{y})^2 = 53.82$ and:
$$\left(X^\top X\right)^{-1} = \begin{bmatrix} 0.19 & 0.08 & 0.04 \\ 0.08 & 0.11 & -0.03 \\ 0.04 & -0.03 & 0.05 \end{bmatrix}, \qquad \hat{\beta} = \begin{bmatrix} 0.2 \\ 0.93 \\ 0.95 \end{bmatrix}, \qquad \sum_{i=1}^{20} \hat{\varepsilon}_i^2 = 11.67.$$

a. Question: What is the estimate of the variance of the residual?
b. Question: What is the 95% CI for $\beta_1$?
c. Question: What is the 95% CI for $\beta_1 - \beta_2$?
d. Question: Are $X_1$ and $X_2$ jointly significant?

Exercise regression

a. Solution: $s^2 = \sum_{i=1}^{20} \hat{\varepsilon}_i^2/(n-p) = 11.67/17 = 0.69$.

b. Solution: $\mathrm{Var}\left(\hat{\beta}_1\right) = s^2 c_{11} = 0.69 \cdot 0.11 = 0.076$, so $\mathrm{se}\left(\hat{\beta}_1\right) = \sqrt{0.076} = 0.276$.
F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1$ is:
$$\left(\hat{\beta}_1 - t_{0.975}(17)\,\mathrm{se}\left(\hat{\beta}_1\right),\ \hat{\beta}_1 + t_{0.975}(17)\,\mathrm{se}\left(\hat{\beta}_1\right)\right) = (0.35, 1.51).$$

c. Solution: $D = [0\ \ 1\ \ {-1}]$; $\mathrm{Var}\left(D\hat{\beta}\right) = s^2 D \left(X^\top X\right)^{-1} D^\top$ is:
$$\mathrm{Var}\left(D\hat{\beta}\right) = 0.69 \cdot [0\ \ 1\ \ {-1}] \begin{bmatrix} 0.19 & 0.08 & 0.04 \\ 0.08 & 0.11 & -0.03 \\ 0.04 & -0.03 & 0.05 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = 0.69 \cdot [0.04\ \ 0.14\ \ {-0.08}] \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = 0.69 \cdot 0.22 = 0.151.$$

Exercise regression

c. Solution (cont.): $\mathrm{se}\left(D\hat{\beta}\right) = \sqrt{\mathrm{Var}\left(D\hat{\beta}\right)} = \sqrt{0.151} = 0.389$.
F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1 - \beta_2$ is:
$$\left(\hat{\beta}_1 - \hat{\beta}_2 - t_{0.975}(17)\,\mathrm{se}\left(D\hat{\beta}\right),\ \hat{\beta}_1 - \hat{\beta}_2 + t_{0.975}(17)\,\mathrm{se}\left(D\hat{\beta}\right)\right) = (-0.84, 0.80).$$

d. Solution: SST $= 53.82$; SSE $= 11.67$; SSM $= 53.82 - 11.67 = 42.15$;
MSM $= 42.15/2 = 21.07$; MSE $= 11.67/17 = 0.687$; $F = 21.07/0.687 = 30.68$.
The 1% critical value is $F(2, 17) = 6.112$, thus $X_1$ and $X_2$ are jointly significant even for $\alpha = 0.01$.
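The whole exercise can be verified numerically (a sketch, assuming numpy; the t value is the tabulated $t_{0.975}(17)$ used above):

```python
import numpy as np

# Givens from the exercise
XtX_inv = np.array([[0.19, 0.08, 0.04],
                    [0.08, 0.11, -0.03],
                    [0.04, -0.03, 0.05]])
beta_hat = np.array([0.2, 0.93, 0.95])
SST, SSE, n, p = 53.82, 11.67, 20, 3
t_crit = 2.110  # t_{0.975}(17), F&T page 163

s2 = SSE / (n - p)                    # a. residual variance estimate
se_b1 = np.sqrt(s2 * XtX_inv[1, 1])   # b. se(beta_1)
ci_b1 = (beta_hat[1] - t_crit * se_b1, beta_hat[1] + t_crit * se_b1)

D = np.array([0.0, 1.0, -1.0])        # c. beta_1 - beta_2
se_D = np.sqrt(s2 * D @ XtX_inv @ D)
ci_diff = (D @ beta_hat - t_crit * se_D, D @ beta_hat + t_crit * se_D)

F = ((SST - SSE) / (p - 1)) / (SSE / (n - p))  # d. joint F statistic

print([float(round(v, 2)) for v in (s2, *ci_b1, *ci_diff, F)])
# [0.69, 0.35, 1.51, -0.84, 0.8, 30.7]
```

Carrying full precision through gives $F \approx 30.7$, slightly above the slides' 30.68, which rounds the mean squares first; both comfortably exceed the 1% critical value 6.112.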


Example: Multiple Linear Regression

We use a dataset consisting of salaries of football players and some regressor variables that may influence their salaries:
1. SALARY = the player's salary;
2. DRAFT = the round in which the player was originally drafted;
3. YRSEXP = the player's experience in years;
4. PLAYED = the number of games played in the previous year;
5. STARTED = the number of games started in the previous year;
6. CITYPOP = the population of the city in which the player is domiciled;
7. OFFBACK = an indicator of the player's position in the game (takes value 1 for an offensive back, 0 for others), i.e., it is a dummy variable.

Summary Statistics of Variables in the Football Players Salary Data

Variable   Count   Mean      Median    Std Dev   Minimum   Maximum
SALARY     169     336809    265000    255118    75000     1500000
DRAFT      169     6.473     5         4.61      1         13
YRSEXP     169     4.077     4         3.352     0         17
PLAYED     169     10.237    14        6.999     0         16
STARTED    169     5.97      1         6.859     0         16
CITYPOP    169     4980435   2421000   5098109   1176000   18120000
OFFBACK    169     0.2367    0         0.4263    0         1

The Correlation Matrix

           SALARY   DRAFT    YRSEXP   PLAYED   STARTED  CITYPOP
DRAFT      -0.454
YRSEXP      0.345   -0.059
PLAYED      0.212   -0.108    0.646
STARTED     0.440   -0.253    0.557    0.633
CITYPOP     0.077    0.126    0.129    0.193    0.178
OFFBACK     0.179   -0.209   -0.050   -0.043   -0.081   -0.067

ANOVA Table

Source       Degrees of freedom   Sum of Squares   Mean Squares          F-Ratio    Prob(> F)
Regression   $p-1$                SSM              MSM $=$ SSM/$(p-1)$   MSM/MSE    p-value
Error        $n-p$                SSE              MSE $=$ SSE/$(n-p)$
Total        $n-1$                SST              MST $=$ SST/$(n-1)$
From this ANOVA table, we can derive several statistics that can be used to summarise the quality of the regression model. For example:
- The coefficient of determination is defined by:
$$R^2 = \frac{\mathrm{SSM}}{\mathrm{SST}}$$
and has the interpretation that it gives the proportion of the total variability that is explained by the regression equation.

- The adjusted coefficient of determination is defined by:
$$R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)} = 1 - \frac{s^2}{S_y^2}$$
and has the same interpretation as the R-squared, except that it is adjusted for the number of regressor variables.
In multiple regression, the R-squared increases as the number of variables increases, but not necessarily so for the adjusted R-squared: it increases only if an influential variable is added.

- The size of a typical error, denoted by $s$, is the square root of $s^2$ and is also the square root of the error mean square:
$$s = \sqrt{s^2} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n-p}}.$$
It gives the average deviation of the actual $y$ from that predicted by the regression equation.

- The F-ratio, defined by:
$$F\text{-ratio} = \frac{\mathrm{MSM}}{\mathrm{MSE}},$$
is the test statistic used for model adequacy.
It provides another indication of how good the model is; its corresponding p-value should be as small as possible.

Summary of the results of the regression of the players' salaries against the regressor variables:

Regression Analysis
The regression equation is
SALARY = 361663 - 19139 DRAFT + 21301 YRSEXP - 7948 PLAYED
         + 12965 STARTED - 0.00070 CITYPOP + 82941 OFFBACK

Predictor   Coef        SE Coef     T       p
Constant    361663      43734       8.17    0.000
DRAFT       -19139      3674        -5.21   0.000
YRSEXP      21301       6370        3.34    0.001
PLAYED      -7948       3281        -2.42   0.017
STARTED     12965       3189        4.07    0.000
CITYPOP     -0.000699   0.003176    -0.22   0.826
OFFBACK     82941       38241       2.17    0.032

S = 203817   R-sq = 38.5%   R-sq(adj) = 36.2%

ANOVA Table:

Analysis of Variance
SOURCE       DF    SS            MS            F       p
Regression   6     4.20463E+12   7.00772E+11   16.87   0.000
Error        162   6.72970E+12   41541379329
Total        168   1.09343E+13
Improving the Regression Model

Here is a summary of the results of the improved regression model:

Regression Analysis
The regression equation is
LOGSAL = 11.8 + 0.0733 YRSEXP - 0.00981 PLAYED + 0.0264 STARTED
         + 0.000000 CITYPOP + 0.187 OFFBACK + 0.933 1/DRAFT

Predictor   Coef         SE Coef      T        p
Constant    11.7509      0.0814       144.42   0.000
YRSEXP      0.07332      0.01471      4.98     0.000
PLAYED      -0.009815    0.007607     -1.29    0.199
STARTED     0.026380     0.007596     3.47     0.001
CITYPOP     0.00000001   0.00000001   0.70     0.482
OFFBACK     0.18741      0.08691      2.16     0.033
1/DRAFT     0.9334       0.1242       7.52     0.000

S = 0.4713   R-sq = 54.6%   R-sq(adj) = 52.9%

New ANOVA Table:

Analysis of Variance
SOURCE       DF    SS        MS       F       p
Regression   6     43.3145   7.2191   32.50   0.000
Error        162   35.9891   0.2222
Total        168   79.3035


Appendix: Simple linear regression in matrix form

For simple linear regression, in matrix form we have:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$
Hence
$$X^\top X = \begin{bmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{bmatrix}$$
and
$$\left(X^\top X\right)^{-1} = \frac{1}{\underbrace{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}_{=\,n\sum_{i=1}^n (x_i - \bar{x})^2}} \begin{bmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{bmatrix}.$$
Thus:
$$X^\top y = \begin{bmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{bmatrix}.$$
Hence
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = \left(X^\top X\right)^{-1} X^\top y = \frac{1}{n\sum_{i=1}^n (x_i - \bar{x})^2} \begin{bmatrix} \sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i - \sum_{i=1}^n x_i \sum_{i=1}^n x_i y_i \\ n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i \end{bmatrix}.$$

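As a closing sketch (made-up data, numpy assumed), the matrix solution and the closed-form expressions for simple linear regression agree:

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Matrix solution: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form expressions from the appendix
denom = n * np.sum((x - x.mean()) ** 2)
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

print(np.allclose(b_matrix, [b0, b1]))  # True
```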