
ACTL2002/ACTL5101 Probability and Statistics: Week 11

ACTL2002/ACTL5101 Probability and Statistics


© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au


Last ten weeks

- Introduction to probability;
- Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;
- Special univariate (parametric) distributions (discrete & continuous);
- Joint distributions;
- Convergence, with applications: LLN & CLT;
- Estimators (MME, MLE, and Bayesian);
- Evaluation of estimators;
- Interval estimation.

Final two weeks

Simple linear regression:
- Idea;
- Estimation using LSE (& BLUE estimator & relation to MLE);
- Partition of variability of the variable;
- Testing: i) slope; ii) intercept; iii) regression line; iv) correlation coefficient.

Multiple linear regression:
- Matrix notation;
- LSE estimates;
- Tests;
- R-squared and adjusted R-squared.

Multiple Linear regression

Matrix notation
- Linear Algebra and Matrix Approach
- The Model in Matrix Form
- Linear models

Statistical Properties of the Least Squares Estimates
- Statistical Properties of the Least Squares Estimates
- CI and Tests for Individual Regression Parameters
- CI and Tests for functions of Regression Parameters

Example: Multiple Linear Regression
- Example regression output
- Exercise: Multiple Linear Regression
- Example: Multiple Linear Regression

Appendix
- Simple linear regression in matrix form
Matrix notation: Linear Algebra and Matrix Approach

In general we consider the multiple regression problem:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{p-1} x_{p-1}$$
and data points:
$$\begin{array}{ccccc} y_1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\ y_2 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_n & x_{n1} & x_{n2} & \ldots & x_{n,p-1} \end{array}$$

Multiple Regression: Linear Algebra and Matrix Approach

Observations $y_i$ are written in a vector $y$.
Regression coefficients form the $p \times 1$ vector $\beta = [\beta_0, \beta_1, \ldots, \beta_{p-1}]^\top$, where $\top$ indicates transpose (so $\beta$ is a column vector).
The matrix $X$ (size $n \times p$) is:
$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\ 1 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \ldots & x_{n,p-1} \end{bmatrix}.$$
Predicted values are:
$$\hat{y} = X\beta.$$

Multiple Regression: Linear Algebra and Matrix Approach

The least squares problem is to select $\beta$ to minimize:
$$S(\beta) = \left(y - X\beta\right)^\top \left(y - X\beta\right).$$
Proof: see next slides.
Differentiating with respect to each of the $\beta$'s, the normal equations become:
$$X^\top X \hat{\beta} = X^\top y.$$
If $X^\top X$ is non-singular, then the parameter estimates are:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.$$
The residuals are:
$$\hat{\varepsilon} = y - \hat{y} = y - X\hat{\beta}.$$

The least squares problem is to find the vector $\beta$ that minimizes:
$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \ldots - \beta_{p-1} x_{i,p-1}\right)^2 = \left(y - X\beta\right)^\top \left(y - X\beta\right).$$
Derivation of the least squares estimator:
$$0 = \frac{\partial}{\partial \beta}\left(y - X\beta\right)^\top \left(y - X\beta\right) = \frac{\partial}{\partial \beta}\left(y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\right) = -2X^\top y + X^\top X\beta + \left(X^\top X\right)^\top \beta = -2X^\top y + 2X^\top X\beta,$$
so that
$$X^\top y = X^\top X \beta \quad \Rightarrow \quad \hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.$$

The Least Squares Estimates

Differentiating this matrix expression w.r.t. $\beta$ and setting it equal to zero leads to:
$$X^\top X \beta = X^\top Y,$$
i.e., the normal equations. If $\left(X^\top X\right)^{-1}$ exists, the solution is:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top Y.$$
The corresponding vector of fitted (or predicted) values of $y$ is:
$$\hat{Y} = X\hat{\beta}$$
and the vector of residuals:
$$\hat{\varepsilon} = Y - \hat{Y} = Y - X\hat{\beta}$$
gives the differences between the observed and fitted values.
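As a quick numerical sketch (not part of the original slides; the data below are made up for illustration), the normal-equation solution and NumPy's built-in least squares solver give the same estimates, and the residuals are orthogonal to the columns of $X$:

```python
import numpy as np

# Hypothetical data: n = 5 observations, p = 3 parameters (intercept + 2 covariates)
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 4.0, 3.0],
              [1.0, 6.0, 2.0],
              [1.0, 8.0, 5.0],
              [1.0, 10.0, 4.0]])
y = np.array([3.1, 5.9, 7.2, 9.8, 12.1])

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least squares problem (numerically more stable)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat   # fitted values
resid = y - y_fit      # residuals

print(np.allclose(beta_hat, beta_lstsq))  # True
print(np.allclose(X.T @ resid, 0))        # True: normal equations X'(y - Xb) = 0
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming $(X^\top X)^{-1}$ explicitly, since $X^\top X$ can be ill-conditioned.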


The Model in Matrix Form

Consider the regression model of the form:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Fitted to data, the model becomes:
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n.$$
Define the vectors:
$$\underset{[n \times 1]}{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \underset{[p \times 1]}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \qquad \text{and} \qquad \underset{[n \times 1]}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

Together with the matrix:
$$\underset{[n \times p]}{X} = \begin{bmatrix} 1 & x_{11} & \ldots & x_{1,p-1} \\ 1 & x_{21} & \ldots & x_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \ldots & x_{n,p-1} \end{bmatrix},$$
write the model in matrix form as follows:
$$\underset{[n \times 1]}{Y} = \underset{[n \times p]}{X}\,\underset{[p \times 1]}{\beta} + \underset{[n \times 1]}{\varepsilon}.$$
The fitted value is:
$$\underset{[n \times 1]}{\hat{Y}} = \underset{[n \times p]}{X}\,\underset{[p \times 1]}{\hat{\beta}}.$$


Linear models
Introduction

To apply linear regression properly:
- Effects of the covariates (explanatory variables) must be additive;
- Homoskedastic (constant) variance (otherwise use an AutoRegressive Conditional Heteroskedasticity (ARCH) model, from Robert Engle; 2003 Nobel prize in Economics);
- Errors must be independent of the explanatory variables with mean zero (weak assumptions);
- Errors must be normally distributed, and hence symmetric (only needed in case of testing, i.e., strong assumptions).

Linear models in general

A linear model involves a response variable datum, $y_i$, treated as an observation on a random variable $(Y_i \mid X = x)$, where $\mathrm{E}[Y_i \mid X = x] = \mu_i$, the $\varepsilon_i$'s are zero-mean random variables independent of $X$, and the $\beta_j$'s are model parameters, the values of which are unknown and need to be estimated using data.
The following are examples of linear models:
- Affine form: $\mu_i = \beta_0 + x_i \beta_1$;
- Polynomial (cubic) form: $\mu_i = \beta_0 + x_i \beta_1 + x_i^2 \beta_2 + x_i^3 \beta_3$;
- Affine form with interaction terms: $\mu_i = \beta_0 + x_i \beta_1 + z_i \beta_2 + (x_i z_i) \beta_3$.

For all linear forms we have: $Y_i = \mu_i + \varepsilon_i$.

Linear models

The first model can be re-written in matrix-vector form as:
$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.$$
So the model has general form $\mu = X\beta$, i.e., the expected value vector $\mu$ is given by a model matrix (or design matrix), $X$, multiplied by a parameter vector, $\beta$.
All linear models can be written in this general form.

Linear models

The second model (the cubic) given above can be written in matrix-vector form as:
$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}.$$

Models in which data are divided into different groups, each of which is assumed to have a different mean, are less obviously of the form $\mu = X\beta$, but they can be written in this form using dummy variables.
Consider the model:
$$y_i = \beta_j + \varepsilon_i \quad \text{if observation } i \text{ is in group } j,$$
and suppose there are three groups, each with two data points. Then the model can be re-written:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}}_{X} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon.$$
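This dummy-variable construction can be sketched numerically (the $y$ values below are made up; with a per-group indicator matrix, LSE simply recovers the group means):

```python
import numpy as np

# Three groups, two observations each, as in the slide's example
groups = np.array([0, 0, 1, 1, 2, 2])

# Dummy-variable design matrix: X[i, j] = 1 iff observation i is in group j
X = (groups[:, None] == np.arange(3)).astype(float)

y = np.array([1.0, 3.0, 10.0, 12.0, 20.0, 24.0])

# LSE for this model recovers the group means
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [ 2. 11. 22.], the three group means
```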

Marginal effects

Assume that we have the multiple regression model of the form:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Assume that $x_k$ is a continuous variable, so that if we increase it by one unit while holding the values of the other variables fixed, the value of $y$ becomes:
$$y^{\text{new}} = \beta_0 + \beta_1 x_1 + \ldots + \beta_k (x_k + 1) + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$
Since $\mathrm{E}[\varepsilon] = 0$, the marginal effect of $x_k$ is:
$$\beta_k = \mathrm{E}\left[y^{\text{new}}\right] - \mathrm{E}\left[y\right],$$
i.e., the expected increase (or decrease) in the value of $y$ whenever you increase the value of $x_k$ by one unit.


Statistical Properties of the Least Squares Estimates
Assumptions

The residual terms $\varepsilon_i$ satisfy the following:
$$\mathrm{E}[\varepsilon_i \mid X = x] = 0, \quad \text{for } i = 1, 2, \ldots, n;$$
$$\mathrm{Var}(\varepsilon_i \mid X = x) = \sigma^2, \quad \text{for } i = 1, 2, \ldots, n;$$
$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X = x) = 0, \quad \text{for all } i \neq j.$$
In words, the residuals have zero mean, common variance, are uncorrelated with the explanatory variables, and are independent of the other residuals.
In matrix form, we have:
$$\mathrm{E}[\varepsilon] = 0; \qquad \mathrm{Cov}(\varepsilon) = \sigma^2 I_n,$$
where $I_n$ is the $n \times n$ identity matrix (ones on the diagonal, zeros on the off-diagonal elements).

Statistical Properties of the Least Squares Estimates

The following properties of the least squares estimates can be verified:
1. The least squares estimates are unbiased: $\mathrm{E}\left[\hat{\beta}\right] = \beta$.
2. The variance-covariance matrix of the least squares estimates is: $\mathrm{Var}\left(\hat{\beta}\right) = \sigma^2 \left(X^\top X\right)^{-1}$.
3. An unbiased estimate of $\sigma^2$ is:
$$s^2 = \frac{1}{n-p}\left(y - \hat{y}\right)^\top \left(y - \hat{y}\right).$$
Note that:
$$\frac{(n-p)\,S^2}{\sigma^2} \sim \chi^2(n-p),$$
and $\hat{\beta}$ and $S^2$ are independent.
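A simulation sketch of these properties (all numbers below are assumed for illustration, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
sigma = 2.0

# Simulated data with a known (assumed) beta
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + sigma * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Unbiased estimate of sigma^2 and the estimated covariance matrix of beta_hat
s2 = resid @ resid / (n - p)
cov_beta = s2 * XtX_inv

print(s2)                          # should be near sigma^2 = 4 for this sample size
print(np.sqrt(np.diag(cov_beta)))  # standard errors of each coefficient
```

Repeating the simulation many times and averaging `beta_hat` and `s2` would illustrate the unbiasedness claims empirically.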

Statistical Properties of the Least Squares Estimates

4. Each component $\hat{\beta}_k$ is normally distributed with mean:
$$\mathrm{E}\left[\hat{\beta}_k\right] = \beta_k,$$
and variance:
$$\mathrm{Var}\left(\hat{\beta}_k\right) = \sigma^2 c_{kk},$$
where $c_{kk}$ is the $(k+1)$-th diagonal entry of the matrix $C = \left(X^\top X\right)^{-1}$ (indices run from $k = 0$, so the first diagonal entry, $c_{00}$, corresponds to the constant), and the covariance between $\hat{\beta}_k$ and $\hat{\beta}_l$ is:
$$\mathrm{Cov}\left(\hat{\beta}_k, \hat{\beta}_l\right) = \sigma^2 c_{kl}.$$


CI and Tests for Individual Regression Parameters

The standard error of $\hat{\beta}_k$ is estimated using:
$$\mathrm{se}\left(\hat{\beta}_k\right) = s\sqrt{c_{kk}}.$$
Under the normality (strong) assumption, we have:
$$\frac{\hat{\beta}_k - \beta_k}{\mathrm{se}\left(\hat{\beta}_k\right)} \sim t(n-p).$$
A $100(1-\alpha)\%$ confidence interval for $\beta_k$ is given by:
$$\hat{\beta}_k \pm t_{1-\alpha/2,\,n-p}\,\mathrm{se}\left(\hat{\beta}_k\right).$$
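As a small worked sketch, using the tabulated value $t_{0.975}(17) = 2.110$ and the same numbers that appear in the exercise later in these slides:

```python
# Coefficient estimate and standard error (numbers from the slides' exercise),
# with t_{0.975}(17) = 2.110 taken from statistical tables (F&T)
beta_hat_k, se_k = 0.93, 0.276
t_crit = 2.110

lo = beta_hat_k - t_crit * se_k
hi = beta_hat_k + t_crit * se_k
print(round(lo, 2), round(hi, 2))  # 0.35 1.51
```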

CI and Tests for Individual Regression Parameters

In testing the null hypothesis $H_0: \beta_k = \beta_{k0}$ for some fixed constant $\beta_{k0}$, we use the test statistic:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)},$$
which under the null hypothesis has a $t$-distribution with $n-p$ degrees of freedom. The most common test is for the significance of the presence of the variable $x_k$, in which case the test statistic simply becomes:
$$T = \frac{\hat{\beta}_k}{\mathrm{se}\left(\hat{\beta}_k\right)},$$
because we test $H_0: \beta_k = 0$ against $H_1: \beta_k \neq 0$ when we test for the significance/importance of the variable.
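For instance, the T column of the football-salary regression output shown later in these slides is just the coefficient divided by its standard error:

```python
# Selected (Coef, SE Coef) pairs from the regression output later in these slides
coefs = {
    "DRAFT":   (-19139, 3674),
    "YRSEXP":  (21301, 6370),
    "CITYPOP": (-0.000699, 0.003176),
}
for name, (b, se) in coefs.items():
    t_stat = b / se  # T = beta_hat_k / se(beta_hat_k)
    print(name, round(t_stat, 2))
# DRAFT -5.21, YRSEXP 3.34, CITYPOP -0.22, matching the output's T column
```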

CI and Tests for Individual Regression Parameters

However, we can always have more general tests for the regression coefficients, as demonstrated in the three cases below:
1. Test the null hypothesis:
$$H_0: \beta_k = \beta_{k0}$$
against the alternative:
$$H_1: \beta_k \neq \beta_{k0}.$$
Use the decision rule (using the generalized LRT, week 7):
Reject $H_0$ if:
$$|T| = \left|\frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)}\right| > t_{1-\alpha/2,\,n-p}.$$

CI and Tests for Individual Regression Parameters

2. Test the hypothesis:
$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k > \beta_{k0}.$$
Use the decision rule (using UMP, week 7):
Reject $H_0$ if:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)} > t_{1-\alpha,\,n-p}.$$
3. Test the hypothesis:
$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k < \beta_{k0}.$$
Use the decision rule (using UMP, week 7):
Reject $H_0$ if:
$$T = \frac{\hat{\beta}_k - \beta_{k0}}{\mathrm{se}\left(\hat{\beta}_k\right)} < -t_{1-\alpha,\,n-p}.$$


CI and Tests for functions of Regression Parameters

Let $D$ be a matrix (size $m \times p$) of $m$ linear combinations of the explanatory variables.
Then we have that:
$$\mathrm{E}\left[D\hat{\beta}\right] = D\beta$$
$$\mathrm{Var}\left(D\hat{\beta}\right) = D\,\mathrm{Var}\left(\hat{\beta}\right) D^\top = \sigma^2 D \left(X^\top X\right)^{-1} D^\top.$$
Under the normality (strong) assumption, we have:
$$\frac{D\left(\hat{\beta} - \beta\right)}{\underbrace{\sqrt{s^2\, D \left(X^\top X\right)^{-1} D^\top}}_{=\,\mathrm{se}\left(D\hat{\beta}\right)}} \sim t(n-p).$$
A $100(1-\alpha)\%$ confidence interval for $D\beta$ is given by:
$$D\hat{\beta} \pm t_{1-\alpha/2,\,n-p}\,\mathrm{se}\left(D\hat{\beta}\right).$$
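A sketch of the variance computation for a linear combination, using the numbers of the exercise later in these slides ($D = [0, 1, -1]$, $s^2 \approx 0.69$):

```python
import numpy as np

# (X'X)^{-1} and s^2 from the slides' exercise; D picks out beta_1 - beta_2
XtX_inv = np.array([[0.19, 0.08, 0.04],
                    [0.08, 0.11, -0.03],
                    [0.04, -0.03, 0.05]])
s2 = 0.69
D = np.array([0.0, 1.0, -1.0])

var_D = s2 * D @ XtX_inv @ D  # s^2 * D (X'X)^{-1} D'
se_D = np.sqrt(var_D)
print(round(var_D, 3), round(se_D, 3))  # 0.152 0.39
```

(The slides round the same quantity to 0.151, having kept fewer decimals in the intermediate steps.)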

Adjusted R-Squared

The coefficient of determination is:
$$R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.$$
In the simple linear regression model, the R-squared provides a descriptive measure of the success of the regressor variables in explaining the variation in the dependent variable.
The R-squared will always increase when adding additional regressor variables, even if the regressor variables added do not strongly influence the dependent variable.
An alternative is to correct it for the number of regressor variables present. Thus, we define the adjusted R-squared:
$$R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)} = 1 - \frac{s^2}{\mathrm{MST}} = 1 - \frac{n-1}{n-p}\left(1 - R^2\right).$$
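The two expressions for $R_a^2$ agree, as a quick check with the numbers from the exercise later in these slides (SSE = 11.67, SST = 53.82, n = 20, p = 3) shows:

```python
# Sums of squares and dimensions taken from the slides' exercise
SSE, SST, n, p = 11.67, 53.82, 20, 3

R2 = 1 - SSE / SST
Ra2_direct = 1 - (SSE / (n - p)) / (SST / (n - 1))
Ra2_via_R2 = 1 - (n - 1) / (n - p) * (1 - R2)

print(round(R2, 3), round(Ra2_direct, 3))  # 0.783 0.758
```

Note that $R_a^2 < R^2$ here, since the adjustment penalises the two regressors.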

Can we test whether the regression explains anything significant? E.g., can we jointly test whether $\beta = [\beta_1, \ldots, \beta_{p-1}]^\top = 0$ (note: excluding $\beta_0$)?
Use the F-statistic:
$$F = \frac{|\tilde{X}\hat{\beta}|^2/(p-1)}{|\hat{\varepsilon}|^2/(n-p)} = \frac{\mathrm{SSM}/(p-1)}{\mathrm{SSE}/(n-p)} \sim F_{p-1,\,n-p}.$$
Under the strong assumptions, $|\tilde{X}\hat{\beta}|^2/\sigma^2 \sim \chi^2_{p-1}$ and $|\hat{\varepsilon}|^2/\sigma^2 \sim \chi^2_{n-p}$ are chi-squared distributed (note: $\tilde{X}$ is the matrix $X$ without the constant).
Interpretation: if the regression model explains a large proportion of the variability in $y$, then $|\tilde{X}\hat{\beta}|^2$ should be large and $|\hat{\varepsilon}|^2$ should be small.
Hence, test $H_0: \beta = 0$ v.s. $H_1$: at least one $\beta_k \neq 0$.
Reject $H_0$ if $F > F_{p-1,\,n-p}(1-\alpha)$.
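A minimal sketch of the F statistic from the sums of squares, again with the numbers of the exercise later in these slides:

```python
# Joint-significance F statistic; SST, SSE, n, p from the slides' exercise
SST, SSE, n, p = 53.82, 11.67, 20, 3
SSM = SST - SSE

F = (SSM / (p - 1)) / (SSE / (n - p))
print(round(F, 2))  # 30.7
```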

ANOVA table and sum of squares:
- SST is the total variability in the absence of knowledge of the variables $X_1, \ldots, X_{p-1}$;
- SSE is the total variability remaining after introducing the effect of $X_1, \ldots, X_{p-1}$;
- SSM is the total variability explained because of knowledge of $X_1, \ldots, X_{p-1}$.

This partitioning of the variability is used in ANOVA tables:

Source       Sum of squares                                Degrees of freedom   Mean square        F          p-value
Regression   SSM $= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$   DFM $= p-1$          MSM $=$ SSM/DFM    MSM/MSE    $1 - F_{\mathrm{DFM},\mathrm{DFE}}(F)$
Error        SSE $= \sum_{i=1}^n (y_i - \hat{y}_i)^2$       DFE $= n-p$          MSE $=$ SSE/DFE
Total        SST $= \sum_{i=1}^n (y_i - \bar{y})^2$         DFT $= n-1$          MST $=$ SST/DFT


Example regression output (= summary)

Error variance and standard deviation:
$$s^2 = \mathrm{MSE} = \frac{\sum_{i=1}^n \hat{\varepsilon}_i^2}{n-p}, \qquad s = \sqrt{s^2}.$$
Confidence intervals:
$$\text{CI for } \sigma^2: \left(\frac{\mathrm{SSE}}{\chi^2_{1-\alpha/2}(n-p)},\ \frac{\mathrm{SSE}}{\chi^2_{\alpha/2}(n-p)}\right), \qquad \text{CI for } \sigma: \left(\sqrt{\frac{\mathrm{SSE}}{\chi^2_{1-\alpha/2}(n-p)}},\ \sqrt{\frac{\mathrm{SSE}}{\chi^2_{\alpha/2}(n-p)}}\right).$$

ANOVA:

Source       Sum of squares                                Degrees of freedom   Mean square        F          p-value
Regression   SSM $= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$   DFM $= p-1$          MSM $=$ SSM/DFM    MSM/MSE    $1 - F_{\mathrm{DFM},\mathrm{DFE}}(F)$
Error        SSE $= \sum_{i=1}^n (y_i - \hat{y}_i)^2$       DFE $= n-p$          MSE $=$ SSE/DFE
Total        SST $= \sum_{i=1}^n (y_i - \bar{y})^2$         DFT $= n-1$          MST $=$ SST/DFT

Example regression output (cont.) (= summary)

R-squared and adjusted R-squared:
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad R = \sqrt{R^2}, \qquad R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)}, \qquad R_a = \sqrt{R_a^2}.$$

Coefficients:
- Estimate: $\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y$;
- Standard error: $\mathrm{se}\left(\hat{\beta}_k\right) = \sqrt{\mathrm{Cov}\left(\hat{\beta}\right)_{kk}}$;
- t-statistic: $t = \hat{\beta}_k / \mathrm{se}\left(\hat{\beta}_k\right)$;
- p-value (two-sided): $2\left(1 - F_{t(n-p)}(|t|)\right)$;
- CI$(\beta_k)$: $\hat{\beta}_k - t_{1-\alpha/2}(n-p)\,\mathrm{se}\left(\hat{\beta}_k\right)$ to $\hat{\beta}_k + t_{1-\alpha/2}(n-p)\,\mathrm{se}\left(\hat{\beta}_k\right)$.

Covariance matrix:
$$\mathrm{Cov}\left(\hat{\beta}\right) = s^2 \left(X^\top X\right)^{-1}.$$


Exercise regression

Given is the following linear regression:
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.$$
For our sample with 20 observations we have $\sum_{i=1}^{20}(y_i - \bar{y})^2 = 53.82$ and:
$$\left(X^\top X\right)^{-1} = \begin{bmatrix} 0.19 & 0.08 & 0.04 \\ 0.08 & 0.11 & -0.03 \\ 0.04 & -0.03 & 0.05 \end{bmatrix}, \qquad \hat{\beta} = \begin{bmatrix} 0.2 \\ 0.93 \\ 0.95 \end{bmatrix}, \qquad \sum_{i=1}^{20} \hat{\varepsilon}_i^2 = 11.67.$$

a. Question: What is the estimate of the variance of the residual?
b. Question: What is the 95% CI for $\beta_1$?
c. Question: What is the 95% CI for $\beta_1 - \beta_2$?
d. Question: Are $X_1$ and $X_2$ jointly significant?

Exercise regression

a. Solution: $s^2 = \sum_{i=1}^{20} \hat{\varepsilon}_i^2/(n-p) = 11.67/17 = 0.69$.

b. Solution: $\mathrm{Var}\left(\hat{\beta}_1\right) = s^2 c_{11} = 0.69 \cdot 0.11 = 0.076$, so $\mathrm{se}\left(\hat{\beta}_1\right) = \sqrt{0.076} = 0.276$.
F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1$ is:
$$\left(\hat{\beta}_1 - t_{0.975}(17)\,\mathrm{se}\left(\hat{\beta}_1\right),\ \hat{\beta}_1 + t_{0.975}(17)\,\mathrm{se}\left(\hat{\beta}_1\right)\right) = (0.35, 1.51).$$

c. Solution: $D = [0\ \ 1\ \ {-1}]$; $\mathrm{Var}\left(D\hat{\beta}\right) = s^2 D \left(X^\top X\right)^{-1} D^\top$ is:
$$\mathrm{Var}\left(D\hat{\beta}\right) = 0.69 \cdot [0\ \ 1\ \ {-1}] \begin{bmatrix} 0.19 & 0.08 & 0.04 \\ 0.08 & 0.11 & -0.03 \\ 0.04 & -0.03 & 0.05 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = 0.69 \cdot [0.04\ \ 0.14\ \ {-0.08}] \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = 0.69 \cdot 0.22 = 0.151.$$

Exercise regression

c. Solution (cont.): $\mathrm{se}\left(D\hat{\beta}\right) = \sqrt{\mathrm{Var}\left(D\hat{\beta}\right)} = \sqrt{0.151} = 0.389$.
F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1 - \beta_2$ is:
$$\left(\hat{\beta}_1 - \hat{\beta}_2 - t_{0.975}(17)\,\mathrm{se}\left(D\hat{\beta}\right),\ \hat{\beta}_1 - \hat{\beta}_2 + t_{0.975}(17)\,\mathrm{se}\left(D\hat{\beta}\right)\right) = (-0.84, 0.80).$$

d. Solution: SST $= 53.82$; SSE $= 11.67$; SSM $= 53.82 - 11.67 = 42.15$;
MSM $= 42.15/2 = 21.07$; MSE $= 11.67/17 = 0.687$; $F = 21.07/0.687 = 30.68$.
The 1% critical value is $F(2, 17) = 6.112$, thus $X_1$ and $X_2$ are jointly significant even for $\alpha = 0.01$.
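The whole exercise can be verified numerically (a sketch, assuming numpy; the t value is the tabulated $t_{0.975}(17)$ used above):

```python
import numpy as np

# Givens from the exercise
XtX_inv = np.array([[0.19, 0.08, 0.04],
                    [0.08, 0.11, -0.03],
                    [0.04, -0.03, 0.05]])
beta_hat = np.array([0.2, 0.93, 0.95])
SST, SSE, n, p = 53.82, 11.67, 20, 3
t_crit = 2.110  # t_{0.975}(17), F&T page 163

s2 = SSE / (n - p)                    # a. residual variance estimate
se_b1 = np.sqrt(s2 * XtX_inv[1, 1])   # b. se(beta_1)
ci_b1 = (beta_hat[1] - t_crit * se_b1, beta_hat[1] + t_crit * se_b1)

D = np.array([0.0, 1.0, -1.0])        # c. beta_1 - beta_2
se_D = np.sqrt(s2 * D @ XtX_inv @ D)
ci_diff = (D @ beta_hat - t_crit * se_D, D @ beta_hat + t_crit * se_D)

F = ((SST - SSE) / (p - 1)) / (SSE / (n - p))  # d. joint F statistic

print([float(round(v, 2)) for v in (s2, *ci_b1, *ci_diff, F)])
# [0.69, 0.35, 1.51, -0.84, 0.8, 30.7]
```

Carrying full precision through gives $F \approx 30.7$, slightly above the slides' 30.68, which rounds the mean squares first; both comfortably exceed the 1% critical value 6.112.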


Example: Multiple Linear Regression

We use a dataset consisting of salaries of football players and some regressor variables that may influence their salaries:
1. SALARY = the player's salary;
2. DRAFT = the round in which the player was originally drafted;
3. YRSEXP = the player's experience in years;
4. PLAYED = the number of games played in the previous year;
5. STARTED = the number of games started in the previous year;
6. CITYPOP = the population of the city in which the player is domiciled;
7. OFFBACK = an indicator of the player's position in the game (takes value 1 for an offensive back, 0 for others), i.e., it is a dummy variable.

Summary Statistics of Variables in the Football Players Salary Data

Variable   Count   Mean      Median    Std Dev   Minimum   Maximum
SALARY     169     336809    265000    255118    75000     1500000
DRAFT      169     6.473     5         4.61      1         13
YRSEXP     169     4.077     4         3.352     0         17
PLAYED     169     10.237    14        6.999     0         16
STARTED    169     5.97      1         6.859     0         16
CITYPOP    169     4980435   2421000   5098109   1176000   18120000
OFFBACK    169     0.2367    0         0.4263    0         1

The Correlation Matrix

           SALARY   DRAFT    YRSEXP   PLAYED   STARTED  CITYPOP
DRAFT      -0.454
YRSEXP      0.345   -0.059
PLAYED      0.212   -0.108    0.646
STARTED     0.440   -0.253    0.557    0.633
CITYPOP     0.077    0.126    0.129    0.193    0.178
OFFBACK     0.179   -0.209   -0.050   -0.043   -0.081   -0.067

ANOVA Table

Source       Degrees of freedom   Sum of Squares   Mean Squares          F-Ratio    Prob(> F)
Regression   $p-1$                SSM              MSM $=$ SSM/$(p-1)$   MSM/MSE    p-value
Error        $n-p$                SSE              MSE $=$ SSE/$(n-p)$
Total        $n-1$                SST              MST $=$ SST/$(n-1)$
From this ANOVA table, we can derive several statistics that can be used to summarise the quality of the regression model. For example:
- The coefficient of determination is defined by:
$$R^2 = \frac{\mathrm{SSM}}{\mathrm{SST}}$$
and has the interpretation that it gives the proportion of the total variability that is explained by the regression equation.

- The adjusted coefficient of determination is defined by:
$$R_a^2 = 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)} = 1 - \frac{s^2}{S_y^2}$$
and has the same interpretation as the R-squared, except that it is adjusted for the number of regressor variables.
In multiple regression, the R-squared increases as the number of variables increases, but not necessarily so for the adjusted R-squared: it increases only if an influential variable is added.

- The size of a typical error, denoted by $s$, is the square root of $s^2$ and is also the square root of the error mean square:
$$s = \sqrt{s^2} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n-p}}.$$
It gives the average deviation of the actual $y$ from that predicted by the regression equation.

- The F-ratio, defined by:
$$F\text{-ratio} = \frac{\mathrm{MSM}}{\mathrm{MSE}},$$
is the test statistic used for model adequacy.
It provides another indication of how good the model is; its corresponding p-value should be as small as possible.

Summary of the results of the regression of the players' salaries against the regressor variables:

Regression Analysis
The regression equation is
SALARY = 361663 - 19139 DRAFT + 21301 YRSEXP - 7948 PLAYED
         + 12965 STARTED - 0.00070 CITYPOP + 82941 OFFBACK

Predictor   Coef        SE Coef     T       p
Constant    361663      43734       8.17    0.000
DRAFT       -19139      3674        -5.21   0.000
YRSEXP      21301       6370        3.34    0.001
PLAYED      -7948       3281        -2.42   0.017
STARTED     12965       3189        4.07    0.000
CITYPOP     -0.000699   0.003176    -0.22   0.826
OFFBACK     82941       38241       2.17    0.032

S = 203817   R-sq = 38.5%   R-sq(adj) = 36.2%

ANOVA Table:

Analysis of Variance
SOURCE       DF    SS            MS            F       p
Regression   6     4.20463E+12   7.00772E+11   16.87   0.000
Error        162   6.72970E+12   41541379329
Total        168   1.09343E+13
Improving the Regression Model

Here is a summary of the results of the improved regression model:

Regression Analysis
The regression equation is
LOGSAL = 11.8 + 0.0733 YRSEXP - 0.00981 PLAYED + 0.0264 STARTED
         + 0.000000 CITYPOP + 0.187 OFFBACK + 0.933 1/DRAFT

Predictor   Coef         SE Coef      T        p
Constant    11.7509      0.0814       144.42   0.000
YRSEXP      0.07332      0.01471      4.98     0.000
PLAYED      -0.009815    0.007607     -1.29    0.199
STARTED     0.026380     0.007596     3.47     0.001
CITYPOP     0.00000001   0.00000001   0.70     0.482
OFFBACK     0.18741      0.08691      2.16     0.033
1/DRAFT     0.9334       0.1242       7.52     0.000

S = 0.4713   R-sq = 54.6%   R-sq(adj) = 52.9%

New ANOVA Table:

Analysis of Variance
SOURCE       DF    SS        MS       F       p
Regression   6     43.3145   7.2191   32.50   0.000
Error        162   35.9891   0.2222
Total        168   79.3035


Appendix: Simple linear regression in matrix form

For simple linear regression, in matrix form we have:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$
Hence
$$X^\top X = \begin{bmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{bmatrix}$$
and
$$\left(X^\top X\right)^{-1} = \frac{1}{\underbrace{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}_{=\,n\sum_{i=1}^n (x_i - \bar{x})^2}} \begin{bmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{bmatrix}.$$
Thus:
$$X^\top y = \begin{bmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{bmatrix}.$$
Hence
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = \left(X^\top X\right)^{-1} X^\top y = \frac{1}{n\sum_{i=1}^n (x_i - \bar{x})^2} \begin{bmatrix} \sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i - \sum_{i=1}^n x_i \sum_{i=1}^n x_i y_i \\ n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i \end{bmatrix}.$$

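As a closing sketch (made-up data, numpy assumed), the matrix solution and the closed-form expressions for simple linear regression agree:

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Matrix solution: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form expressions from the appendix
denom = n * np.sum((x - x.mean()) ** 2)
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

print(np.allclose(b_matrix, [b0, b1]))  # True
```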