
REGRESSION ANALYSIS

M. A. BOATENG

REGRESSION

In many establishments, managers make decisions by studying the relationship between variables.
Also, process improvement can often be made by understanding how changes in one or more variables affect the process output.
Regression analysis is a statistical technique in which we use observed data to relate a variable of interest (dependent or response variable) to one or more independent (predictor) variables.


SIMPLE LINEAR REGRESSION


The simple linear regression model is given by:

y_i = β0 + β1·x_i + ε_i

Where
y_i is the value of the response variable in the ith observation
x_i is the value of the predictor variable in the ith observation
ε_i is the random error term
β0 is the intercept on the y-axis
β1 measures the slope of the linear model


ASSUMPTIONS OF THE MODEL


The values of Y are randomly and independently distributed.
The mean of the error term is zero, i.e. E(ε_i) = 0.
The variance of the error term is constant, i.e. V(ε_i) = σ².
The random errors ε_i are independently distributed.


For a simple linear regression, using the method of least squares, the estimates of the parameters β0 and β1 are given as:

β̂0 = ȳ − β̂1·x̄

β̂1 = Σᵢ(x_i − x̄)(y_i − ȳ) / Σᵢ(x_i − x̄)²

OR

β̂1 = [n·Σ x_i y_i − (Σ x_i)(Σ y_i)] / [n·Σ x_i² − (Σ x_i)²]

COEFFICIENT OF DETERMINATION
The coefficient of determination, R-squared, measures the proportion of variation in the dependent variable that can be explained by the independent variable.

R² = Σᵢ(ŷ_i − ȳ)² / Σᵢ(y_i − ȳ)²

Where ŷ_i is the fitted value for the ith observation.
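A minimal sketch of this ratio in NumPy (the x and y values are made-up illustrative data, not from the slides):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R² = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)²
ssr = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = ssr / sst
```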

ANALYSIS OF VARIANCE TABLE


SOURCE OF VARIATION   DEGREES OF FREEDOM   SUM OF SQUARES   MEAN SUM OF SQUARES   F-RATIO   P-VALUE
Regression            1                    SSR              MSR = SSR/1           MSR/MSE   P(F ≥ f)
Error or Residual     n − 2                SSE              MSE = SSE/(n − 2)
Total                 n − 1                SST
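The entries of this table can be sketched in NumPy as follows (the x and y values are made-up illustrative data, not from the slides; the p-value column is omitted since it requires an F-distribution routine):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Least squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# Sums of squares: SST = SSR + SSE
sst = np.sum((y - y.mean()) ** 2)   # Total, df = n - 1
sse = np.sum((y - y_hat) ** 2)      # Error/Residual, df = n - 2
ssr = sst - sse                     # Regression, df = 1

# Mean squares and F-ratio
msr = ssr / 1
mse = sse / (n - 2)
f_ratio = msr / mse
```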

MULTIPLE REGRESSION
The general multiple linear regression model with response Y and terms X1, X2, …, Xp will have the form:

E(Y|X) = β0 + β1X1 + β2X2 + ⋯ + βpXp
Var(Y|X) = σ²

The symbol X in E(Y|X) means that we are conditioning on all the terms on the right side of the equation.
Both the βs and σ² are unknown parameters that need to be estimated.

Suppose we have observed data for n cases or units, meaning we have a value for Y and all the terms for each of the n cases.
We write symbols for the response and the terms using matrices and vectors.
We define:

Y = [y1, y2, …, yn]ᵀ

X = [ 1  x11  …  x1p
      1  x21  …  x2p
      ⋮   ⋮        ⋮
      1  xn1  …  xnp ]

Y is an n × 1 vector and X is an n × (p + 1) matrix.
We also define β to be a (p + 1) × 1 vector of regression coefficients and ε to be the n × 1 vector of statistical errors:

β = [β0, β1, …, βp]ᵀ   and   ε = [ε1, ε2, …, εn]ᵀ

The matrix X gives all the observed values of the terms. The ith row of X will be defined by the symbol x_i′, which is a (p + 1) × 1 vector for mean functions that include an intercept.
Even though x_i′ is a row of X, we use the convention that all vectors are column vectors and therefore need to write x_i′ to represent a row.
An equation for the mean function evaluated at x_i is:

E(Y|X = x_i) = x_i′β = β0 + β1x_i1 + β2x_i2 + ⋯ + βpx_ip

The multiple linear regression model is written in matrix notation as:

Y = Xβ + ε

VARIANCE-COVARIANCE MATRIX OF ε
The assumptions concerning the errors ε_i are summarized in matrix form as:

E(ε) = 0
Var(ε) = σ²·I_n

Where Var(ε) is the covariance matrix of ε, I_n is the n × n matrix with ones on the diagonal and zeros everywhere else, and 0 is a matrix or vector of zeros of appropriate size.
Adding the assumption of normality, we can write:

ε ~ N(0, σ²·I_n)

ORDINARY LEAST SQUARES ESTIMATORS

The least squares estimate β̂ of β is chosen to minimize the residual sum of squares function:

RSS(β) = (Y − Xβ)ᵀ(Y − Xβ)

The OLS estimate can be found by differentiating the above. The estimate is given by the formula:

β̂ = (XᵀX)⁻¹XᵀY

Provided that the inverse (XᵀX)⁻¹ exists. The estimator β̂ depends only on the sufficient statistics XᵀX and XᵀY.

NB: The above are uncorrected sums of squares and cross-products.
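A minimal NumPy sketch of the OLS estimate (the design matrix X and response y below are made-up illustrative values, not data from the slides):

```python
import numpy as np

# Hypothetical design matrix: a column of ones for the intercept plus two terms
# (made-up numbers, for illustration only)
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0]])
y = np.array([3.0, 4.1, 8.9, 10.0, 15.2])

# β̂ = (XᵀX)⁻¹XᵀY — solving the normal equations avoids forming the inverse explicitly
XtX = X.T @ X   # uncorrected sums of squares and cross-products
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)
```

In practice `np.linalg.lstsq` is preferred for numerical stability, but the normal-equations form mirrors the formula above.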



To compute the estimates, the following corrected sums of squares and cross-products are used.
Suppose we define 𝒳 to be the n × p matrix:

𝒳 = [ x11 − x̄1  …  x1p − x̄p
      x21 − x̄1  …  x2p − x̄p
      ⋮                  ⋮
      xn1 − x̄1  …  xnp − x̄p ]

This matrix consists of the original matrix X, but with the first column removed and the column mean subtracted from each of the remaining columns.

Similarly, 𝒴 is the vector with typical elements y_i − ȳ. Then

(1/(n − 1))·𝒳ᵀ𝒳

is the matrix of sample variances and covariances.
If we let β* be the parameter vector excluding the intercept β0, then:

β̂* = (𝒳ᵀ𝒳)⁻¹𝒳ᵀ𝒴
β̂0 = ȳ − β̂*ᵀx̄

Where x̄ is the vector of sample means for all the terms except the intercept.
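The corrected (centered) computation can be sketched in NumPy as follows, and it reproduces the full OLS solution (X and y are made-up illustrative values, not data from the slides):

```python
import numpy as np

# Hypothetical design matrix (intercept column plus two terms) and response
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0]])
y = np.array([3.0, 4.1, 8.9, 10.0, 15.2])

# Centered matrices: drop the column of ones, subtract each column mean
Xc = X[:, 1:] - X[:, 1:].mean(axis=0)
yc = y - y.mean()

# Slopes from the corrected cross-products, intercept from the sample means
beta_star = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)   # β̂* = (𝒳ᵀ𝒳)⁻¹𝒳ᵀ𝒴
beta0 = y.mean() - X[:, 1:].mean(axis=0) @ beta_star  # β̂0 = ȳ − β̂*ᵀx̄

# For comparison: the full (uncorrected) OLS solution β̂ = (XᵀX)⁻¹XᵀY
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
```

The centered slopes agree with the slope entries of the full solution, and the recovered intercept matches its first entry.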

Once β̂ is computed, we can define several related quantities. The fitted values are:

Ŷ = Xβ̂

and the residuals are:

ê = Y − Ŷ

The residual sum of squares (RSS) is:

RSS = êᵀê = (Y − Xβ̂)ᵀ(Y − Xβ̂)
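These quantities can be sketched in NumPy in a few lines (X and y are made-up illustrative values, not data from the slides):

```python
import numpy as np

# Hypothetical design matrix and response, for illustration only
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0]])
y = np.array([3.0, 4.1, 8.9, 10.0, 15.2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_fitted = X @ beta_hat      # Ŷ = Xβ̂
residuals = y - y_fitted     # ê = Y − Ŷ
rss = residuals @ residuals  # RSS = êᵀê
```

As a sanity check, OLS residuals are orthogonal to every column of X.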


ANOVA FOR MULTIPLE REGRESSION


For multiple regression, the analysis of variance is a technique that is used to compare mean functions that include nested sets of terms.
In the overall analysis of variance, the mean function with all the terms:

E(Y|X = x) = β0 + β1x1 + β2x2 + ⋯ + βpxp

is compared with the mean function that includes only the intercept:

E(Y|X = x) = β0


ANOVA TABLE
SOURCE       df            SS      MSS                       F-RATIO     P-VALUE
Regression   p             SSReg   MSReg = SSReg/p           MSReg/σ̂²   P(F ≥ f)
Residual     n − (p + 1)   RSS     σ̂² = RSS/(n − (p + 1))
Total        n − 1         SST

We can judge the importance of the regression on the terms in the larger model by determining whether SSReg is sufficiently large, by comparing the ratio of the mean square for regression to σ̂² with the F-distribution to get a significance level.

If the computed significance level is small enough, then we would judge that the mean function with all the terms provides a better fit than the mean function with only the intercept.
The hypotheses for the F-test are:

H0: E(Y|X = x) = β0
H1: E(Y|X = x) = β0 + β1x1 + β2x2 + ⋯ + βpxp


THE COEFFICIENT OF DETERMINATION


As with simple regression, the ratio

R² = SSReg/SST = 1 − RSS/SST

is the proportion of variability in Y explained by regression on the terms.


HYPOTHESIS CONCERNING ONE OF THE TERMS


Obtaining information on one of the terms may be of interest.
To perform this kind of test, first fit a mean function that excludes the variable of interest and obtain its residual sum of squares (RSS).
Then fit another mean function, this time including the variable of interest, and obtain its RSS as well.
Subtracting the RSS of the larger mean function from that of the smaller one gives the SSReg for the variable of interest.
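These steps can be sketched in NumPy by fitting the two nested mean functions and differencing their residual sums of squares (X and y are made-up illustrative values, not data from the slides):

```python
import numpy as np

# Hypothetical design matrix (intercept plus two terms) and response
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0]])
y = np.array([3.0, 4.1, 8.9, 10.0, 15.2])
n = X.shape[0]

def rss(Xm, ym):
    # Residual sum of squares of an OLS fit of ym on the columns of Xm
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ ym)
    e = ym - Xm @ b
    return e @ e

# Smaller mean function: exclude the variable of interest (last column)
rss_small = rss(X[:, :2], y)
# Larger mean function: include it
rss_large = rss(X, y)

# SSReg for the variable of interest, and its F statistic with 1 and n-(p+1) df
ss_term = rss_small - rss_large
f_stat = (ss_term / 1) / (rss_large / (n - 3))
```

The significance level would then come from comparing f_stat to the F-distribution with 1 and n − (p + 1) degrees of freedom.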


EXAMPLE:
Calculate the regression coefficients and write the regression model of the data below;

X: 12  66  38  70  22  27  28  47  14  68  14  35  22  29  15  17  20  12  29

ANSWER:


The value under R is the correlation coefficient for the correlation between the independent and dependent variables.
R-square is the proportion of variance (variability) in the dependent variable accounted for by the independent variable. In this case, 82.5% of the variability in Y is accounted for by X.
The adjusted R-square is a measure of model fit that adjusts for the number of independent variables in the model.

The ANOVA table provides an F-test for the statistical model. If the F-test is significant, then the model as a whole explains a significant amount of the variability in the response.
NB: This test is affected by the number of independent variables.

Each row corresponds to a single coefficient in the model.
Constant refers to the intercept.
The regression model is thus:

Ŷ = β̂0 + β̂1·X, with the numerical estimates taken from the coefficients table of the output.