
# REGRESSION ANALYSIS

M. A. BOATENG

## REGRESSION

In many establishments, managers make decisions by studying the
relationship between variables.
Also, process improvement can often be made by understanding how
changes in one or more variables affect the process output.
Regression analysis is a statistical technique in which we use observed
data to relate a variable of interest (the dependent or response variable)
to one or more independent (predictor) variables.


## SIMPLE LINEAR REGRESSION

The simple linear regression model is given by

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where

$y_i$ is the value of the response variable in the $i$th observation,
$x_i$ is the value of the predictor variable in the $i$th observation,
$\varepsilon_i$ is the random error term,
$\beta_0$ is the intercept on the y-axis, and
$\beta_1$ measures the slope of the linear model.


## ASSUMPTIONS OF THE MODEL

The values of $y_i$ are randomly and independently distributed.
The mean of the error term is zero, i.e. $E(\varepsilon_i) = 0$.
The variance of the error term is constant, $\sigma^2$, i.e. $V(\varepsilon_i) = \sigma^2$.
The random errors are independently distributed.


For a simple linear regression, using the method of least squares, the
estimates of the parameters $\beta_0$ and $\beta_1$ are given as

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

OR

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
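As a minimal sketch of these formulas (assuming NumPy is available; the data here are invented purely for illustration), the computational form of the slope and the intercept can be evaluated directly:

```python
import numpy as np

# Invented example data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
n = len(x)

# Slope via the computational formula:
# b1 = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)

# Intercept: b0 = ybar - b1 * xbar
b0 = y.mean() - b1 * x.mean()
```

Either slope formula gives the same value; the centered form is usually preferred numerically.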


## COEFFICIENT OF DETERMINATION

The coefficient of determination, R-squared, measures the proportion
of variation in the dependent variable that can be explained by the
independent variable.

$$R^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
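A quick numerical check of this ratio (a sketch only, assuming NumPy; the data are made up for illustration):

```python
import numpy as np

# Invented, roughly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# np.polyfit returns (slope, intercept) for degree 1
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x                      # fitted values

# R^2 = SSR / SST
r2 = np.sum((y_hat - y.mean())**2) / np.sum((y - y.mean())**2)
```

For a least-squares fit with an intercept, this equals 1 - SSE/SST, which is a useful consistency check.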

## ANALYSIS OF VARIANCE TABLE

| SOURCE OF VARIATION | DEGREES OF FREEDOM | SUM OF SQUARES | MEAN SUM OF SQUARES | F-RATIO | P-VALUE |
|---|---|---|---|---|---|
| Regression | 1 | SSR | $MSR = \dfrac{SSR}{1}$ | $F = \dfrac{MSR}{MSE}$ | $P(F > f)$ |
| Error or Residual | n-2 | SSE | $MSE = \dfrac{SSE}{n-2}$ | | |
| Total | n-1 | SST | | | |
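The quantities in the table above can be sketched as follows (assuming NumPy and SciPy are available; the data are invented for illustration):

```python
import numpy as np
from scipy import stats

# Invented example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.2, 5.9, 8.3, 9.7, 12.1])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean())**2)        # total sum of squares
sse = np.sum((y - y_hat)**2)           # error (residual) sum of squares
ssr = sst - sse                        # regression sum of squares
msr = ssr / 1                          # mean square regression, df = 1
mse = sse / (n - 2)                    # mean square error, df = n - 2
f_ratio = msr / mse
p_value = stats.f.sf(f_ratio, 1, n - 2)   # P(F > f) with (1, n-2) df
```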

## MULTIPLE REGRESSION

The general multiple linear regression model with response $Y$ and
terms $X_1, X_2, \dots, X_p$ will have the form

$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

$$\mathrm{Var}(Y \mid X) = \sigma^2$$

The symbol $X$ in $E(Y \mid X)$ means that we are conditioning on all the
terms on the right side of the equation.
Both the $\beta_j$ and $\sigma^2$ are unknown parameters that need to be
estimated.

Suppose we have observed data for n cases or units, meaning we have
a value for $Y$ and all the terms for each of the n cases.
We have symbols for the response and the terms using matrices and
vectors; we define

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$$

$Y$ is an $n \times 1$ vector and $X$ is an $n \times (p+1)$ matrix.
We also define $\beta$ to be a $(p+1) \times 1$ vector of regression coefficients
and $\varepsilon$ to be the $n \times 1$ vector of statistical errors,

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
\qquad \text{and} \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

The matrix $X$ gives all the observed values of the terms. The $i$th row of
$X$ will be defined by the symbol $x_i'$, where $x_i$ is a $(p+1) \times 1$ vector for
mean functions that include an intercept.
Even though $x_i'$ is a row of $X$, we use the convention that all vectors are
column vectors and therefore need to write $x_i'$ to represent a row.
An equation for the mean function evaluated at $x_i$ is:

$$E(Y \mid X = x_i) = x_i'\beta = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}$$

The multiple linear regression model is written in matrix notation as:

$$Y = X\beta + \varepsilon$$
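A small sketch of this matrix notation (assuming NumPy; the numbers are invented): $X$ carries a leading column of ones so that the first coefficient acts as the intercept.

```python
import numpy as np

# Invented term values for n = 4 cases and p = 2 terms
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 1.5, 2.5, 3.5])

X = np.column_stack([np.ones_like(x1), x1, x2])   # n x (p+1) design matrix
beta = np.array([1.0, 2.0, -0.5])                 # hypothetical coefficients
e = np.zeros(4)                                   # errors set to zero for the sketch

Y = X @ beta + e                                  # Y = X beta + e
```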

## VARIANCE-COVARIANCE MATRIX OF $\varepsilon$

The assumptions concerning the errors, the $\varepsilon_i$'s, are summarized in matrix
form as

$$E(\varepsilon) = 0$$
$$\mathrm{Var}(\varepsilon) = \sigma^2 I_n$$

where $\mathrm{Var}(\varepsilon)$ is the covariance matrix of $\varepsilon$,
$I_n$ is the $n \times n$ matrix with ones on the diagonal and zeros
everywhere else, and
$0$ is a matrix or vector of zeros of appropriate size.
Adding the assumption of normality, we can write

$$\varepsilon \sim N(0, \sigma^2 I_n)$$

## ORDINARY LEAST SQUARES ESTIMATORS

The least squares estimate of $\beta$ is chosen to minimize the residual
sum of squares function

$$RSS(\beta) = (Y - X\beta)'(Y - X\beta)$$

The OLS estimate $\hat{\beta}$ can be found by differentiating the above. The
estimate is given by the formula

$$\hat{\beta} = (X'X)^{-1}X'Y$$

provided that the inverse $(X'X)^{-1}$ exists. The estimator $\hat{\beta}$ depends
only on the sufficient statistics $X'X$ and $X'Y$.
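A sketch of this estimator (assuming NumPy; the data are simulated, not from the slides). Solving the normal equations $(X'X)\beta = X'Y$ is used instead of explicitly forming the inverse, which is the standard numerically preferable route:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
# Simulate Y from known coefficients (3.0, 1.5, -2.0) plus small noise
Y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)

# Solve (X'X) beta = X'Y  ==  beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

With low noise, `beta_hat` recovers the simulating coefficients closely.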

To compute the estimates, the following corrected sums of squares and
cross products are used.
Suppose we define $\mathcal{X}$ to be the matrix

$$\mathcal{X} = \begin{pmatrix} x_{11} - \bar{x}_1 & \cdots & x_{1p} - \bar{x}_p \\ x_{21} - \bar{x}_1 & \cdots & x_{2p} - \bar{x}_p \\ \vdots & & \vdots \\ x_{n1} - \bar{x}_1 & \cdots & x_{np} - \bar{x}_p \end{pmatrix}$$

This matrix consists of the original matrix $X$, but with the first column
removed and the column mean subtracted from each of the remaining
columns.

Similarly, $\mathcal{Y}$ is the vector with typical elements $y_i - \bar{y}$. Then

$$\frac{1}{n-1}\mathcal{X}'\mathcal{X}$$

is the matrix of sample variances and covariances.
If we let $\beta^*$ be the parameter vector excluding the intercept $\beta_0$, then,
for $p \geq 1$,

$$\hat{\beta}^* = (\mathcal{X}'\mathcal{X})^{-1}\mathcal{X}'\mathcal{Y}$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}^{*\prime}\bar{x}$$

where $\bar{x}$ is the vector of sample means for all the terms except the
intercept.
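As a sketch (assuming NumPy; simulated data), the centered-data formula for the slopes and the mean-based formula for the intercept can be checked against the direct OLS solution on the full design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X_terms = rng.normal(size=(n, 2))                 # terms, without the intercept column
Y = 2.0 + X_terms @ np.array([1.0, -1.0]) + rng.normal(scale=0.05, size=n)

# Centered data: column means subtracted
Z = X_terms - X_terms.mean(axis=0)
d = Y - Y.mean()
beta_star = np.linalg.solve(Z.T @ Z, Z.T @ d)     # slopes from centered data
beta0 = Y.mean() - X_terms.mean(axis=0) @ beta_star   # intercept

# Direct OLS on the full design matrix, for comparison
X = np.column_stack([np.ones(n), X_terms])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

The two routes agree exactly (up to floating-point precision), which is the point of the corrected sums of squares.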

Once $\hat{\beta}$ is computed, we can define several related quantities. The
fitted values are

$$\hat{Y} = X\hat{\beta}$$

and the residuals are

$$\hat{e} = Y - \hat{Y}$$

The residual sum of squares (RSS) is

$$RSS = \hat{e}'\hat{e} = (Y - X\hat{\beta})'(Y - X\hat{\beta})$$
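These matrix-form quantities can be sketched as follows (assuming NumPy; simulated data). A useful side effect to verify: the residuals of an OLS fit are orthogonal to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.2, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ beta_hat          # fitted values  Y_hat = X beta_hat
e_hat = Y - Y_hat             # residuals      e_hat = Y - Y_hat
rss = e_hat @ e_hat           # residual sum of squares  e'e
```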

## ANOVA FOR MULTIPLE REGRESSION

For multiple regression, the analysis of variance is a technique that is
used to compare mean functions that include nested sets of terms.
In the overall analysis of variance, the mean function with all the terms:

$$E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

is compared with the mean function that includes only the intercept:

$$E(Y \mid X = x) = \beta_0$$

## ANOVA TABLE

| SOURCE | df | SS | MSS | F-RATIO | P-VALUE |
|---|---|---|---|---|---|
| REGRESSION | p | SSreg | $\dfrac{SSreg}{p}$ | $F = \dfrac{SSreg/p}{\hat{\sigma}^2}$ | $P(F > f)$ |
| RESIDUAL | n-(p+1) | RSS | $\hat{\sigma}^2 = \dfrac{RSS}{n-(p+1)}$ | | |
| TOTAL | n-1 | SYY | | | |

We can judge the importance of the regression on the terms in the
larger model by determining whether SSreg is sufficiently large. This is
done by comparing the ratio of the mean square for regression to $\hat{\sigma}^2$
with the F-distribution to obtain a significance level.

If the computed significance level is small enough, then we would
judge that the mean function with all the parameters provides a
better fit than the mean function with only the intercept.
The hypotheses for the F-test are:

$$H_0: E(Y \mid X = x) = \beta_0$$
$$H_1: E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
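The overall F-test above can be sketched numerically (assuming NumPy and SciPy; the data are simulated). The intercept-only model's residual sum of squares is simply the total sum of squares about the mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 60, 2
X_terms = rng.normal(size=(n, p))
Y = 1.0 + X_terms @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Full model: intercept plus all p terms
X = np.column_stack([np.ones(n), X_terms])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
rss_full = np.sum((Y - X @ beta_hat)**2)

# Intercept-only model: fitted value is the sample mean
rss_null = np.sum((Y - Y.mean())**2)

ss_reg = rss_null - rss_full
f_stat = (ss_reg / p) / (rss_full / (n - (p + 1)))
p_value = stats.f.sf(f_stat, p, n - (p + 1))   # P(F > f) with (p, n-(p+1)) df
```

Here the simulated signal is strong, so the significance level comes out very small and the full mean function is judged the better fit.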

## THE COEFFICIENT OF DETERMINATION

As with simple regression, the ratio

$$R^2 = \frac{SSreg}{SYY} = 1 - \frac{RSS}{SYY}$$

gives the proportion of variability in $Y$ explained by regression on the terms.

## HYPOTHESIS CONCERNING ONE OF THE TERMS

Obtaining information on one of the terms may be of interest.
To perform this kind of test, first fit a mean function that excludes the
variable of interest and get its residual sum of squares (RSS).
Then fit another mean function, this time including the variable of
interest, and obtain its RSS as well.
Subtracting the RSS of the larger mean function from that of the smaller
one gives the SSReg for the variable of interest.
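The procedure above can be sketched as follows (assuming NumPy and SciPy; the data are simulated, and the helper `rss` is introduced here only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
Y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=n)

def rss(X, Y):
    """Residual sum of squares from an OLS fit of Y on X."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.sum((Y - X @ beta)**2)

ones = np.ones(n)
rss_small = rss(np.column_stack([ones, x1]), Y)       # mean function without x2
rss_large = rss(np.column_stack([ones, x1, x2]), Y)   # mean function with x2

ss_reg_x2 = rss_small - rss_large                     # SSReg for x2, 1 df
f_stat = ss_reg_x2 / (rss_large / (n - 3))
p_value = stats.f.sf(f_stat, 1, n - 3)
```

Because the models are nested, the smaller model's RSS can never be below the larger model's, so `ss_reg_x2` is always non-negative.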

## EXAMPLE

Calculate the regression coefficients and write the regression model of
the data below:

(Table of paired X and Y observations; values as extracted: 12, 66, 38, 70, 22, 27, 28, 47, 14, 68, 14, 35, 22, 29, 15, 17, 20, 12, 29.)

The value under R is the correlation coefficient for the correlation
between the independent and dependent variables.
R-squared is the proportion of variance (variability) in the dependent
variable accounted for by the independent variable. In this case,
82.5% of the variability in Y is accounted for by X.
The adjusted R-squared is a measure of model fit that adjusts for the
number of independent variables in the model.

The ANOVA table provides an F-test for the statistical model. If the F-test
is significant, then the model as a whole accounts for significantly more
variability in Y than a model with no predictors.
NB: This test is affected by the number of independent variables.

Each row corresponds to a single coefficient in the model.
"Constant" refers to the intercept.
The regression model is thus:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

with the numerical estimates read from the coefficients table.