
FICB 213 CORPORATE FINANCE

TUTORIAL GUIDE ON HOW TO RUN A SIMPLE LINEAR REGRESSION IN EXCEL

The objective of this tutorial is to demonstrate how to run a simple linear regression in Excel
2013 and how to interpret the regression results. This guide is intended to support your group
project in FICB 213 Corporate Finance.

WHAT IS A SIMPLE LINEAR REGRESSION?


A simple linear regression is used to examine how an independent variable X may affect a
dependent variable Y.
Because the line is estimated by the method of Ordinary Least Squares (OLS), some textbooks
call it an OLS regression.
The simple linear regression equation is written as Y = a + bX
Where:
Y = the value of the dependent variable, which is being predicted or explained
X = the value of the independent variable, which is predicting or explaining the
value of Y
a = Alpha, a constant; equals the value of Y when X = 0
b = Beta, the coefficient of X; the slope of the regression line; how much Y changes
for each one-unit change in X.
For example, if you want to examine how leverage affects dividends, then leverage is the
independent variable X and dividend is the dependent variable Y.

WHY DO YOU NEED A SIMPLE LINEAR REGRESSION?

The main reasons to run a simple regression between X and Y are as follows:
1. To find out the relationship between X and Y. Is there a significant relationship
between X and Y? If there is, is it a positive or a negative relationship?
The direction of the relationship is given by the sign of b (beta).
For example, if the estimated equation is Y = 0.2 + 0.5 X
then b = +0.5; the higher the value of X, the higher the value of Y, so X
has a positive relationship with Y.
On the other hand, if the estimated equation is Y = 0.2 - 0.5 X
then b = -0.5; the higher the value of X, the lower the value of Y, so X
has a negative relationship with Y.
The strength of the relationship between X and Y is given by the correlation
coefficient R. For example, if R = +0.70 and the p-value is lower than 0.05,
variable X is positively correlated with variable Y at the 5 percent significance
level. If instead R = -0.70 with a p-value lower than 0.05, variable X is negatively
correlated with variable Y at the 5 percent significance level. If R is close to zero,
e.g. R = 0.00001, there is little or no linear correlation between X and Y.

2. To find out how much the independent variable X influences the dependent variable Y. The
share of the variation in Y that is explained by X is given by the value
of R-squared, or R^2. For example, if R^2 = 0.20, then only 20 percent of the variation in Y is
explained by the variation in X. The remaining 80 percent
could be due to other factors or unexplained variables.
3. To predict the value of Y when you know X. For example, suppose the estimated equation is
Y = 0.2 + 0.5 X. If you know that X = 30, substituting into the equation gives the predicted value
Y = 0.2 + 0.5(30) = 15.2.
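
The prediction in point 3 can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name predict_y is an assumption made for this example and is not part of Excel or of the course material.

```python
def predict_y(a, b, x):
    """Return the predicted Y for a given X using Y = a + bX."""
    return a + b * x

# Coefficients taken from the example equation Y = 0.2 + 0.5X
y_hat = predict_y(0.2, 0.5, 30)
print(round(y_hat, 2))  # 15.2
```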

The example data in Table 1 are plotted in Figure 1. You can see that there is roughly a
positive relationship between X and Y.
To predict Y from X, we need to find the regression line of predicted Y.
Table 1. Example data.

X       Y
1.00    1.00
2.00    2.00
3.00    1.30
4.00    3.75
5.00    2.25

Figure 1. A scatter plot of the example data.

Linear regression consists of finding the best-fitting straight line through the points. The
best-fitting line is called a regression line. The black diagonal line in Figure 2 is the regression
line and consists of the predicted score on Y for each possible value of X. The vertical lines from
the points to the regression line represent the errors of prediction. As you can see, the red point is
very near the regression line; its error of prediction is small. By contrast, the yellow point is
much higher than the regression line and therefore its error of prediction is large.

Figure 2. A scatter plot of the example data.


The black line consists of the predictions, the points are the actual data, and the vertical lines
between the points and the black line represent errors of prediction.
The error of prediction for a point is the value of the point minus the predicted value (the value
on the line). Table 2 shows the predicted values (Y') and the errors of prediction (Y-Y'). For
example, the first point has a Y of 1.00 and a predicted Y (called Y') of 1.21. Therefore, its error
of prediction is -0.21.
The best-fitting line is the line that minimizes the sum of the squared errors of prediction. That is the
criterion that was used to find the line in Figure 2.

Table 2. Example data.

X       Y       Y'       Y-Y'      (Y-Y')^2
1.00    1.00    1.210    -0.210    0.044
2.00    2.00    1.635     0.365    0.133
3.00    1.30    2.060    -0.760    0.578
4.00    3.75    2.485     1.265    1.600
5.00    2.25    2.910    -0.660    0.436

The last column in Table 2 shows the squared errors of prediction. The sum of the squared errors
of prediction shown in Table 2 is lower than it would be for any other regression line.
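
As a rough check of this claim, the sketch below computes the sum of squared errors for the line reported for Figure 2 (intercept 0.785, slope 0.425) and for a few other lines. The alternative lines are arbitrary choices made only for comparison, and the function name sum_squared_errors is an assumption for this example.

```python
# Data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

def sum_squared_errors(a, b, xs, ys):
    """Sum of the squared errors of prediction (Y - Y')^2 for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Line reported for Figure 2
best = sum_squared_errors(0.785, 0.425, xs, ys)
print("regression line:", round(best, 3))

# A few other candidate lines (arbitrary, for comparison only)
others = [(0.785, 0.5), (1.0, 0.425), (0.5, 0.5), (0.0, 0.7)]
for a, b in others:
    print("a =", a, "b =", b, "->", round(sum_squared_errors(a, b, xs, ys), 3))
```

Every alternative line gives a larger sum than the regression line, which is what "best-fitting" means here. Trying a handful of lines does not prove the claim; it only illustrates it.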
The formula for a regression line is
Y' = a + bX
where Y' is the predicted score, b is the slope of the line, and a is the Y intercept. The equation
for the line in Figure 2 is
Y' = 0.785 + 0.425X
For X = 1,
Y' = 0.785 +(0.425)(1) = 1.21.
For X = 2,
Y' = 0.785 + (0.425)(2) = 1.64.
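
The values 0.785 and 0.425 can be recovered from the Table 1 data with the standard least-squares formulas: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept is mean(Y) minus the slope times mean(X). These formulas are not stated in this guide, so the sketch below is an illustration under that assumption, and fit_line is a name chosen for this example.

```python
# Data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # 0.785 0.425
```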

A REAL EXAMPLE
The case study "SAT and College GPA", conducted in the U.S., contains high school and
university grades for 105 computer science majors at a local state school. We now consider how
we could predict a student's university GPA if we knew his or her high school GPA.
Figure 3 shows a scatter plot of University GPA as a function of High School GPA. You can see
from the figure that there is a strong positive relationship. The correlation is 0.78. The regression
equation is
University GPA' = 1.097 + (0.675)(High School GPA)
Therefore, a student with a high school GPA of 3 would be predicted to have a university GPA of
University GPA' = 1.097 + (0.675)(3) = 3.12.

Figure 3. University GPA as a function of High School GPA.

HOW TO RUN A REGRESSION USING EXCEL 2013?

1. INSTALL THE DATA ANALYSIS MENU IN EXCEL.

To run a regression in Excel you need to install the Data Analysis menu in Excel.
Instructions on how to install the Data Analysis menu are given in the Appendix.
If you open an Excel spreadsheet, click the DATA menu, and see the Data Analysis
menu at the top right as in the screenshot below, then you can start running the regression.

To run a regression, click Data Analysis, choose Regression, and click OK as below:

The following screen will appear when you click OK. Now you have to decide
what goes in Input Y Range and what goes in Input X Range. Y should be the dependent
variable and X should be the independent variable. If you want to find out how
Leverage (Debt Ratio) affects Dividend Policy, then Leverage (Debt Ratio) should be
your X variable and Dividend Policy (you can choose whether to use DPS, Dividend
Payout or Dividend Yield) your Y variable. You may try all possible combinations,
because some of your regressions may produce non-significant results, in which case
you cannot conclude anything. We can conclude something only if
the regression results are significant at the 5 percent level.

If you choose DPS as your Y range, click the range-selection button, use the mouse to
highlight the range of DPS data, e.g. from D2 to D31, and click the button again.
Repeat the same procedure for the X range, where you can choose Debt/Equity
Ratio as your X variable. Tick Line Fit Plots, as it will show you the
regression line. Then click OK.

The following results will appear. Now what you need to do is to interpret the
regression results below.

Remember our X variable is Debt/Equity ratio and Y variable is Dividend Per Share.
The orange line in the graph represents the regression line which shows a positive
linear relationship between X and Y.
Important figures for interpretation:
Multiple R = 0.4047 => This R shows the Correlation Coefficient between
variable X and Y.

This means that there is a weak correlation (because the value is less than 0.5)
between X and Y. Excel always reports Multiple R as a non-negative number, so the
direction of the relationship (positive here) is read from the sign of the X
coefficient, not from Multiple R.
R Square = 0.1638 => This is the Coefficient of Determination, which
means that variable X explains only 16.38% of the variation in variable Y.
Here, the Debt/Equity ratio explains only 16.38% of the variation in
Dividend per share, while the remaining 83.62% is due to other factors that
are not included in the model.
The closer R Square is to 1.0, the higher the ability of
the model to predict Y. That is why many research studies include more than
one X factor in the model, to increase its goodness of fit or predictive
ability.

Significance F = 0.02651 => This is the p-value of the F-test. A value below 0.05 means that
the model is valid, because it is significant at the 5 percent level.

Coefficient X variable 1 = 0.002987 => This is the beta coefficient of variable X.


P value = 0.02651 => This is the p-value of the t-test on the beta coefficient of X,
which tests the null hypothesis:
Ho : b = 0, i.e. Ho: There is no significant relationship between
X and Y
If the p-value is lower than 0.05, the result is significant at the 5 percent
level and we reject Ho.
This means X significantly affects Y at the 95 percent confidence level.
If the p-value is lower than 0.01, the result is significant at the 1 percent
level and we reject Ho.
This means X significantly affects Y at the 99 percent confidence level.
But if the p-value is higher than 0.05, the result is not
significant at the 5 percent level, Ho is not rejected, and we are unable to conclude that
there is a significant relationship between X and Y.
Since the coefficient +0.002987 has a positive sign and the p-value
is lower than 0.05, we can conclude that variable X (Debt/Equity ratio)
positively and significantly affects variable Y (Dividend per share). This means an
increase in the Debt/Equity ratio is associated with higher Dividend per share. Thus we
conclude that leverage significantly affects dividend policy, or that there
is a significant positive relationship between X and Y.
However, if you find that the p-value is higher than 0.05, the
result is not significant at the 5 percent level, and you cannot conclude that X
significantly affects Y.
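
The decision rule above can be written as a small helper. This is a sketch for illustration only: the function name interpret_slope and its wording are assumptions, and a p-value exactly equal to the threshold is treated as not significant, a case the guide does not address.

```python
def interpret_slope(coefficient, p_value, alpha=0.05):
    """Apply the decision rule for Ho: b = 0 at significance level alpha."""
    if p_value >= alpha:
        return "not significant: cannot reject Ho"
    direction = "positive" if coefficient > 0 else "negative"
    return f"significant {direction} relationship: reject Ho"

# Values from the Excel output discussed above
print(interpret_slope(0.002987, 0.02651))              # significant at 5 percent
print(interpret_slope(0.002987, 0.02651, alpha=0.01))  # not significant at 1 percent
```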

Intercept = 0.07229 => This is the value of the constant, a, where the regression
line intercepts the Y axis.

Therefore the relationship between the Debt/Equity ratio and DPS can be written
as :
(remember Y = a + bX )

DPS = 0.07229 + 0.002987 x (Debt/Equity ratio)


So, we can use the above equation to predict the value of DPS if we know the Debt/Equity ratio.

However, if your F-test p-value (Significance F) is more than 0.05, the model is not valid and
you cannot use the equation Y = a + bX, because it does not have the ability
to predict the value of Y. If that happens, you can try changing the variable that you
use for Y or X. For example, try running the regression using Dividend Payout or Dividend Yield
in place of DPS. You can also try using Market Value as the Y variable to find out how leverage
affects Firm Value.
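
In a regression with only one X variable, the F-test for the model and the t-test on the slope are equivalent (F equals t squared), which is why Significance F and the P value of X Variable 1 are both 0.02651 in the output above. The sketch below illustrates this on the small Table 1 data set, since the DPS data are not reproduced in this guide. It computes the two test statistics only, not their p-values, and all variable names are chosen for this example.

```python
import math

# Data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b = sxy / sxx                 # slope
a = mean_y - b * mean_x       # intercept

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = sum((y - mean_y) ** 2 for y in ys)                   # total variation
ssr = sst - sse                                            # explained variation

mse = sse / (n - 2)           # residual mean square, n - 2 degrees of freedom
se_b = math.sqrt(mse / sxx)   # standard error of the slope
t_stat = b / se_b             # t statistic for Ho: b = 0
f_stat = (ssr / 1) / mse      # F statistic with 1 regression degree of freedom

print(round(t_stat, 3), round(f_stat, 3), round(t_stat ** 2, 3))
```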

Further explanation of what is R and what is R Square is given below:

Correlation Coefficient, r :
The quantity r, called the linear correlation coefficient, measures the strength and
the direction of a linear relationship between two variables. The linear correlation
coefficient is sometimes referred to as the Pearson product moment correlation
coefficient in honor of its developer Karl Pearson.
The mathematical formula for computing r is:

r = [n Σxy - (Σx)(Σy)] / sqrt{ [n Σx^2 - (Σx)^2] [n Σy^2 - (Σy)^2] }

where n is the number of pairs of data.

The value of r is such that -1 ≤ r ≤ +1. The + and - signs are used for positive
linear correlations and negative linear correlations, respectively.
Positive correlation: If x and y have a strong positive linear correlation, r is
close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values
indicate a relationship between x and y variables such that as values
for x increases, values for y also increase.
Negative correlation: If x and y have a strong negative linear correlation, r is
close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values
indicate a relationship between x and y such that as values for x increase, values
for y decrease.
No correlation: If there is no linear correlation or a weak linear correlation, r is
close to 0. A value near zero means that there is no clear linear relationship
between the two variables; any relationship may be random or nonlinear.
A perfect correlation of ±1 occurs only when the data points all lie exactly on a
straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this
line is negative.
A correlation greater than 0.8 is generally described as strong, whereas a
correlation less than 0.5 is generally described as weak. These values can vary based
upon the "type" of data being examined. A study utilizing scientific data may require
a stronger correlation than a study using social science data.
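
The linear correlation coefficient can be computed directly from sums over the n pairs of data, using the standard Pearson product moment formula. The sketch below applies it to the Table 1 data; the function name pearson_r is an assumption for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from sums over n data pairs."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.627
```

By the rule of thumb above, a value of about 0.63 falls between the "weak" and "strong" thresholds.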

Coefficient of Determination, r^2 or R^2 :
The coefficient of determination, r^2, is useful because it gives the proportion of
the variance (fluctuation) of one variable that is predictable from the other
variable.
It is a measure that allows us to determine how certain one can be in making
predictions from a certain model/graph.
The coefficient of determination is the ratio of the explained variation to the total
variation.
The coefficient of determination is such that 0 ≤ r^2 ≤ 1, and denotes the strength
of the linear association between x and y.
Expressed as a percentage, the coefficient of determination is the share of the total
variation in y that is accounted for by the line of best fit. For example, if r = 0.922, then r^2 = 0.850, which
means that 85% of the total variation in y can be explained by the linear relationship
between x and y (as described by the regression equation). The other 15% of the
total variation in y remains unexplained.
The coefficient of determination is a measure of how well the regression line
represents the data. If the regression line passes exactly through every point on
the scatter plot, it would be able to explain all of the variation. The further the line is
away from the points, the less it is able to explain.
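
The ratio of explained variation to total variation can be worked out by hand for the Table 1 data. The sketch below uses the regression line reported for Figure 2 (intercept 0.785, slope 0.425); the variable names are chosen for this example only.

```python
# Data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

# Regression line reported for Figure 2: Y' = 0.785 + 0.425X
a, b = 0.785, 0.425

mean_y = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]

# Explained variation: how far the predictions move away from the mean of Y
explained = sum((p - mean_y) ** 2 for p in predicted)
# Total variation: how far the actual Y values are from the mean of Y
total = sum((y - mean_y) ** 2 for y in ys)

r_squared = explained / total
print(round(r_squared, 3))  # 0.393
```

For this small data set, about 39 percent of the variation in Y is explained by X, and the remaining 61 percent or so is unexplained.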

APPENDIX : HOW TO INSTALL THE DATA ANALYSIS MENU IN EXCEL 2010

Here we see no Data Analysis menu at the top right of the screen.

Step 1: Click the File menu at the top left of the screen, then click Options

Step 2: Click the Add-Ins menu

Step 3: Select Analysis ToolPak and click Go

Step 4: Tick the box next to Analysis ToolPak and click OK

Now the Data Analysis menu will appear at the top right of the screen. You can now
begin your regression analysis.

For Excel 2013 or older versions of Excel, e.g. Excel 2007 or below,
you can search YouTube, where you can find many videos showing how
to install Data Analysis step by step.
