
Linear regression

Linear relations

• Yt = b0 + b1 Xt
• where Yt is the dependent variable being forecast
• Xt is the independent variable used to explain Yt. In
linear trend lines, Xt is taken to be the time index t.
• b1 is the slope of the line, determined by Excel
• b0 is the y intercept of the line, determined by Excel
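
As a minimal sketch of the trend-line case, done in R rather than Excel, the time index can be used directly as the independent variable. The series `y` below is hypothetical and is constructed to lie exactly on a line so that the recovered coefficients are easy to check; the names `period`, `trend` and `forecast_9` are illustrative only.

```r
# Hypothetical series: built as 50 + 3 * period, so it lies exactly on a line.
period <- 1:8
y <- 50 + 3 * period

# Linear trend line: regress y on the time index.
trend <- lm(y ~ period)

b0 <- unname(coef(trend)["(Intercept)"])  # y intercept of the line
b1 <- unname(coef(trend)["period"])       # slope per time period

# Forecast for the next period (period 9) from the fitted line.
forecast_9 <- b0 + b1 * 9
```

With real data the points will not sit exactly on the line, and b0 and b1 are then the least-squares estimates.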
Coefficient of Determination: R-square

• Proportion of the variation in Y around its mean that is
accounted for by the regression model
• 0 <= R2 <= 1
• Never decreases as more independent variables are
added to the regression model. Use adjusted R2 to
compare models when more than one independent
variable is used
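
The definitions above can be sketched in R using the seven-store sample from the example in this deck. This is an illustration under the usual textbook formulas (R2 = 1 - SSE/SST, and the standard adjusted R2 correction), not necessarily how Excel computes them; the variable names are assumptions.

```r
# Seven-store sample from the example in this deck
# (size in square feet, annual sales in $1000).
area  <- c(1726, 1542, 2816, 5555, 1292, 2208, 1313)
sales <- c(3681, 3395, 6653, 9543, 3318, 5563, 3760)

fit <- lm(sales ~ area)

# R-square = 1 - SSE / SST
sse <- sum(residuals(fit)^2)         # variation left unexplained by the line
sst <- sum((sales - mean(sales))^2)  # total variation of sales around its mean
r2  <- 1 - sse / sst

# Adjusted R-square penalises additional predictors (k = number of predictors).
n <- length(sales)
k <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Both values should agree with what summary() reports.
c(r2 = r2, adj_r2 = adj_r2)
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)
```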
Standard Error of the line: Se

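• The standard deviation of the estimation errors
• Measures the amount of scatter around the
regression line
• Can be used as a rough rule of thumb for the
accuracy of predictions: if the errors are roughly normal,
about 95% of actual values fall within +/- 2 Se of the prediction.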
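
As a sketch, Se for a simple regression can be computed in R as the square root of SSE / (n - 2), again using the seven-store sample from the example in this deck. The +/- 2 Se band at the end is only the rough rule of thumb, not an exact prediction interval, and the 4,000 square foot store is a hypothetical input.

```r
# Seven-store sample from the example in this deck
# (size in square feet, annual sales in $1000).
area  <- c(1726, 1542, 2816, 5555, 1292, 2208, 1313)
sales <- c(3681, 3395, 6653, 9543, 3318, 5563, 3760)

fit <- lm(sales ~ area)

# Standard error of the line: sqrt(SSE / (n - 2)) for one predictor.
n  <- length(sales)
se <- sqrt(sum(residuals(fit)^2) / (n - 2))

# R reports the same quantity as the "residual standard error".
se_from_summary <- summary(fit)$sigma

# Rough rule of thumb: prediction +/- 2 * Se for a hypothetical 4,000 sq ft store.
pred <- unname(predict(fit, newdata = data.frame(area = 4000)))
rough_band <- c(lower = pred - 2 * se, upper = pred + 2 * se)
```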
Simple Linear Regression: Example

You want to examine the linear dependency of the annual sales of
produce stores on their size in square footage. Sample data for seven
stores were obtained. Find the equation of the straight line that
fits the data best.

Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
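
One way to find the best-fitting line for this sample is ordinary least squares. The sketch below computes the slope and intercept from the textbook formulas and cross-checks them against `lm()`; the variable names are illustrative.

```r
# Seven-store sample from the table above
# (size in square feet, annual sales in $1000).
area  <- c(1726, 1542, 2816, 5555, 1292, 2208, 1313)
sales <- c(3681, 3395, 6653, 9543, 3318, 5563, 3760)

# Least-squares slope: Sxy / Sxx
b1 <- sum((area - mean(area)) * (sales - mean(sales))) /
      sum((area - mean(area))^2)

# Least-squares intercept: the line passes through (mean(area), mean(sales)).
b0 <- mean(sales) - b1 * mean(area)

# Cross-check against R's built-in fit.
fit <- lm(sales ~ area)
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))
```

The hand-computed values should agree with `coef(fit)` up to floating-point error.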
Scatter Diagram: Example
[Figure: scatter plot of Annual Sales ($000), vertical axis, against Square Feet, horizontal axis]
Equation for the Sample Regression Line: Example

Ŷi = b0 + b1 Xi
   = 1636.415 + 1.487 Xi
Interpretation of Results: Example

Sales = 1636.415 + 1.487 * Area

The slope of 1.487 means that for each increase of
one unit in Area, we predict the average of Sales to
increase by an estimated 1.487 units.

Because Sales is measured in $1000 and Area in square feet,
the model estimates that for each increase of one
square foot in the size of the store, the expected
annual sales are predicted to increase by $1,487.
Regression
• Consider the following dataset.
salesData <- read.csv("G:\\Training\\NCT\\Sales.csv", header = TRUE)
# Area on the x-axis, Sales on the y-axis (column names as used in lm() below)
plot(salesData$Area, salesData$Sales)
Regression
• Fit a regression model to the data.
model1 <- lm(Sales ~ Area, data = salesData)
model1
• Output:
Regression
• Maybe there is more…
model1 = lm(Sales ~ Area, data = salesData)
summary(model1)
• Output:
Regression
• And more…
par(mfrow=c(2,2))
plot(model1)
Regression
• What all is there?
names(model1)
• Output
Regression
• What all is there?
names(summary(model1))
• Output

• And even more…


Regression
• Fit confidence intervals to the line.
predict(model1, interval = "confidence")
• Prediction intervals for a new data set can also be
generated.
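
A minimal sketch of both kinds of interval for new data, using the seven-store sample from the example in this deck in place of the CSV file. The data frame `stores`, the object `new_stores` and the two store sizes are hypothetical; `predict()` with `newdata` and `interval` is the standard interface for `lm` fits.

```r
# Seven-store sample from the example in this deck
# (Area in square feet, Sales in $1000).
stores <- data.frame(
  Area  = c(1726, 1542, 2816, 5555, 1292, 2208, 1313),
  Sales = c(3681, 3395, 6653, 9543, 3318, 5563, 3760)
)
model <- lm(Sales ~ Area, data = stores)

# Hypothetical new stores to predict for.
new_stores <- data.frame(Area = c(2000, 4000))

# Confidence interval: uncertainty about the mean Sales at each Area.
conf_int <- predict(model, newdata = new_stores,
                    interval = "confidence", level = 0.95)

# Prediction interval: uncertainty about an individual store's Sales,
# so it is wider than the confidence interval at the same Area.
pred_int <- predict(model, newdata = new_stores,
                    interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both calls return a matrix with columns `fit`, `lwr` and `upr`, one row per new store.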