
SIMPLE LINEAR REGRESSION
Introduction
Regression Analysis

● Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable or other variables.
● It examines the relationship between two or more variables.
● Simple regression analysis examines a straight-line relationship between two variables.

Example of Simple Regression Analysis

Relevance to Statistics

● Simple regression analysis can be used to infer relationships between the dependent and independent variables.
● Helps predict the influence of the independent variable on a dependent variable.
● Helps determine which independent variable has an influence on a dependent variable.

The SLRM or First-Order (Straight-Line) Model

This simple model approximates the relationship between the dependent variable y and the independent variable x:

$y = \beta_0 + \beta_1 x + \varepsilon$
where:
β0 = the y-intercept of the regression line (the expected y-value when x = 0); it has practical meaning only if x = 0 is within the range of x-values in the sample data
β1 = the slope of the regression line (the expected change in y per 1-unit increase in x)
ε = the random error component; it accounts for the variability in Y that cannot be explained by the linear relationship between X and Y

Justifications for the Error Term in the Model: ε (epsilon)

1. Accounts for other factors, not included in the model, that affect Y aside from X.
2. Accounts for measurement error, even in the factors included in the model.
3. The dependent variable Y is a random variable because of the error term in the model.

The error term ε is the random component of the model, while β0 + β1x is the deterministic component. Hence, the error term is the reason the model is called a probabilistic model.
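
To make the split between the deterministic and random components concrete, here is a minimal Python sketch that simulates observations from the model. The parameter values, the normally distributed error with mean 0, and the function name `simulate_slrm` are illustrative assumptions; the slides do not specify any of them.

```python
import random

def simulate_slrm(xs, beta0, beta1, sigma, seed=0):
    """Simulate y = beta0 + beta1*x + eps for each x in xs.

    Assumption: eps is drawn from a normal distribution with mean 0 and
    standard deviation sigma (the slides do not name a distribution).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    rows = []
    for x in xs:
        deterministic = beta0 + beta1 * x   # fixed part: beta0 + beta1*x
        eps = rng.gauss(0.0, sigma)         # random error component
        rows.append((x, deterministic, eps, deterministic + eps))
    return rows

# Hypothetical parameter values, chosen only for illustration.
rows = simulate_slrm([5, 10, 15, 20], beta0=100.0, beta1=8.0, sigma=25.0)
for x, mean_y, eps, y in rows:
    print(f"x={x:>2}  deterministic={mean_y:6.1f}  error={eps:7.2f}  y={y:7.2f}")
```

With `sigma=0.0` every simulated y falls exactly on the line, which is the purely deterministic case; any positive `sigma` scatters the observations around it.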
Fitting the SLRM using the Method of Least Squares

Method of Least Squares

● A form of mathematical regression analysis that finds the line of best fit for a set of data.
● Aims to find the straight line that minimizes the sum of the squared errors, i.e., the squared vertical distances between the observed y-values and the line.

Minimizing the sum of squared errors (SSE) yields the least-squares estimates b0 and b1 of the parameters β0 and β1, respectively. The least-squares line, or fitted linear regression equation, is denoted by:

$\hat{y} = b_0 + b_1 x$
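
For readers who want to see where the estimates come from, the following is a brief sketch of the standard derivation. It is not taken from the slides; $\bar{x}$ and $\bar{y}$ denote the sample means, and the sums run over the n observations.

```latex
\begin{aligned}
\mathrm{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_0}
  &= -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_1}
  &= -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0
  \;\Longrightarrow\;
  b_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}
      = \frac{S_{xy}}{S_{xx}}
\end{aligned}
```

Substituting the first result into the second condition is what produces the ratio of the two corrected sums in the last line.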

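$S_{xx} = \sum x_i^2 - n\bar{x}^2$
$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$
$S_{yy} = \sum y_i^2 - n\bar{y}^2$
$\mathrm{SSE} = S_{yy} - b_1 S_{xy}$
$\mathrm{MSE} = \mathrm{SSE}/(n-2)$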
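
These quantities can be computed in a few lines. The sketch below is one possible implementation, not the slides' own; the function name `least_squares` and the five data points are hypothetical, and the intercept uses the standard relation b0 = ȳ - b1·x̄, which the slides do not state explicitly.

```python
def least_squares(x, y):
    """Fit y-hat = b0 + b1*x using the corrected sums Sxx, Sxy, Syy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    syy = sum(yi * yi for yi in y) - n * y_bar ** 2
    b1 = sxy / sxx               # slope estimate
    b0 = y_bar - b1 * x_bar      # intercept estimate (standard result)
    sse = syy - b1 * sxy         # sum of squared errors
    mse = sse / (n - 2)          # mean squared error, n - 2 degrees of freedom
    return {"sxx": sxx, "sxy": sxy, "syy": syy,
            "b0": b0, "b1": b1, "sse": sse, "mse": mse}

# Hypothetical data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = least_squares(x, y)
print(f"y-hat = {fit['b0']:.2f} + {fit['b1']:.2f}x, "
      f"SSE = {fit['sse']:.2f}, MSE = {fit['mse']:.2f}")
```

Note the parentheses in `sse / (n - 2)`: the divisor is the whole quantity n - 2.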

Significance Tests Concerning SLRM

A significance test is performed to decide whether to reject or fail to reject the null hypothesis, depending on whether the test statistic falls in the rejection region.

Significance Test

Stating the null and alternative hypotheses:

To test
H0: β0 = 0 (e.g., the regression line passes through the origin)
Ha: β0 ≠ 0 (e.g., the regression line does not pass through the origin)

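For the slope, to test H0: β1 = 0 versus Ha: β1 ≠ 0:
Test statistic: $t = \dfrac{b_1}{\sqrt{\mathrm{MSE}/S_{xx}}}$
Rejection region: $|t| > t_{\alpha/2,\,n-2}$
where
$b_1 = S_{xy}/S_{xx}$
$\mathrm{MSE} = \mathrm{SSE}/(n-2)$
$S_{xx} = \sum x_i^2 - n\bar{x}^2$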
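
A small Python sketch of the slope test follows. The function name `slope_t_test` and the input values are hypothetical, and the critical value is passed in by the caller from a t-table instead of being computed, to keep the example free of external libraries.

```python
import math

def slope_t_test(b1, mse, sxx, t_crit):
    """Return (t, reject) for H0: beta1 = 0 against a two-sided alternative.

    t_crit is the tabulated t value for alpha/2 with n - 2 degrees of freedom.
    """
    se_b1 = math.sqrt(mse / sxx)   # estimated standard error of b1
    t = b1 / se_b1
    return t, abs(t) > t_crit

# Hypothetical values for n = 5 observations (n - 2 = 3 degrees of freedom).
# 3.182 is the tabulated t value for alpha/2 = 0.025 with 3 degrees of freedom.
t, reject = slope_t_test(b1=0.6, mse=0.8, sxx=10.0, t_crit=3.182)
print(f"t = {t:.3f}, reject H0: {reject}")
```

Here |t| is about 2.12, which does not exceed 3.182, so H0 is not rejected for these illustrative numbers.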

Pearson’s r
IN LINEAR CORRELATION

A statistical method for determining the nature and strength of the linear relationship between two numerical (i.e., interval and ratio) variables X and Y using a single numerical value known as Pearson’s product-moment correlation coefficient (or Pearson’s r).

OTHER VARIATIONS

What do we get from knowing r?

A) R’S SIGN
◉ IF IT IS POSITIVE, IT MEANS THERE IS A DIRECT LINEAR
RELATIONSHIP BETWEEN X AND Y
◉ IF IT IS NEGATIVE, IT MEANS THERE IS AN INVERSE LINEAR
RELATIONSHIP BETWEEN X AND Y

B) MAGNITUDE OF R

LIMITATIONS OF PEARSON’S R

1. It measures how closely two quantitative variables approximate a straight line; it does not validly measure the strength of a nonlinear relationship.
2. It should not be taken to imply a cause-and-effect relationship, even when there is a strong (or statistically significant) correlation between two variables. A causal relationship cannot be established from a single correlation coefficient; it requires an overwhelming body of evidence, consisting of consistent results from a large number of population and laboratory studies.

3. It may not be reliable when the number of paired observations (n) is small.
4. It is sensitive to outliers (i.e., observations that clearly appear to be out of range of the other observations), which can lead to misleading results.
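
Limitations 1 and 4 are easy to see numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`: a perfect but curved relationship gives r = 0, and a single outlier pulls a perfect linear correlation down sharply.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from the corrected sums Sxy, Sxx, Syy."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(a * a for a in x) - n * x_bar ** 2
    syy = sum(b * b for b in y) - n * y_bar ** 2
    return sxy / math.sqrt(sxx * syy)

# Limitation 1: y = x^2 is an exact relationship, but it is not linear.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v * v for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Limitation 4: five points on a line, then the same points plus one outlier.
x_line = [1, 2, 3, 4, 5]
y_line = [1, 2, 3, 4, 5]
r_line = pearson_r(x_line, y_line)
r_outlier = pearson_r(x_line + [6], y_line + [0])

print(f"curved data:  r = {r_curve:.3f}")
print(f"linear data:  r = {r_line:.3f}")
print(f"with outlier: r = {r_outlier:.3f}")
```

A value of r near zero therefore does not rule out a strong nonlinear relationship, and a scatter plot is worth checking before relying on r alone.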

EXAMPLE
SUMMARY STATISTICS:
Sxy = 79.6
Sxx = 65.6
Syy = 113.6

To solve for Pearson’s r:
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{79.6}{\sqrt{(65.6)(113.6)}} \approx 0.922$

DOES THIS SUGGEST A DIRECT LINEAR RELATIONSHIP?
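
The arithmetic for this example can be checked with a short Python sketch. It uses the standard formula r = Sxy / sqrt(Sxx · Syy) with the summary statistics given above; the function name `pearson_r_from_sums` is hypothetical.

```python
import math

def pearson_r_from_sums(sxy, sxx, syy):
    """Pearson's r from the corrected sums of products and squares."""
    return sxy / math.sqrt(sxx * syy)

# Summary statistics from the example.
r = pearson_r_from_sums(sxy=79.6, sxx=65.6, syy=113.6)
print(f"r = {r:.3f}")
print(f"r^2 = {r * r:.3f}")
print("direct (positive) linear relationship" if r > 0
      else "inverse or no linear relationship")
```

Because Sxx and Syy are always positive, the sign of r is the sign of Sxy, so the positive Sxy here already indicates a direct relationship.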

TESTING ITS SIGNIFICANCE
Statements:
H0: ρ = 0 (i.e., there is no significant linear relationship between X and Y) versus
Ha: ρ ≠ 0 (i.e., there is a significant linear relationship between X and Y)

Therefore, we reject H0 and conclude that there is a significant linear relationship between X and Y.

Simple Case
FORMULA

SIMPLE CASE

The marketing manager of a large supermarket chain would like to determine the effects of shelf space on the sales of pet food. A random sample of n = 12 equal-sized stores is selected, with the following results:

Store   Shelf Space, X (feet)   Weekly Sales, Y (dollars)
1       5                       160
2       5                       220
3       5                       140
4       10                      190
5       10                      240
6       10                      260
7       15                      230
8       15                      270
9       15                      280
10      20                      260
11      20                      290
12      20                      310

Required

◉ Set up the SLRM for this data set and indicate the scope of the regression.
◉ Estimate the expected weekly sales of all stores with 12 feet of shelf space.
◉ Interpret the scope of the regression equation.
◉ Test the significance of shelf space as a predictor of mean weekly sales.
◉ Give and interpret the following: Pearson’s r; coefficient of determination r².
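
One way to work through these requirements is sketched below in Python, using the twelve stores from the case. The significance level α = 0.05 is an assumption, since the case does not state one; 2.228 is the tabulated t value for α/2 = 0.025 with 10 degrees of freedom.

```python
import math

# Data from the case: shelf space in feet (x), weekly sales in dollars (y).
x = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
y = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum(a * a for a in x) - n * x_bar ** 2
sxy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
syy = sum(b * b for b in y) - n * y_bar ** 2

# Fitted line and the estimate at 12 feet of shelf space.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat_12 = b0 + b1 * 12

# t-test for the slope (H0: beta1 = 0), assuming alpha = 0.05.
sse = syy - b1 * sxy
mse = sse / (n - 2)
t = b1 / math.sqrt(mse / sxx)
T_CRIT = 2.228  # t-table value for alpha/2 = 0.025, n - 2 = 10 df
reject_h0 = abs(t) > T_CRIT

# Pearson's r and the coefficient of determination.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"y-hat = {b0:.1f} + {b1:.1f}x")
print(f"estimated mean weekly sales at 12 ft: {y_hat_12:.1f}")
print(f"t = {t:.3f}, reject H0: {reject_h0}")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Under these assumptions the fitted line is ŷ = 145 + 7.4x, so each additional foot of shelf space is associated with an estimated increase of about 7.40 dollars in mean weekly sales, and stores with 12 feet are estimated to average about 233.8 dollars. The slope is significant (t is roughly 4.65, above 2.228), r is about 0.827, and r² is about 0.684, meaning roughly 68.4% of the variation in weekly sales is explained by shelf space. The x-values in the sample run from 5 to 20 feet, so predictions outside that range are extrapolations.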

Activity
QUESTION 1

What is the meaning of SLRM?


a) Sorry Late Reply Ma
b) Simple Line/Linear Regression Model
c) Straight Line Regression Model
d) So Long Remember Me

QUESTION 2

What does “ε” stand for?
a) Ewan
b) Error
c) Summation
d) Y-intercept
QUESTION 3

The error term (ε) is the reason for the model being called
the ___________model

a. Probabilistic
b. America’s Next Top
c. Deterministic
d. Victoria’s Secret

QUESTION 4

WHICH OF THE FOLLOWING CORRELATION COEFFICIENTS SUGGEST A DIRECT LINEAR RELATIONSHIP?

a. r= -0.62
b. r= 0.005
c. r= 0.81
d. r= -1

QUESTION 5

What does H0 mean?

A. Null hypothesis
B. Alternative hypothesis
C. Status quo
D. Equilibrium constant

QUESTION 6

___________ is a form of mathematical regression analysis that finds the line of best fit for a set of data.

A. Method of Least Squares
B. Method of La Salle
C. Method of Least Standard
D. Method of Lost Souls

QUESTION 7

___________ is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable or other variables.

A. Regression Analysis
B. Regressive Analysis
C. Registration Analysis
D. Regular Analysis

FOR QUESTIONS 8 - 10

A store manager wishes to find out whether there is a relationship between the age (X) of her employees and the number of sick days (Y) they take each year. The data for a sample of n = 6 employees are shown below:

Employee    1    2    3    4    5    6
Age (X)     18   26   39   48   53   58
Days (Y)    16   12   9    5    6    2

8. Set up the equation of the fitted regression line for this data set.
9. Give Pearson’s correlation coefficient, r.
10. Give the sample coefficient of determination, r².

BONUS QUESTION

Who sang the hit song “Beep Beep Beep (Ang Sabi ng Jeep)”?

ANSWERS

1. B
2. B
3. A
4. C
5. A
6. A
7. A
8. ŷ = 21.100 - 0.317x
9. -0.979
10. 95.9%
Bonus: Willie Revillame/Kuya Wil
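
The answers to questions 8 to 10 can be verified with a short Python sketch using the age and sick-day data from the activity. The variable names are illustrative only.

```python
import math

# Data from the activity: employee age (x) and sick days per year (y).
age = [18, 26, 39, 48, 53, 58]
days = [16, 12, 9, 5, 6, 2]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(days) / n
sxx = sum(a * a for a in age) - n * x_bar ** 2
sxy = sum(a * d for a, d in zip(age, days)) - n * x_bar * y_bar
syy = sum(d * d for d in days) - n * y_bar ** 2

b1 = sxy / sxx                    # slope of the fitted line
b0 = y_bar - b1 * x_bar           # intercept of the fitted line
r = sxy / math.sqrt(sxx * syy)    # Pearson's r
r_squared_pct = 100 * r ** 2      # coefficient of determination, in percent

print(f"8.  y-hat = {b0:.3f} + ({b1:.3f})x")
print(f"9.  r = {r:.3f}")
print(f"10. r^2 = {r_squared_pct:.1f}%")
```

The negative slope and negative r agree: in this sample, older employees tend to take fewer sick days.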