
Relationship Between Variables

Model building with one regressor
Example: Consider the relationship between PRICE (Y) and
the living area in square feet, SQFT (X), of a house.
• There is probably a relationship: as SQFT increases,
PRICE should increase.
• But how would we measure and quantify this
relationship?

Population linear regression model between PRICE and SQFT

$\text{PRICE} = \beta_0 + \beta_1\,\text{SQFT} + \varepsilon$
• In formulating the above relation between PRICE and
SQFT, we ignore the fact that the price of a house
depends on other characteristics as well, such as lot
size and number of bathrooms. We are in effect
assuming that all these effects are absorbed by the
error term.
The error term is a combination of four different effects:
1. It accounts for the effect of variables omitted from the
model.
2. It captures the effects of nonlinearities in the relationship
between Y and X.
3. It absorbs measurement errors in X and Y.
4. It includes inherently unpredictable random effects.
Consider the data on sale price (thousands of dollars) and living area
(square feet) of 14 houses in a particular location (Price-Area.mtw).
Estimate the relationship between PRICE and SQFT.

PRICE (Y)   SQFT (X)
199.9       1065
228         1254
235         1300
285         1577
239         1600
293         1750
285         1800
365         1870
295         1935
290         1948
385         2254
505         2600
425         2800
415         3000

• Identify the type of relationship between the variables using
a graphical tool (scatter plot)
• Estimate the relation by the method of least squares and
interpret it
• Test the significance of the regression by
  – t-test
  – F-test (ANOVA)
• Measure the goodness of fit of the model (coefficient of
determination)
• Predict the price of a house with an area of 1500 square feet

Graphical view of relationship

Significance of Regression
T-test

Term       Coef     SE Coef   T-Value   P-Value
Constant   52.4     37.3      1.40      0.186
SQFT (X)   0.1388   0.0187    7.41      0.000

Significance of Regression
Analysis of Variance (F-test)

Source       DF   SS       MS        F       P
Regression   1    83541    83541.4   54.86   0.000
Error        12   18274    1522.8
Total        13   101815
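
The least-squares estimates, their standard errors, the t-ratios and the ANOVA F statistic can be recomputed from the 14 data points with the textbook formulas. The following is a minimal sketch in plain Python, not the Minitab procedure that produced the tables; all variable names are illustrative.

```python
import math

# PRICE (thousands of dollars) and SQFT for the 14 houses in the data table
price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Corrected sums of squares and cross-products
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

# Least-squares slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# ANOVA decomposition: total = regression + error
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
sst = sum((y - y_bar) ** 2 for y in price)
ssr = sst - sse
mse = sse / (n - 2)          # error mean square on n - 2 = 12 df

# Standard errors and t-ratios of the two coefficients
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# Overall F statistic; with one regressor it equals t_b1 squared
f_stat = ssr / mse

print(f"b0 = {b0:.4f} (SE {se_b0:.4f}, t = {t_b0:.2f})")
print(f"b1 = {b1:.5f} (SE {se_b1:.5f}, t = {t_b1:.2f})")
print(f"F = {f_stat:.2f}")
```

With a single regressor the F-test and the t-test on the slope are equivalent, which is why the two P-values in the tables agree.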

Coefficient of Determination = 82%
About 82% of the variation in house price is explained by
living area in square feet; the remaining 18% is due to
other factors not included in the model.

Price of a house with 1500 square feet area

260.476 thousand dollars, i.e. $260,476
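
The coefficient of determination and the prediction at 1500 square feet follow directly from the fitted line. This is a small sketch in plain Python under the same data; the names `r2` and `pred_1500` are illustrative.

```python
# PRICE (thousands of dollars) and SQFT for the 14 houses in the data table
price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least-squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# R^2 = 1 - SSE / SST: share of the variation in PRICE explained by SQFT
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
sst = sum((y - y_bar) ** 2 for y in price)
r2 = 1 - sse / sst

# Point prediction (thousands of dollars) for a 1500 square foot house
pred_1500 = b0 + b1 * 1500

print(f"R^2 = {r2:.4f}")
print(f"Predicted price at 1500 sq ft = {pred_1500:.3f} thousand dollars")
```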

Presenting Regression Results
PRICE = 52.35091 + 0.138750 SQFT
SE       (37.2855) (0.0187)
t-ratio  (1.4041)  (7.4067)
P-value  (0.1857)  (<0.0001)

$S_e = 39.02$, $R^2 = 0.8205$

Multiple Linear Regression with two regressors
(Sales price-size-Rating.mtw)
A real estate agency collects the following data:
Y = sales price of a house (in thousands of dollars)
X1 = home size (in hundreds of square feet)
X2 = rating (an overall "niceness rating" for the house,
expressed on a scale from 0 (worst) to 10 (best), provided
by the real estate agency)

Sales price Y   Home size X1   Rating X2
120.0           23             5
65.4            11             2
115.4           20             9
91.0            17             3
94.0            15             8
110.6           21             4
129.0           24             7
85.2            13             6
109.0           19             7
115.0           25             2
• Identify the type of relationship between the variables using
a graphical tool (scatter plot)
• Estimate the relation by the method of least squares and
interpret it
• Test the overall significance of the regression by F-test (ANOVA)
• Test the individual significance of each regressor by t-test
• Measure the goodness of fit of the model (coefficient of
determination)
• Predict the price of a home with an area of 1600 square feet and
a rating of 6

Scatter plots

Price = 19.56 + 3.742 Size + 2.556 Rating

• The value b1 = 3.74 indicates that the mean sale price is
expected to increase by $3,740 (3.74 thousand) for
each 100 square feet increase in house size, holding
rating constant.

• The value b2 = 2.56 indicates that the mean sale price is
expected to increase by $2,560 (2.56 thousand) for
each 1-point increase in rating, holding house size
constant.
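
The fitted equation can be reproduced by ordinary least squares on a design matrix with an intercept column. The sketch below assumes NumPy is available and uses `numpy.linalg.lstsq`; it is an illustration, not the Minitab run behind the slide output, and the variable names are made up for the example.

```python
import numpy as np

# Sales price (thousands of dollars), home size (hundreds of sq ft), rating
price = np.array([120.0, 65.4, 115.4, 91.0, 94.0, 110.6, 129.0, 85.2, 109.0, 115.0])
size = np.array([23, 11, 20, 17, 15, 21, 24, 13, 19, 25], dtype=float)
rating = np.array([5, 2, 9, 3, 8, 4, 7, 6, 7, 2], dtype=float)

# Design matrix: intercept column, then the two regressors
X = np.column_stack([np.ones_like(size), size, rating])

# Least-squares solution of X b = price
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = coef

print(f"Price = {b0:.2f} + {b1:.3f} Size + {b2:.3f} Rating")
# Each slope is a partial effect: the change in mean price per unit of one
# regressor while the other regressor is held fixed.
print(f"+100 sq ft at fixed rating: about {1000 * b1:.0f} dollars")
print(f"+1 rating point at fixed size: about {1000 * b2:.0f} dollars")
```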

Analysis of Variance (Overall Significance)

Source       DF   Adj SS    Adj MS    F-Value   P-Value
Regression   2    3277.31   1638.66   350.87    0.000
Error        7    32.69     4.67
Total        9    3310.00

Individual Significance

Term       Coef    SE Coef   T-Value   P-Value
Constant   19.56   3.26      6.00      0.001
Size       3.742   0.152     24.56     0.000
Rating     2.556   0.289     8.85      0.000

Coefficient of Determination = 99%
About 99% of the variation in house price is explained by
home size and rating; only about 1% is due to other
factors not included in the model.

Price of a house with 1600 square feet area and a rating of 6

94.7722 thousand dollars, i.e. about $94,772
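
As a check on the two figures above, the following sketch refits the model with NumPy (an assumed dependency) and evaluates it at a home size of 16 (that is, 1600 square feet, since size is measured in hundreds of square feet) and a rating of 6. Variable names are illustrative.

```python
import numpy as np

# Sales price (thousands of dollars), home size (hundreds of sq ft), rating
price = np.array([120.0, 65.4, 115.4, 91.0, 94.0, 110.6, 129.0, 85.2, 109.0, 115.0])
size = np.array([23, 11, 20, 17, 15, 21, 24, 13, 19, 25], dtype=float)
rating = np.array([5, 2, 9, 3, 8, 4, 7, 6, 7, 2], dtype=float)

X = np.column_stack([np.ones_like(size), size, rating])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Coefficient of determination: 1 - SSE / SST
resid = price - X @ coef
sse = float(resid @ resid)
sst = float(np.sum((price - price.mean()) ** 2))
r2 = 1 - sse / sst

# Prediction for 1600 sq ft (size = 16) and rating = 6
x_new = np.array([1.0, 16.0, 6.0])
pred = float(x_new @ coef)

print(f"R^2 = {r2:.4f}")
print(f"Predicted price = {pred:.4f} thousand dollars")
```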

Curvilinear Regression (Quadratic Regression)
A product development engineer is interested in investigating the tensile
strength of a new synthetic fiber that will be used to make cloth for men's
shirts. The engineer knows from previous experience that the strength is
affected by the weight percent of cotton used in the blend of materials for
the fiber. Furthermore, he suspects that increasing the cotton content will
increase the strength, at least initially. He also knows that cotton content
should range between about 15 and 35 percent if the final product is to
have other quality characteristics that are desired (such as the ability to
take a permanent-press finishing treatment). The engineer decides to test
specimens at five levels of cotton weight percent: 15, 20, 25, 30 and 35. Fit a
curve of appropriate degree and estimate the cotton percent that gives
maximum tensile strength.
Y = Strength
X = Cotton %

Y (Strength)   X (Cotton %)
10             15
15             20
19             25
17             30
10             35

Scatter plot

$\text{Strength} = -36.09 + 4.326\,\text{Cotton} - 0.0857\,\text{Cotton}^2$

Analysis of Variance (Overall Significance)

Source       DF   Adj SS   Adj MS   F-Value   P-Value
Regression   2    64.686   32.343   30.59     0.032
Error        2    2.114    1.057
Total        4    66.800

Significance of Quadratic Regression

Term       Coef      SE Coef   T-Value   P-Value
Constant   -36.09    6.54      -5.52     0.031
Cotton     4.326     0.553     7.82      0.016
Cotton²    -0.0857   0.0110    -7.80     0.016
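
A quadratic regression is still linear in its parameters, so it can be fitted by ordinary least squares on the columns 1, X and X². The sketch below uses `numpy.polyfit` (NumPy is an assumed dependency) and rebuilds the overall F statistic from the sums of squares; names are illustrative.

```python
import numpy as np

cotton = np.array([15, 20, 25, 30, 35], dtype=float)    # X, cotton %
strength = np.array([10, 15, 19, 17, 10], dtype=float)  # Y, tensile strength

# polyfit returns coefficients from the highest power down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(cotton, strength, 2)

fitted = b0 + b1 * cotton + b2 * cotton ** 2
sse = float(np.sum((strength - fitted) ** 2))          # error SS, 5 - 3 = 2 df
sst = float(np.sum((strength - strength.mean()) ** 2)) # total SS, 4 df
ssr = sst - sse                                        # regression SS, 2 df

f_stat = (ssr / 2) / (sse / 2)

print(f"Strength = {b0:.2f} + {b1:.3f} Cotton + ({b2:.4f}) Cotton^2")
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, F = {f_stat:.2f}")
```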

Coefficient of Determination = 97%
About 97% of the variation in strength is explained by cotton
percentage; the remaining 3% is due to other factors not
included in the model.

Strength with 22 percent cotton

17.5943
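
The two numbers above can be checked by refitting the quadratic and evaluating it at 22 percent cotton. This is a brief sketch assuming NumPy; `numpy.polyval` evaluates the fitted polynomial, and the variable names are illustrative.

```python
import numpy as np

cotton = np.array([15, 20, 25, 30, 35], dtype=float)    # X, cotton %
strength = np.array([10, 15, 19, 17, 10], dtype=float)  # Y, tensile strength

coefs = np.polyfit(cotton, strength, 2)   # [b2, b1, b0]

# Coefficient of determination of the quadratic fit
fitted = np.polyval(coefs, cotton)
r2 = 1 - np.sum((strength - fitted) ** 2) / np.sum((strength - strength.mean()) ** 2)

# Predicted strength at 22 percent cotton
pred_22 = float(np.polyval(coefs, 22.0))

print(f"R^2 = {r2:.4f}")
print(f"Predicted strength at 22% cotton = {pred_22:.4f}")
```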

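The value of X at which the maximum or minimum of the quadratic regression occurs is

$X = -\dfrac{b_1}{2b_2} \approx 25.2$

The maximum or minimum value of Y is

$Y = b_0 - \dfrac{b_1^2}{4b_2} \approx 18.5$

Since $b_2 < 0$ here, the turning point is a maximum.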
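
The turning-point formulas can be applied to unrounded coefficients, which avoids the rounding error that creeps in when the two-decimal coefficients are plugged in by hand. A minimal sketch, assuming NumPy; `x_star` and `y_star` are illustrative names.

```python
import numpy as np

cotton = np.array([15, 20, 25, 30, 35], dtype=float)    # X, cotton %
strength = np.array([10, 15, 19, 17, 10], dtype=float)  # Y, tensile strength

b2, b1, b0 = np.polyfit(cotton, strength, 2)

# Setting dY/dX = b1 + 2*b2*X to zero gives the turning point
x_star = -b1 / (2 * b2)
# Substituting x_star back into the quadratic gives the extreme value of Y
y_star = b0 - b1 ** 2 / (4 * b2)
# A negative quadratic coefficient means the parabola opens downward
is_maximum = bool(b2 < 0)

print(f"Turning point at cotton = {x_star:.2f}%, strength = {y_star:.2f}")
print("maximum" if is_maximum else "minimum")
```

The turning point lies inside the tested range of 15 to 35 percent, so it is an interpolation rather than an extrapolation.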

Heating Oil Example
Determine if a quadratic model is needed for estimating heating oil
used for a single family home in the month of January based on average
temperature and amount of insulation in inches.

Oil (Gal)   Temp   Insulation
275.30      40     3
363.80      27     3
164.30      40     10
40.80       73     6
94.30       64     6
230.90      34     6
366.70      9      6
300.60      8      10
237.80      23     10
121.40      63     3
31.40       65     10
203.50      41     6
441.10      21     3
323.00      38     3
52.50       58     10

Scatter Diagram
a) Oil used vs. temperature
b) Oil used vs. insulation

Fig. a: a first-degree curve is appropriate
Fig. b: a second-degree curve is appropriate
$\hat{Y} = 624.59 - 5.36X_1 - 44.59X_2 + 1.87X_2^2$

Test of overall significance of regression

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1:$ At least one $\beta_j$ is not zero

$F = \dfrac{\text{RegSS}/\text{Reg df}}{\text{ESS}/\text{E df}} = \dfrac{\text{RMS}}{\text{EMS}} = 129.70^{*}$


ANOVA (b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   229643.2         3    76547.721     129.701   .000 (a)
Residual     6492.065         11   590.188
Total        236135.2         14
a. Predictors: (Constant), X2_2, X1, X2
b. Dependent Variable: Y
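
The overall F statistic is just the ratio of the two mean squares, so it can be verified by hand from the ANOVA entries. A short arithmetic sketch in plain Python; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom taken from the ANOVA table
reg_ss, reg_df = 229643.2, 3     # regression
err_ss, err_df = 6492.065, 11    # residual (error)

reg_ms = reg_ss / reg_df   # regression mean square (RMS)
err_ms = err_ss / err_df   # error mean square (EMS)
f_stat = reg_ms / err_ms   # overall F on (3, 11) degrees of freedom

print(f"RMS = {reg_ms:.3f}, EMS = {err_ms:.3f}, F = {f_stat:.2f}")
```

The mean squares computed here can differ from the table in the last digits because the table's sums of squares are themselves rounded.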
Heating Oil Example:

• Testing the Quadratic Effect

– Model with quadratic insulation term
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$

– Model without quadratic insulation term
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

– Hypotheses
• $H_0: \beta_3 = 0$ (No quadratic term in insulation)
• $H_1: \beta_3 \neq 0$ (Quadratic term is needed in insulation)

Test of significance of quadratic term
Is a quadratic term in insulation needed to explain monthly
consumption of heating oil? Test at $\alpha = 0.05$.

$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
df = 11

Test statistic:
$t = \dfrac{b_3 - \beta_3}{S_{b_3}} = \dfrac{1.8667 - 0}{1.1238} = 1.6611$
P-value = 0.1249

Decision: Do not reject $H_0$ at $\alpha = 0.05$.

Conclusion: There is not sufficient evidence for the need to include
a quadratic effect of insulation on oil consumption.

Coefficients (a)

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t         Sig.
(Constant)   624.586    42.435                                         14.719    .000
X1           -5.363     .317               -.854                       -16.910   .000
X2           -44.587    14.955             -1.019                      -2.981    .012
X2_2         1.867      1.124              .568                        1.661     .125
a. Dependent Variable: Y
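
The t-test on the quadratic term can be reproduced from the 15 observations. The sketch below assumes NumPy and computes standard errors from the usual formula (error mean square times the inverse of X'X); it is an illustration, not the SPSS procedure behind the table. The critical value 2.201 is the standard two-sided 5% t-table entry for 11 degrees of freedom.

```python
import numpy as np

oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58], dtype=float)
ins = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10], dtype=float)

# Design matrix: intercept, X1 (temperature), X2 (insulation), X2 squared
X = np.column_stack([np.ones_like(temp), temp, ins, ins ** 2])
coef, *_ = np.linalg.lstsq(X, oil, rcond=None)

# Error mean square on n - 4 = 11 degrees of freedom
resid = oil - X @ coef
df_error = len(oil) - X.shape[1]
mse = float(resid @ resid) / df_error

# Standard errors: square roots of the diagonal of MSE * (X'X)^-1
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

# t statistic for H0: beta3 = 0
t_b3 = coef[3] / se[3]
t_crit = 2.201                     # t(0.025, 11) from a standard t table
reject_h0 = bool(abs(t_b3) > t_crit)

print(f"b3 = {coef[3]:.4f}, SE = {se[3]:.4f}, t = {t_b3:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```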
Non-Linear Regression
Regression models that are linear in the parameters are easy
to handle, but in some situations the model is non-linear.
Non-linear models can be divided into two types:

• Intrinsically linear models
• Intrinsically non-linear models
Models that can be made linear by a suitable transformation
are called intrinsically linear; models that cannot be
transformed into linear models are called intrinsically
non-linear.
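
As an illustration of an intrinsically linear model, consider the exponential curve $Y = ab^X$ with a multiplicative error (the error structure is an assumption made here for the example). Taking logarithms of both sides gives a model that is linear in the transformed parameters:

```latex
\begin{align*}
Y &= a\,b^{X}\,\varepsilon \\
\ln Y &= \ln a + (\ln b)\,X + \ln \varepsilon \\
Y^{*} &= \beta_0 + \beta_1 X + \varepsilon^{*},
\qquad Y^{*} = \ln Y,\quad \beta_0 = \ln a,\quad \beta_1 = \ln b
\end{align*}
```

Ordinary least squares on $(X, \ln Y)$ then gives $\hat{a} = e^{b_0}$ and $\hat{b} = e^{b_1}$. These estimates minimise squared error on the log scale, so they generally differ a little from a direct non-linear least-squares fit on the original scale.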

EXAMPLE: The number (Y) of bacteria per unit volume
present in a culture after X hours is given in the following
table. Fit a least-squares curve of the form $Y = ab^X$ to the
data. Estimate the value of Y when X = 7.

Y     X
32    0
47    1
65    2
92    3
132   4
190   5
275   6
Starting values
Parameter   Value
A           3
B           2

Estimates at Each Iteration
Iteration   SSE       A         B
0           32736.0   3.0000    2.00000
1           31407.7   3.8925    1.90919
2           31049.7   5.8835    1.76355
3           27912.9   8.1791    1.66840
4           23299.3   12.9137   1.54519
5           14316.9   30.1663   1.34441
6           739.3     30.4761   1.46166
7           8.3       31.2693   1.43635
8           7.9       31.3666   1.43515
9           7.9       31.3680   1.43514
10          7.9       31.3680   1.43514
11          7.9       31.3680   1.43514

Equation: Y = 31.368 * 1.43514 ^ X
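
An iterative fit of this kind can be sketched with a plain Gauss-Newton loop. This is an assumed algorithm chosen for illustration; the software that produced the iteration table may use a different method, and the sketch starts from a log-linear fit rather than from A = 3, B = 2. NumPy is an assumed dependency and all names are illustrative.

```python
import numpy as np

x = np.arange(7, dtype=float)                             # hours 0..6
y = np.array([32, 47, 65, 92, 132, 190, 275], dtype=float)

# Starting values from the linearised model ln Y = ln a + X ln b
slope, intercept = np.polyfit(x, np.log(y), 1)
a, b = float(np.exp(intercept)), float(np.exp(slope))

for _ in range(50):
    resid = y - a * b ** x
    # Jacobian of a * b**x with respect to a and b
    J = np.column_stack([b ** x, a * x * b ** (x - 1)])
    # Gauss-Newton step: least-squares solution of J @ step = resid
    step, *_ = np.linalg.lstsq(J, resid, rcond=None)
    a, b = a + step[0], b + step[1]
    if np.max(np.abs(step)) < 1e-10:
        break

sse = float(np.sum((y - a * b ** x) ** 2))
y_at_7 = a * b ** 7          # estimate of Y when X = 7

print(f"Y = {a:.3f} * {b:.5f} ^ X, SSE = {sse:.2f}")
print(f"Estimated Y at X = 7: {y_at_7:.1f}")
```

Starting near the solution keeps the undamped Gauss-Newton step stable; from a poor starting point a damped or Levenberg-Marquardt variant would be the safer choice.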
Regression with Qualitative (Categorical)
Independent Variables
• Salary = f(education, experience, gender, race) + E
• Maintenance charges of a vehicle = f(age, quality) + E

EXAMPLE: Consider data on the annual salary of male and female
college teachers and their years of teaching experience.

Salary (Y), thousands of rupees   Gender   Years of teaching experience
60                                Male     6
50                                Male     4
40                                Female   5
40                                Male     3
60                                Male     6
50                                Female   7
60                                Female   5

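
A two-category variable such as gender enters a regression through a single 0/1 dummy variable. The sketch below assumes the coding Male = 1, Female = 0 (the choice of reference category is an assumption; the opposite coding only flips the sign of the dummy coefficient) and fits the model with NumPy. Names are illustrative.

```python
import numpy as np

salary = np.array([60, 50, 40, 40, 60, 50, 60], dtype=float)  # thousands of rupees
gender = ["Male", "Male", "Female", "Male", "Male", "Female", "Female"]
experience = np.array([6, 4, 5, 3, 6, 7, 5], dtype=float)     # years

# Dummy variable: 1 for Male, 0 for Female (Female is the reference category)
male = np.array([1.0 if g == "Male" else 0.0 for g in gender])

# Model: Salary = b0 + b1 * Male + b2 * Experience + error
X = np.column_stack([np.ones_like(salary), male, experience])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b_male, b_exp = coef

print(f"Salary = {b0:.2f} + {b_male:.2f} Male + {b_exp:.2f} Experience")
# b_male is the estimated salary difference between a male and a female
# teacher with the same years of experience.
```

With only seven observations the estimates are very imprecise; the example is meant to show the coding, not to support conclusions about salaries.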
Regression when a categorical variable has more than two
categories
Suppose we want to regress an individual's annual
expenditure on healthcare on the individual's income
and education.
Since education is qualitative in nature, we consider
three levels of education:
(i) Less than high school
(ii) High school
(iii) College
Expenditure (Y)   Income (X)   Education
38                90           College
40                95           College
39                82           College
32                75           College
36                87           College
40                93           College
20                45           High School
18                46           High School
17                48           High School
28                52           High School
3                 18           Less than High School
4                 25           Less than High School
5                 27           Less than High School
6                 32           Less than High School
4                 20           Less than High School
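
A categorical variable with three levels needs two dummy variables, with the remaining level acting as the reference category; using three dummies together with an intercept would make the design matrix singular. The sketch below takes "Less than High School" as the reference (an assumption made for illustration) and fits the model with NumPy. Names are illustrative.

```python
import numpy as np

expenditure = np.array([38, 40, 39, 32, 36, 40, 20, 18, 17, 28, 3, 4, 5, 6, 4], dtype=float)
income = np.array([90, 95, 82, 75, 87, 93, 45, 46, 48, 52, 18, 25, 27, 32, 20], dtype=float)
education = np.array(["College"] * 6 + ["High School"] * 4 + ["Less than High School"] * 5)

# Two dummies for three levels; "Less than High School" is the reference level
d_high_school = (education == "High School").astype(float)
d_college = (education == "College").astype(float)

# Model: Y = b0 + b1 * Income + b2 * HighSchool + b3 * College + error
X = np.column_stack([np.ones_like(income), income, d_high_school, d_college])
coef, *_ = np.linalg.lstsq(X, expenditure, rcond=None)
b0, b_income, b_hs, b_col = coef

print(f"Intercept (reference level) = {b0:.3f}, income slope = {b_income:.3f}")
print(f"High School shift = {b_hs:.3f}, College shift = {b_col:.3f}")
# b_hs and b_col are shifts in mean expenditure relative to the reference
# level for individuals with the same income.
```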
