
CHAPTER # 4 SIMPLE REGRESSION AND CORRELATION
EXERCISE # 4
BY SHAHID MEHMOOD (sm_078@hotmail.com)

LINEAR REGRESSION

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory (independent) variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.

Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. A scatter plot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatter plot does not indicate any increasing or decreasing trend), then fitting a linear regression model to the data probably will not provide a useful model.

Linear Regression Lines

(i) Regression line when Y depends upon X.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).

(ii) Regression line when X depends upon Y.

A linear regression line has an equation of the form X = a0 + b0Y, where Y is the explanatory variable and X is the dependent variable. The slope of the line is b0, and a0 is the intercept (the value of x when y = 0).

FITTING OF REGRESSION LINES BY LEAST-SQUARES METHOD

The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values. We may work out the values of a and b by using the following formulae.

$$ a = \bar{y} - b\bar{x} $$

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

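Example # 1

x:   2   3   6   8
y:   5   7  10  12

i) Fit a regression line y on x by the method of least squares.
ii) Estimate y when x = 12.

Solution:

    x     y     xy    x^2
    2     5     10      4
    3     7     21      9
    6    10     60     36
    8    12     96     64
   19    34    187    113   (totals)

(i)

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{4(187) - (19)(34)}{4(113) - (19)^2} = \frac{102}{91} \approx 1.12 $$

$$ \bar{x} = 4.75, \qquad \bar{y} = 8.5 $$

$$ a = \bar{y} - b\bar{x} = 8.5 - 1.12(4.75) \approx 3.18 $$

$$ \hat{Y} = 3.18 + 1.12X $$

(ii) When X = 12:

$$ \hat{Y} = 3.18 + 1.12(12) \approx 16.6 $$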
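The least-squares formulae for a and b can also be evaluated programmatically. The following is a minimal Python sketch, not part of the original exercise; the function name `fit_least_squares` is an illustrative choice. It applies the formulae to the data of Example # 1 (x = 2, 3, 6, 8 and y = 5, 7, 10, 12).

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares.

    Uses the sums-based formulae:
        b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
        a = mean(y) - b * mean(x)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * sum_x / n
    return a, b


# Data of Example # 1
x = [2, 3, 6, 8]
y = [5, 7, 10, 12]

a, b = fit_least_squares(x, y)
print(f"b = {b:.4f}, a = {a:.4f}")
print(f"estimate at x = 12: {a + b * 12:.2f}")
```

Carried at full precision, this gives b = 102/91 ≈ 1.1209, a ≈ 3.1758, and an estimate of about 16.63 at x = 12. Rounding the slope early in a hand calculation can shift the final estimate noticeably, so it is safer to round only at the end.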

CORRELATION

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights. Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.

CORRELATION COEFFICIENT

For a set of variable pairs, the correlation coefficient gives the strength of the association. The square of the correlation coefficient is the fraction of the variance of one variable that can be explained by the variance of the other variable. The correlation coefficient is denoted by r.

The value of r is such that -1 ≤ r ≤ +1. The + and - signs are used for positive linear correlations and negative linear correlations, respectively.

Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and y such that as values for x increase, values for y also increase.

Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values for x increase, values for y decrease.

No correlation: If there is no linear correlation or a weak linear correlation, r is close to 0. A value near zero means that there is little or no linear relationship between the two variables; the points may be randomly scattered or may follow a nonlinear pattern.

Note that r is a dimensionless quantity; that is, it does not depend on the units employed. A perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.

A correlation greater than 0.8 in absolute value is generally described as strong, whereas a correlation less than 0.5 in absolute value is generally described as weak. These values can vary based upon the "type" of data being examined. A study utilizing scientific data may require a stronger correlation than a study using social science data.
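Calculation of Product Moment correlation coefficient

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$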
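As an illustration of the product moment formula, here is a minimal Python sketch. It is not part of the original handout, and the function name `pearson_r` is a hypothetical choice. It computes r from the same five sums used in the formula, using the data of Example # 1 (x = 2, 3, 6, 8 and y = 5, 7, 10, 12) as input.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("one of the variables is constant; r is undefined")
    return numerator / denominator


# Data of Example # 1
x = [2, 3, 6, 8]
y = [5, 7, 10, 12]

r = pearson_r(x, y)
print(f"r   = {r:.4f}")
print(f"r^2 = {r * r:.4f}")  # fraction of variance explained
```

For these data r is about 0.9928 and r² about 0.9856, which would be described as a strong positive linear correlation under the thresholds given above.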

Question # 1
A study was made to determine the relation between advertising expenditure (x) and the increase in sales (y). The following data were obtained.

X: Y: 140 80 120 75 125 78 120 76 130 82 150 90 140 87 160 100 180 120 195 126 125 130 150 125

i) Plot a scatter diagram.
ii) Find the regression line to predict the increase in sales from advertising expenditure.
iii) Estimate the increase in sales when advertising expenditure is 250.

Question # 1
Given the following data, fit the regression line (y on x).

X Y 2.80 3.94 2.10 2.63 3.75 3.20 2.80 3.57 2.55 2.25 3.00 2.80 2.50 3.40 2.20 2.00 2.90 3.70 3.80 3.97 2.70 3.20 3.45 3.90

Ans: y = 0.75X + 0.05

Question # 2
Given the data:

X:   3   7   5   4   3   3   9   6
Y:   8   3   2   2   3   4   3   7

Fit a regression line of X on Y and hence estimate X if Y = 4.5.
Ans: X = -0.194Y + 5.776; 4.903

Question # 3
Calculate the equation of the least squares regression line of y on x from the following data.

X:   1   3   3   4   5   5
Y:   5   3   2   2   0   1

Ans: Y = 5.88 - 1.06X

Question # 4
An organization has collected the following data showing the relationship between price charged and quantities sold:

Price:     5    6    7    8    9   10   12   13
Qty:     590  560  555  540  525  500  480  475

i) Determine the regression line equation.
ii) Compute the quantity that the company may produce if it wishes to sell at the price of 18.

Question # 5
The manager of an educational computer facility would like to develop a model to predict the number of service calls per annum for interactive terminals based upon the age of the terminal. A sample of 10 terminals was selected. The data follow:

Terminal:               1   2   3   4   5   6   7   8   9  10
No. of service calls:   3   4   3   5   5   7   8  10  10  12
Age (years):            1   1   2   2   3   3   4   4   5   5

Fit a regression line to predict the number of service calls.
Ans: Y = 0.092 + 0.434X

Question # 6
Compute the coefficient of correlation between height (cm) and weight (kg) of six adults.

Height (cm):   170  175  176  178  183  184
Weight (kg):    57   64   70   76   71   82

Ans: 0.864

Question # 7
A personnel officer is studying performances of job applicants on two tests given when the applicant contacts the firm. The first test measures mental ability; the second measures potential for success in the job. The test-score results of a sample of eleven applicants are shown below:

Applicant   Mental ability (X)   Potential (Y)
A B         3 40 7 6 42 3
C           36                   41
D           49                   39
E           36                   38
F           40                   49
G           39                   25
H           47                   29
I           32                   15
J           65                   52
K           27                   25

Calculate the sample correlation coefficient.
Ans: 0.41

Question # 8
The following data were obtained in a study of the relationship between the weight and chest size of infants at birth:

Weight (kg):   3.71  2.31  4.30  3.21  4.32  2.75  5.52  2.15  4.41
Chest size:    28.7  28.3  30.3  27.2  27.7  29.5  36.5  26.3  32.2

Compute and interpret the sample correlation coefficient.
Ans: 0.784

Question # 9
Calculate the coefficient of correlation between the values of X and Y from the following table.

X:    76   87   95   67   57   77   59   59
Y:   123  135  154  110  105  134  121  106

Ans: 0.917

Question # 10
State in each case whether you would expect a positive correlation, a negative correlation, or no correlation:
i) The ages of husbands and wives;
ii) The amount of rubber on tires and the number of miles they have been driven;
iii) Shoe size and IQ;
iv) The weight of the load of trucks and their petrol consumption.

Question # 11
For the data of heights and weights of 5 men:

Height:    64   68   70   72   74
Weight:   160  170  180  190  195

i) Establish a least squares equation of regression between height and weight.
ii) Calculate the coefficient of correlation.

Question # 12
A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following sums:

$$ \sum X = 125, \quad \sum X^2 = 650, \quad \sum Y = 100, \quad \sum Y^2 = 460, \quad \sum XY = 508 $$

The following mistakes were discovered at the time of checking:

Wrong values recorded:           X:  6   8     Y: 14   6
Correct values to be recorded:   X:  8   6     Y: 12   8

Find the correct value of the coefficient of correlation.

Question # 13
For the following two sets of bivariate data, the regression lines for each set are, respectively:
i) y = 1.94x + 10.83 (y on x) and x = 0.15y + 6.18 (x on y)
ii) y = -1.96x + 15 (y on x) and x = -0.45y + 7.16 (x on y)

Required: Find the coefficient of correlation in each case.

Question # 14
A firm trains employees to use a statistical software package. A random sample of trainees turned in the following performance:

Trainee:                 A   B   C   D   E   F   G
Hours of training (x):   1   4   6   8   2   3   1
Number of errors (y):    6   3   2   1   5   4   7

i) Determine the least squares regression line of y on x.
ii) Interpret the coefficient of regression. Predict the number of errors for a person with 5 hours of training.

Question # 15
A researcher compiled the following information to investigate the relationship between smoking and lung cancer:

Country: USA, UK, Finland, Switzerland, Canada, Holland, Australia, Denmark, Sweden, Norway, Iceland

Per Capita Cigarette Consumption: 1300 1100 1100 510 500 490 480 380 300 250 230 20

Deaths per 100,000 From Lung Cancer: 46 65 25 15 24 18 17 11 9 6

Compute r and r² and describe what they mean.
Ans: 0.5476

Question # 16
Calculate the coefficient of correlation for the following data:

Annual percentage increase in advertising expenditure (X):   1   3   4   6   8   9  11  14
Annual percentage increase in sale revenue (Y):              1   2   4   4   5   7   8   9

ii) What is the purpose of finding the correlation coefficient, and what does its value indicate in respect of the above data on advertising expenditure and sales revenue?
Ans: r = 0.9770

Question # 17
Calculate the coefficient of correlation for the following data:

Annual percentage increase in advertising expenditure (X):   1   3   4   6   8   9  11  14
Annual percentage increase in sale revenue (Y):              1   2   4   4   5   7   8   9

ii) What is the purpose of finding the correlation coefficient, and what does its value indicate in respect of the above data on advertising expenditure and sales revenue?

Ans: r = 0.9770

Question # 18
An analyst is studying the relationship between shopping center traffic and a department store's daily sales. The analyst develops an index to measure the daily volume of traffic entering the shopping center and an index of daily sales. The following table shows the index values for 9 randomly selected days.

Traffic Index (x):    71   82  111   85   89  110   75  105  120
Sale Index (y):      135  170  184  160  175  190  140  152  210

Forecast the sale index for a traffic index of 132 by the method of least squares.
Ans: 198.29

Q 19
A company wants to assess the impact of advertising expenditures on its annual profit. The following table presents the information for eight years:

Year:                                    2001  2002  2003  2004  2005  2006  2007  2008
Advertising Expenditure (in millions):     90   100    95   110   130   145   150   140
Annual Profit (in millions):               45    42    44    60    30    34    35    30

(a) Construct the least squares regression equation and predict the annual profit for the year 2009 if the advertising expenditure is budgeted at Rs. 160 million.
(b) Determine the coefficient of correlation and interpret your result.

Q 20

Student Name:              Ali   Adil   Asif   Ahmed   Ayub
Marks given by Judge A:     70     92     80      65     70
Marks given by Judge B:     54     43     43      67     64

Required: Calculate Spearman's Rank Correlation Coefficient.


Q 21
A firm trains employees to use a statistical software package. A random sample of trainees turned in the following performance:

Trainee:                 A   B   C   D   E   F   G
Hours of training (x):   1   4   6   8   2   3   1
Number of errors (y):    6   3   2   1   5   4   7

i) Determine the least squares regression line of y on x.
ii) Interpret the coefficient of regression. Predict the number of errors for a person with 5 hours of training.
