On
Mathematical Applications on Correlation and Regression
(Math 51 & 54)
Prepared for,
Nusrat Khan
Assistant professor
Department of Finance
Faculty of business studies
University of Dhaka
Prepared by,
Group1 (Fab-5)
Department of Finance
Faculty of Business Studies
University of Dhaka
Course name:
Business Statistics
Course code: 107
1 656 5
2 853 14
3 646 6
4 783 11
5 610 8
6 841 10
7 785 9
8 639 9
9 762 10
10 762 9
11 862 7
12 679 5
13 835 13
14 607 3
15 665 8
16 647 7
17 685 10
18 720 8
19 652 6
20 828 10
Table 1: Random sample of 20 shipments of Bardi Trucking Co. (columns: shipment number, distance in miles, shipping time in days)
Figure 1: The scatter diagram of the given sample (x-axis: Distance (Miles), 0 to 1000; y-axis: Shipping Time (Days), 0 to 15)
Scatter Diagram:
Interpretation: The scatter diagram shows a clear positive trend between shipping distance (miles) and
shipping time (days): as the distance increases, the shipping time tends to increase as well. The closer
the points lie to a straight line, the stronger the linear relationship. Here the points are neither tightly
clustered around a line nor widely scattered, so the independent variable has a positive but far from
perfect influence on the dependent variable. We therefore compute the correlation coefficient to measure
the strength of the relationship numerically.
Standard Deviation of Distance (Miles):
$$\delta_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{\frac{145306.55}{20-1}} = 87.45$$
Standard Deviation of Shipping Time (Days):
$$\delta_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n-1}} = \sqrt{\frac{138.80}{20-1}} = 2.7028$$
Statistic             Distance (miles) x    Shipping Time (days) y
Mean                  725.85                8.4
Standard Error        19.55468378           0.60437005
Median                702.5                 8.5
Mode                  762                   10
Standard Deviation    87.45120444           2.702825033
Sample Variance       7647.713158           7.305263158
Kurtosis              -1.502903868          0.109087751
Skewness              0.256412239           0.113020821
Range                 255                   11
Minimum               607                   3
Maximum               862                   14
Sum                   14517                 168
Count                 20                    20
Table 2: Mean, standard deviation and other descriptive statistics from Excel
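The Excel figures in Table 2 can be cross-checked outside the spreadsheet. The following is a minimal sketch using Python's standard `statistics` module on the Table 1 data; the function name `describe` and the variable names are illustrative choices, not part of the original assignment.

```python
import statistics

# Table 1 data: distance (miles) and shipping time (days) for 20 shipments
distance = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
            862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
days = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
        7, 5, 13, 3, 8, 7, 10, 8, 6, 10]


def describe(values):
    """Return a few of the sample statistics reported in Table 2."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),        # sample SD, divisor n - 1
        "variance": statistics.variance(values),  # sample variance, divisor n - 1
        "range": max(values) - min(values),
    }


for name, values in (("Distance (miles)", distance),
                     ("Shipping time (days)", days)):
    print(name, describe(values))
```

Both `stdev` and `variance` use the n - 1 divisor, which matches the sample formulas used in this report.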
Shipment  Distance (miles) x  Shipping Time (days) y  (X-X̅)  (Y-Y̅)  (X-X̅)²  (Y-Y̅)²  (X-X̅)(Y-Y̅)
1 656 5 -69.85 -3.4 4879.0225 11.56 237.49
2 853 14 127.15 5.6 16167.123 31.36 712.04
3 646 6 -79.85 -2.4 6376.0225 5.76 191.64
4 783 11 57.15 2.6 3266.1225 6.76 148.59
5 610 8 -115.85 -0.4 13421.223 0.16 46.34
6 841 10 115.15 1.6 13259.523 2.56 184.24
7 785 9 59.15 0.6 3498.7225 0.36 35.49
8 639 9 -86.85 0.6 7542.9225 0.36 -52.11
9 762 10 36.15 1.6 1306.8225 2.56 57.84
10 762 9 36.15 0.6 1306.8225 0.36 21.69
11 862 7 136.15 -1.4 18536.823 1.96 -190.61
12 679 5 -46.85 -3.4 2194.9225 11.56 159.29
13 835 13 109.15 4.6 11913.723 21.16 502.09
14 607 3 -118.85 -5.4 14125.323 29.16 641.79
15 665 8 -60.85 -0.4 3702.7225 0.16 24.34
16 647 7 -78.85 -1.4 6217.3225 1.96 110.39
17 685 10 -40.85 1.6 1668.7225 2.56 -65.36
18 720 8 -5.85 -0.4 34.2225 0.16 2.34
19 652 6 -73.85 -2.4 5453.8225 5.76 177.24
20 828 10 102.15 1.6 10434.623 2.56 163.44
Total 14517 168 145306.55 138.80 3108.20
Mean (X̅) & (Y̅) 725.85 8.4
Table 3: Working table for the Coefficient of Correlation, from Excel
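The Formula: Correlation Coefficient:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)\,\delta_x\,\delta_y} = \frac{3108.20}{(20-1) \times 87.451 \times 2.7028} = 0.6921$$
At full precision Excel reports r = 0.692104427.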
Distance (miles) x 1
Shipping Time (days) y 0.692104427 1
Table 4: Coefficient of Correlation in Excel application
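The correlation coefficient in Table 4 can be reproduced directly from the Table 1 data using the deviation formula. This is a minimal sketch in plain Python; `pearson_r` is a name chosen here for illustration, not a function from the assignment or from Excel.

```python
import math

# Table 1 data: distance (miles) and shipping time (days)
distance = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
            862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
days = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
        7, 5, 13, 3, 8, 7, 10, 8, 6, 10]


def pearson_r(x, y):
    """Sample correlation: sum of cross-deviations / ((n - 1) * s_x * s_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    s_x = math.sqrt(sxx / (n - 1))  # sample standard deviation of x
    s_y = math.sqrt(syy / (n - 1))  # sample standard deviation of y
    return sxy / ((n - 1) * s_x * s_y)


r = pearson_r(distance, days)
print(round(r, 4))
```

Summing the deviation columns over all 20 shipments is what makes the result agree with the Excel value of about 0.6921.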
Interpretation: The coefficient is positive, so there is a direct relationship between shipping distance
(miles), the independent variable, and shipping time (days), the dependent variable, as the scatter
diagram already suggested. The value 0.6921 is well above 0.5 but clearly short of 1, so the association
is moderately strong and positive. In practical terms, as the shipping distance (miles) increases, the
shipping time (days) will most likely increase along with it.
Test of Hypothesis:
The first step is to state the null and alternate hypotheses for the one-tailed test.
Since the question is whether the correlation in the population is positive:
H0: ρ ≤ 0
H1: ρ > 0
Given,
No. of observations = n = 20
So, the degrees of freedom = df = (20-2) = 18
Correlation coefficient = r = 0.6921
t test for the correlation coefficient:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.692104427 \times \sqrt{20-2}}{\sqrt{1-0.479008538}} = \frac{2.9364}{0.7218} = 4.0681$$
Here,
according to the question requirement, we use the 0.05 level of significance. From the t
distribution table, the critical value for the given conditions (one-tailed test, 18 df, 0.05
level of significance) is 1.734.
Finally,
4.0681 > 1.734
Calculated value > table value
Since the calculated value is higher than the table value, we reject the null hypothesis
and accept the alternate hypothesis: there is a positive correlation between shipping
distance and shipping time.
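The test can be sketched in a few lines of Python. The critical value 1.734 is simply copied from the t distribution table cited in this report (one-tailed, 18 df, 0.05 level) rather than computed; the function and variable names are illustrative assumptions.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.692104427          # correlation coefficient from Table 4
n = 20                   # number of shipments
critical_value = 1.734   # table value: one-tailed, 18 df, 0.05 level

t = t_statistic(r, n)
reject_null = t > critical_value  # one-tailed test for a positive correlation

print(round(t, 4), reject_null)
```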
Coefficient of Determination:
$$r^2 = (0.692104427)^2 = 0.479008538$$
Interpretation: About 47.90% of the variation in the dependent variable, shipping time (days),
can be explained by the variation in the independent variable, shipping distance (miles).
To find the standard error of estimate we first have to find the regression equation.
Regression Equation:
$$\hat{y} = a + bx$$
SLOPE = b = r(δy / δx)
= 0.692104 × (2.7028 / 87.4512)
= 0.02139 (0.0213906 before rounding)
Y-INTERCEPT = a = (Y̅ - bX̅)
= [8.4 − {(0.0213906) × 725.85}]
= -7.1264
So, ŷ = -7.1264 + 0.02139x
Interpretation: This is the regression equation. Substituting any value of the independent variable x,
the distance in miles, gives the estimated value of y, the shipping time in days, for a future shipment.
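As a cross-check, the least-squares slope and intercept can be computed directly from the Table 1 data. The sketch below is illustrative only: `fit_line` and `predict` are hypothetical helper names, and the 800-mile shipment is an invented example input, not a value from the assignment.

```python
# Table 1 data: distance (miles) and shipping time (days)
distance = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
            862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
days = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
        7, 5, 13, 3, 8, 7, 10, 8, 6, 10]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    b = sxy / sxx            # slope, equal to r * (s_y / s_x)
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, miles):
    """Estimated shipping time (days) for a given distance (miles)."""
    return a + b * miles


a, b = fit_line(distance, days)
print(round(a, 4), round(b, 5))
print(round(predict(a, b, 800), 2))  # hypothetical 800-mile shipment
```

The fitted values should land close to a slope of about 0.02139 and an intercept of about -7.126. Predictions are only meaningful for distances within the sampled range of roughly 600 to 860 miles.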
Shipment  Distance (miles) x  Shipping Time (days) y  ŷ  (y − ŷ)²
Standard Error of Estimate:
$$\delta_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}} = \sqrt{\frac{71.97132695}{20-2}} = 1.99960$$
Interpretation: The Standard Error of Estimate (SEE) measures the dispersion, or scatter, of the
observed values around the regression line, and so indicates how accurate predictions from the
regression equation are likely to be. Here the SEE is 1.99960 days, which is smaller than the standard
deviation of shipping time (2.7028 days), so the observed values lie closer to the regression line than
to their mean. Knowing the distance (miles) therefore improves the prediction of the dependent variable,
shipping days, although a typical prediction error of about two days remains.
Because,
I. Standard Error of Estimate: The standard error of estimate of this equation is 1.99960,
which is small relative to the spread of shipping times, so the data lie relatively close
to the regression line and shipping days can be predicted reasonably well from
distance (miles).
II. Coefficient of Determination: 47.90%, nearly half, of the variation in the dependent
variable, shipping time (days), can be explained by the independent variable, shipping
distance (miles).
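Both measures can be recomputed from the residuals of the fitted line. The sketch below is a rough cross-check in plain Python with illustrative variable names. Because it works from unrounded coefficients, its standard error comes out near 2.00 and may differ slightly in the later decimals from the hand-worked figure above.

```python
import math

# Table 1 data: distance (miles) and shipping time (days)
distance = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
            862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
days = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
        7, 5, 13, 3, 8, 7, 10, 8, 6, 10]

n = len(distance)
mean_x, mean_y = sum(distance) / n, sum(days) / n
sxx = sum((x - mean_x) ** 2 for x in distance)
syy = sum((y - mean_y) ** 2 for y in days)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, days))

# Least-squares line y-hat = a + b * x
b = sxy / sxx
a = mean_y - b * mean_x

# Sum of squared residuals around the fitted line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(distance, days))

see = math.sqrt(sse / (n - 2))  # standard error of estimate
r_squared = 1 - sse / syy       # share of variation in y explained by x

print(round(see, 4), round(r_squared, 4))
```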
Answer to the question no. 54 (a)
There were 20 states in the sample.
Given,
Total degrees of freedom = 19
Or, (no. of observations -1) = 19
Or, (n-1) = 19
Or, (no. of states) - 1 = 19
Or, No. of States = 19+1
So, No. of States = 20
So, the Standard Error of Estimate:
$$SEE = \delta_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}} = \sqrt{\frac{SSE}{20-2}} = \sqrt{\frac{12054}{20-2}} = 25.88$$
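The arithmetic for this standard error of estimate can be checked with a short sketch. It assumes only the two inputs used above, SSE = 12054 and n = 20; the function name is an illustrative choice.

```python
import math


def standard_error_of_estimate(sse, n):
    """Square root of SSE divided by the residual degrees of freedom, n - 2."""
    return math.sqrt(sse / (n - 2))


see_54 = standard_error_of_estimate(12054, 20)
print(round(see_54, 2))
```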
Coefficient of correlation:
$$r = \sqrt{r^2} = \sqrt{0.46206} = 0.67975$$
Interpretation: The coefficient is positive, so there is a direct relationship between the no. of
construction fatalities, the independent variable, and the no. of unemployed people, the dependent
variable. The value 0.67975 is well above 0.5 but clearly short of 1, so the association is moderately
strong and positive. In practical terms, as the no. of construction fatalities increases, the no. of
unemployed people will most likely increase along with it.
Given,
No. of observations = n = 20
So, the degree of freedom = df = (20-2) = 18
Correlation of the equation = r = 0.6797
t test for the correlation coefficient:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.67975 \times \sqrt{20-2}}{\sqrt{1-0.46206}} = \frac{2.88395}{0.73343} = 3.93$$
Here,
according to the requirement, we use the 0.05 level of significance. From the t distribution
table, the critical value for the given conditions (one-tailed test, 18 df, 0.05 level of
significance) is 1.734.
Finally,
3.93 > 1.734
Calculated value > table value
Since the calculated value is higher than the table value, we reject the null hypothesis
and accept the alternate hypothesis.
Interpretation: We conclude that there is a positive correlation in the population, so the
independent variable is an aid in predicting the dependent variable.
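The steps for question 54 can be sketched as follows. The inputs are the values used above (r² = 0.46206, n = 20, table value 1.734). Taking the positive square root assumes a positive regression slope, as this report does, since the square root alone does not determine the sign of r. Variable names are illustrative.

```python
import math

r_squared = 0.46206      # coefficient of determination used above
n = 20                   # number of states
critical_value = 1.734   # table value: one-tailed, 18 df, 0.05 level

# Positive root: assumes the regression slope is positive
r = math.sqrt(r_squared)

# t statistic for the correlation coefficient, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

reject_null = t > critical_value
print(round(r, 5), round(t, 2), reject_null)
```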
T-Distribution Table (One Tail):
References
One-tailed test: Data Science Center. (2019, October 28). Retrieved from Data Science Center:
https://www.statisticshowto.datasciencecentral.com/tables/t-distribution-table/