
Assignment

On
Mathematical Applications on Correlation and Regression
(Math 51 & 54)

Prepared for,
Nusrat Khan
Assistant Professor
Department of Finance
Faculty of Business Studies
University of Dhaka

Prepared by,
Group 1 (Fab-5)
Department of Finance
Faculty of Business Studies
University of Dhaka

Course name:
Business Statistics
Course code: 107

Date of submission: September 30, 2019


Group profile:
Department of Finance
Faculty of Business Studies
University of Dhaka

Name                    ID no.   Email                        Signature
Md. Zubaer Hossain      25-103   zubaer25103@gmail.com
Md. Kawser Alam         25-016   mdkawseralam121@gmail.com
Moumita Sarkar Tanni    25-172   moumitanny@gmail.com
Sadia Alam Porna        25-122   sadiaporna5550@gmail.com
Md. Jahirul Islam       25-065   jishaimun@gmail.com

Answer to the question no. 51 (a)
Shipment   Distance (miles) x   Shipping Time (days) y

1 656 5
2 853 14
3 646 6
4 783 11
5 610 8
6 841 10
7 785 9
8 639 9
9 762 10
10 762 9
11 862 7
12 679 5
13 835 13
14 607 3
15 665 8
16 647 7
17 685 10
18 720 8
19 652 6
20 828 10
Table 1: Random sample of 20 shipments of Bardi Trucking Co.

Scatter Diagram:
[Figure: scatter diagram of Shipping Time (Days) on the vertical axis against Distance (Miles) on the horizontal axis]
Figure 1: The Scatter Diagram of the Given sample
Interpretation: The scatter diagram shows a clear positive trend between shipping distance (miles) and
shipping time (days): as the distance increases, the shipping time also tends to increase. The closer the
dots lie to a straight line, the stronger the relationship. In this graph the dots are neither very close to
nor very far from a straight line, so the independent variable has a positive but not complete influence
on the dependent variable. Therefore, we need the correlation coefficient of the two variables to
measure the relationship numerically.

Answer to the question no. 51 (b)


Coefficient of Correlation:
To find the coefficient of correlation we first need the sample standard deviation of each variable.

Standard Deviation of Distance (miles):

$$ \delta_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{145306.55}{20 - 1}} = 87.45 $$

Standard Deviation of Shipping Time (days):

$$ \delta_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} = \sqrt{\frac{138.80}{20 - 1}} = 2.7028 $$

Statistic             Distance (miles) x   Shipping Time (days) y
Mean                  725.85               8.4
Standard Error        19.55468378          0.60437005
Median                702.5                8.5
Mode                  762                  10
Standard Deviation    87.45120444          2.702825033
Sample Variance       7647.713158          7.305263158
Kurtosis              -1.502903868         0.109087751
Skewness              0.256412239          0.113020821
Range                 255                  11
Minimum               607                  3
Maximum               862                  14
Sum                   14517                168
Count                 20                   20
Table 2: Standard Deviation, mean and all the descriptive statistics in Excel
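
The Excel figures in Table 2 can be cross-checked outside the spreadsheet. The following is a minimal Python sketch, not the method used in this assignment: the variable names are illustrative, and `statistics.stdev` is the standard-library sample standard deviation (divisor n − 1), which is the same definition as the formula used here.

```python
import statistics

# Table 1 data: distance in miles (x) and shipping time in days (y)
distance = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
            862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
days = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
        7, 5, 13, 3, 8, 7, 10, 8, 6, 10]

mean_x = statistics.mean(distance)
mean_y = statistics.mean(days)
sd_x = statistics.stdev(distance)   # sample SD, divides by n - 1
sd_y = statistics.stdev(days)

print(f"sum x = {sum(distance)}, sum y = {sum(days)}")
print(f"mean x = {mean_x:.2f}, SD x = {sd_x:.4f}")
print(f"mean y = {mean_y:.2f}, SD y = {sd_y:.4f}")
```

Running it should reproduce the Mean, Sum and Standard Deviation rows of Table 2.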
Shipment   Distance (miles) x   Shipping Time (days) y   (X-X̅)   (Y-Y̅)   (X-X̅)²   (Y-Y̅)²   (X-X̅)(Y-Y̅)
1 656 5 -69.85 -3.4 4879.0225 11.56 237.49
2 853 14 127.15 5.6 16167.123 31.36 712.04
3 646 6 -79.85 -2.4 6376.0225 5.76 191.64
4 783 11 57.15 2.6 3266.1225 6.76 148.59
5 610 8 -115.85 -0.4 13421.223 0.16 46.34
6 841 10 115.15 1.6 13259.523 2.56 184.24
7 785 9 59.15 0.6 3498.7225 0.36 35.49
8 639 9 -86.85 0.6 7542.9225 0.36 -52.11
9 762 10 36.15 1.6 1306.8225 2.56 57.84
10 762 9 36.15 0.6 1306.8225 0.36 21.69
11 862 7 136.15 -1.4 18536.823 1.96 -190.61
12 679 5 -46.85 -3.4 2194.9225 11.56 159.29
13 835 13 109.15 4.6 11913.723 21.16 502.09
14 607 3 -118.85 -5.4 14125.323 29.16 641.79
15 665 8 -60.85 -0.4 3702.7225 0.16 24.34
16 647 7 -78.85 -1.4 6217.3225 1.96 110.39
17 685 10 -40.85 1.6 1668.7225 2.56 -65.36
18 720 8 -5.85 -0.4 34.2225 0.16 2.34
19 652 6 -73.85 -2.4 5453.8225 5.76 177.24
20 828 10 102.15 1.6 10434.623 2.56 163.44
Total 14517 168 145306.55 138.80 3108.20
Mean (X̅) & (Y̅) 725.85 8.4
Table 3: Required table for Coefficient of Correlation from Excel

The Formula:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\,\delta_x\,\delta_y} = \frac{3108.20}{(20 - 1)(87.451)(2.7028)} = 0.6921 $$

With unrounded standard deviations the value is 0.692104427.

Coefficient of Correlation   Distance (miles) x   Shipping Time (days) y

Distance (miles) x 1
Shipping Time (days) y 0.692104427 1
Table 4: Coefficient of Correlation in Excel application
Interpretation: The coefficient is positive, so there is a direct relationship between shipping distance
(miles), the independent variable, and shipping time (days), the dependent variable, as the scatter
diagram already suggested. The value of 0.6921 is above 0.5 and fairly close to 1, so the association is
fairly strong and positive. In practical terms, as the shipping distance (miles) increases, the shipping
time (days) will most likely increase with it.
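
As a cross-check on Tables 3 and 4, the coefficient can be computed directly from the deviations. This is a minimal Python sketch with illustrative variable names (`sxx`, `syy`, `sxy` are our own labels for the three sums of the table). It uses the equivalent form r = Sxy / sqrt(Sxx × Syy), which is the same quantity because (n − 1) × δx × δy equals sqrt(Sxx × Syy).

```python
import math

x = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
     862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
y = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
     7, 5, 13, 3, 8, 7, 10, 8, 6, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Column totals of the deviation table, summed over all 20 shipments
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)
print(f"Sxx = {sxx:.2f}, Syy = {syy:.2f}, Sxy = {sxy:.2f}, r = {r:.4f}")
```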

Test of Hypothesis:
The first step is to state the null and alternate hypotheses for the one-tailed test.
According to the given condition:

H0: ρ ≤ 0 (The correlation in the population is negative or equal to zero)

H1: ρ > 0 (The correlation in the population is positive)

Given,
No. of observations = n = 20
So, the degrees of freedom = df = (20 − 2) = 18
Coefficient of correlation = r = 0.6921

Coefficient of determination = r² = 0.4790



Now, the t test for the correlation coefficient:

$$ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.692104427 \times \sqrt{20 - 2}}{\sqrt{1 - 0.479008538}} = \frac{2.93635}{0.72180} = 4.0681 $$

Here, according to the question, we use the 0.05 level of significance. From the t distribution
table, the critical value for the given conditions (one-tailed test, 18 df, 0.05 level of
significance) is 1.734.
Finally,
4.0681 > 1.734
Calculated value > table value
Since the calculated value is higher than the table value, we reject the null hypothesis
and accept the alternate hypothesis.

Interpretation: We conclude that there is a positive correlation between distance and time in the
population, so the independent variable is an aid in predicting the dependent variable.
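
The test above can be sketched in a few lines of Python. This is only an illustration: the function name `t_for_correlation` is hypothetical, and the critical value 1.734 is typed in from the t table (18 df, one tail, 0.05) rather than computed.

```python
import math

def t_for_correlation(r: float, n: int) -> float:
    """t statistic for H0: rho <= 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.692104427      # sample correlation from the Excel output
n = 20               # number of shipments
t_critical = 1.734   # table value: one-tailed, df = 18, alpha = 0.05

t_stat = t_for_correlation(r, n)
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```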

T-Distribution Table (One Tail):

df α = 0.1 0.05 0.025 0.01 0.005 0.001 0.0005
∞ 1.282 1.645 1.960 2.326 2.576 3.091 3.291
1 3.078 6.314 12.706 31.821 63.656 318.289 636.578
2 1.886 2.920 4.303 6.965 9.925 22.328 31.600
3 1.638 2.353 3.182 4.541 5.841 10.214 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.894 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850
Table 5: One tailed T-Distribution Table

Answer to the question no. 51 (c)

Coefficient of Determination:

$$ r^2 = (0.692104427)^2 = 0.479008538 $$

Interpretation: As a percentage, 47.90% of the variation in the dependent variable, shipping time
(days), can be explained by or accounted for by the variation in the independent variable, shipping
distance (miles).

Answer to the question no. 51 (d)


Standard Error of Estimate:

To find the standard error of estimate we first have to find the regression equation.

Regression Equation:

$$ \hat{y} = a + bx $$

Step 1: Determine the slope of the equation.

The slope follows from the correlation coefficient and the two standard deviations. The formula for the slope:

$$ b = r \times \frac{S_y}{S_x} = 0.692104427 \times \frac{2.702825033}{87.45120444} = 0.0213906 $$

where
r : the correlation coefficient.
Sy : the standard deviation of Y (the dependent variable).
Sx : the standard deviation of X (the independent variable).

Step 2: Find the Y-intercept (a)

The y-intercept is the estimated value of y (ŷ) when the value of x is zero (0). The formula:

$$ a = \bar{y} - b\bar{x} = 8.4 - (0.0213906 \times 725.85) = -7.1264 $$

Step 3: The regression equation

$$ \hat{y} = a + bx = -7.1264 + 0.0213906\,x $$

Interpretation: This is the regression equation. Substituting any value of the independent variable x,
distance (miles), gives the estimated value of y, the shipping time (days), for that shipment. For
example, x = 700 miles gives an estimated shipping time of about 7.85 days.
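
The three steps above can be reproduced with a short Python sketch. It is an illustration under our own naming (`predict_days` is a hypothetical helper), and it computes the slope as Sxy / Sxx, which is algebraically the same as r × Sy / Sx.

```python
x = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
     862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
y = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
     7, 5, 13, 3, 8, 7, 10, 8, 6, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

b = sxy / sxx                 # Step 1: slope
a = mean_y - b * mean_x       # Step 2: y-intercept

def predict_days(miles: float) -> float:
    """Step 3: estimated shipping time (days) for a given distance."""
    return a + b * miles

print(f"y-hat = {a:.4f} + {b:.7f} x")
print(f"estimate for 700 miles: {predict_days(700):.2f} days")
```

Estimates are only meaningful for distances inside the sampled range (607 to 862 miles); the negative intercept shows that the line should not be extended down to x = 0.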

Shipment   Distance (miles) x   Shipping Time (days) y   ŷ   (y − ŷ)²

1 656 5 6.905863844 3.632316993


2 853 14 11.11981979 8.295438054
3 646 6 6.691957451 0.478805114
4 783 11 9.622475036 1.897575025
5 610 8 5.921894436 4.318522734
6 841 10 10.86313212 0.74499705
7 785 9 9.665256315 0.442565965
8 639 9 6.542222976 6.040667899
9 762 10 9.173271611 0.683479829
10 762 9 9.173271611 0.030023051
11 862 7 11.31233554 18.59623782
12 679 5 7.397848548 5.749677661
13 835 13 10.73478828 5.131184135
14 607 3 5.857722518 8.166577992
15 665 8 7.098379598 0.812919349
16 647 7 6.713348091 0.082169317
17 685 10 7.526192384 6.119724119
18 720 8 8.27486476 0.075550636
19 652 6 6.820301287 0.672894202
20 828 10 10.58505381 0.342287955
Total 14517 168 168 72.3136
Table 6: Required table for the SEE in Excel
So, the Standard Error of Estimate (SEE):

$$ \delta_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{72.3136}{20 - 2}} = 2.0044 $$

Interpretation: The Standard Error of Estimate (SEE) measures the dispersion or scatter of the
observed values around the regression line, so it shows how accurate predictions from the regression
equation are likely to be. Here the SEE is about 2.00 days, which is smaller than the 2.70-day standard
deviation of shipping time, so the observations lie relatively close to the regression line. Therefore,
the dependent variable, shipping time (days), can be predicted more precisely from the independent
variable, distance (miles), than from the mean shipping time alone.
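
A minimal Python sketch of the SEE calculation follows, summing the squared residuals over all 20 shipments. The variable names are illustrative, and the line is refitted inside the block so that it runs on its own.

```python
import math

x = [656, 853, 646, 783, 610, 841, 785, 639, 762, 762,
     862, 679, 835, 607, 665, 647, 685, 720, 652, 828]
y = [5, 14, 6, 11, 8, 10, 9, 9, 10, 9,
     7, 5, 13, 3, 8, 7, 10, 8, 6, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Squared residuals (y - y-hat)^2, the last column of the SEE table
squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
sse = sum(squared_residuals)

see = math.sqrt(sse / (n - 2))   # two parameters estimated, so df = n - 2
print(f"SSE = {sse:.4f}, SEE = {see:.4f}")
```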

Answer to the question no. 51 (e)


Yes, we recommend using the regression equation to predict the shipping time, the dependent variable,

because:

I. Standard Error of Estimate: The standard error of estimate of this equation is about 2.00
days, so the data are relatively close to the regression line. Therefore, the dependent
variable, shipping time (days), can be predicted fairly precisely from the independent
variable, distance (miles).
II. Coefficient of Determination: As a percentage, 47.90%, nearly half, of the variation in the
dependent variable, shipping time (days), can be explained by or accounted for by the
independent variable, shipping distance (miles).
Answer to the question no. 54 (a)
There were 20 states in the sample.
Given,
Total no. of degree of freedom = 19
Or, (no. of observations -1) = 19
Or, (n-1) = 19
Or, (no. of states) - 1 = 19
Or, No. of States = 19+1
So, No. of States = 20

Answer to the question no. 54 (b)


Standard Error of Estimate:
From (a),
No. of States = n = 20
Given, from the ANOVA table,
Residual (error) sum of squares = SSE = 12054

So, the Standard Error of Estimate (SEE):

$$ \delta_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{12054}{20 - 2}} = 25.8779 $$


Interpretation: The Standard Error of Estimate (SEE) measures the dispersion or scatter of the
observed values around the regression line, so it shows how accurate predictions from the regression
equation are likely to be. Here the SEE is large, 25.8779, so the data are widely scattered around the
regression line. Therefore, the dependent variable, the no. of construction work-zone fatalities,
cannot be predicted very precisely from the independent variable, the no. of unemployed people.
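
Parts (a) and (b) use only two numbers from the ANOVA table, so they are easy to check. A minimal Python sketch, with illustrative variable names:

```python
import math

df_total = 19    # total degrees of freedom from the ANOVA table
sse = 12054      # residual (error) sum of squares from the ANOVA table

n_states = df_total + 1       # part (a): total df = n - 1
df_residual = n_states - 2    # simple regression estimates two parameters
see = math.sqrt(sse / df_residual)   # part (b)

print(f"n = {n_states}, residual df = {df_residual}, SEE = {see:.4f}")
```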
Answer to the question no. 54 (c)
Coefficient of Determination:

Given, from the ANOVA table,

Regression sum of squares = SSR = 10354
Total sum of squares = SST = 22408

$$ r^2 = \frac{SSR}{SST} = \frac{10354}{22408} = 0.46207 $$

Interpretation: As a percentage, 46.21% of the variation in the dependent variable, the no. of
construction work-zone fatalities, can be explained by or accounted for by the independent variable,
the no. of unemployed people.

Answer to the question no. 54 (d)

Coefficient of correlation:

$$ r = \sqrt{r^2} = \sqrt{0.46207} = 0.67976 $$

Interpretation: The coefficient is positive, so there is a direct relationship between the no. of
unemployed people, the independent variable, and the no. of construction work-zone fatalities, the
dependent variable. The value of 0.67976 is above 0.5 and fairly close to 1, so the association is
fairly strong and positive. In practical terms, as the no. of unemployed people increases, the no. of
construction work-zone fatalities will most likely increase with it.
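
Parts (c) and (d) follow from the sums of squares alone. The sketch below is a minimal Python illustration with our own variable names; taking the positive square root is an assumption that matches the positive association stated in this answer, since r² by itself does not give the sign.

```python
import math

ssr = 10354   # regression sum of squares
sse = 12054   # residual (error) sum of squares
sst = 22408   # total sum of squares

# The ANOVA decomposition SST = SSR + SSE should hold exactly
decomposition_ok = (ssr + sse == sst)

r_squared = ssr / sst          # part (c)
r = math.sqrt(r_squared)       # part (d), positive root assumed

print(f"SST = SSR + SSE: {decomposition_ok}")
print(f"r^2 = {r_squared:.5f}, r = {r:.5f}")
```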

Answer to the question no. 54 (e)


Test of Hypothesis for Correlation Coefficient:
The first step is to state the null and alternate hypotheses for the one-tailed test.

H0: ρ ≤ 0 (The correlation in the population is negative or equal to zero)

H1: ρ > 0 (The correlation in the population is positive)

Given,
No. of observations = n = 20
So, the degrees of freedom = df = (20 − 2) = 18
Coefficient of correlation = r = 0.6798

Coefficient of determination = r² = 0.4621


The t test for the correlation coefficient:

$$ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.67976 \times \sqrt{20 - 2}}{\sqrt{1 - 0.46207}} = \frac{2.88398}{0.73344} = 3.93 $$

Here, according to the requirement, we use the 0.05 level of significance. From the t distribution
table, the critical value for the given conditions (one-tailed test, 18 df, 0.05 level of
significance) is 1.734.

Finally,
3.93 > 1.734
Calculated value > table value
Since the calculated value is higher than the table value, we reject the null hypothesis
and accept the alternate hypothesis.
Interpretation: We conclude that there is a positive association between the two variables in the
population, so the independent variable is an aid in predicting the dependent variable.
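
The t value can be checked in two ways from the ANOVA figures. In simple regression the square of this t statistic equals the F ratio MSR / MSE, so both routes should agree. A minimal Python sketch with illustrative names; the critical value 1.734 is typed in from the t table, not computed.

```python
import math

ssr, sse, sst = 10354, 12054, 22408   # sums of squares from the ANOVA table
n = 20                                # number of states, from part (a)
t_critical = 1.734                    # one-tailed, df = 18, alpha = 0.05

# Route 1: from the correlation coefficient
r_squared = ssr / sst
r = math.sqrt(r_squared)              # positive root assumed
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Route 2: from the F ratio, F = MSR / MSE with 1 and n - 2 df
f_ratio = (ssr / 1) / (sse / (n - 2))
t_from_f = math.sqrt(f_ratio)

reject_h0 = t_from_r > t_critical
print(f"t from r = {t_from_r:.4f}, t from F = {t_from_f:.4f}, reject H0: {reject_h0}")
```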
T-Distribution Table (One Tail):

df α = 0.1 0.05 0.025 0.01 0.005 0.001 0.0005
∞ 1.282 1.645 1.960 2.326 2.576 3.091 3.291
1 3.078 6.314 12.706 31.821 63.656 318.289 636.578
2 1.886 2.920 4.303 6.965 9.925 22.328 31.600
3 1.638 2.353 3.182 4.541 5.841 10.214 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.894 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850
(One tailed test t- distribution : Data Science Central, 2019)

Table 7: T distribution table for one tailed test


Table of Figures
Figure 1: The Scatter Diagram of the Given sample
Table 1: Random sample of 20 shipments of Bardi Trucking Co.
Table 2: Standard Deviation, mean and all the descriptive statistics in Excel
Table 3: Required table for Coefficient of Correlation from Excel
Table 4: Coefficient of Correlation in Excel application
Table 5: One tailed T-Distribution Table
Table 6: Required table for the SEE in Excel
Table 7: T distribution table for one tailed test

References
One tailed test t-distribution: Data Science Central. (2019, 10 28). Retrieved from Data Science Central:
https://www.statisticshowto.datasciencecentral.com/tables/t-distribution-table/
