Prepared by
S ATHITHAN
Assistant Professor
Department of Mathematics
Faculty of Engineering and Technology
SRM UNIVERSITY
Kattankulathur-603203, Kancheepuram District.
• Two-way classification
• Introduction to Non-parametric test: Wilcoxon signed rank test (one sample test)
• Wilcoxon Mann-Whitney rank test (two samples test)
Page 1 of 27 https://sites.google.com/site/lecturenotesofathithans/home
Probability & Statistics S.ATHITHAN
1 Pearson's Correlation Coefficient
Example: 1. Find the correlation co-efficient for the following data:

  X   27  28  29  30  32  32  33
  Y   17  18  19  19  21  20  21

with u = X - 30 and v = Y - 19

    X     Y    u = X - 30   v = Y - 19    u²    v²    uv
   27    17        -3           -2         9     4     6
   28    18        -2           -1         4     1     2
   29    19        -1            0         1     0     0
   30    19         0            0         0     0     0
   32    21         2            2         4     4     4
   32    20         2            1         4     1     2
   33    21         3            2         9     4     6
  211   135         1            2        31    14    20   (Total)

Now,
\[
r_{XY} = r_{UV}
= \frac{N\sum uv - \sum u \sum v}{\sqrt{N\sum u^2 - (\sum u)^2}\,\sqrt{N\sum v^2 - (\sum v)^2}}
= \frac{7 \times 20 - 1 \times 2}{\sqrt{7 \times 31 - 1}\,\sqrt{7 \times 14 - 4}}
= \frac{138}{\sqrt{216}\,\sqrt{94}}
= \frac{138}{142.49} = 0.968.
\]
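The arithmetic of Example 1 can be checked with a short Python sketch. The function name `pearson_r` and its `x_origin` / `y_origin` parameters are hypothetical names introduced here for illustration; the sketch simply applies the formula in u and v above and shows that the choice of origin does not change r.

```python
from math import sqrt

def pearson_r(xs, ys, x_origin=0.0, y_origin=0.0):
    """Pearson's r computed from deviations u = x - x_origin, v = y - y_origin."""
    n = len(xs)
    u = [x - x_origin for x in xs]
    v = [y - y_origin for y in ys]
    sum_u, sum_v = sum(u), sum(v)
    sum_uu = sum(a * a for a in u)
    sum_vv = sum(b * b for b in v)
    sum_uv = sum(a * b for a, b in zip(u, v))
    numerator = n * sum_uv - sum_u * sum_v
    denominator = sqrt(n * sum_uu - sum_u ** 2) * sqrt(n * sum_vv - sum_v ** 2)
    return numerator / denominator

# Data of Example 1, with the same working origins 30 and 19.
X = [27, 28, 29, 30, 32, 32, 33]
Y = [17, 18, 19, 19, 21, 20, 21]
r = pearson_r(X, Y, x_origin=30, y_origin=19)
print(round(r, 3))
```

Calling `pearson_r(X, Y)` with the default origins gives the same value, which is why any convenient origin may be used in hand computation.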
Example: 2. Find the correlation co-efficient for the following data:

  X    62   64   65   69   70   71   72   74
  Y   126  125  139  145  165  152  180  208

with u = X - 69 and v = Y - 152

    X     Y    u = X - 69   v = Y - 152    u²     v²     uv
   62   126        -7          -26         49    676    182
   64   125        -5          -27         25    729    135
   65   139        -4          -13         16    169     52
   69   145         0           -7          0     49      0
   70   165         1           13          1    169     13
   71   152         2            0          4      0      0
   72   180         3           28          9    784     84
   74   208         5           56         25   3136    280
  Total            -5           24        129   5712    746

Now,
\[
r_{XY} = r_{UV}
= \frac{N\sum uv - \sum u \sum v}{\sqrt{N\sum u^2 - (\sum u)^2}\,\sqrt{N\sum v^2 - (\sum v)^2}}
= \frac{8 \times 746 - (-5) \times 24}{\sqrt{8 \times 129 - 25}\,\sqrt{8 \times 5712 - (24)^2}}
\]
\[
= \frac{5968 + 120}{\sqrt{1032 - 25}\,\sqrt{45696 - 576}}
= \frac{6088}{\sqrt{1007}\,\sqrt{45120}}
= \frac{6088}{6740.6} = 0.903.
\]
Example: 3. Find the Spearman's rank correlation co-efficient for the following data:

  X   78  36  98  25  75  82  90  62  65  39
  Y   84  51  91  60  68  62  86  58  63  47

\[
\rho_{XY} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
\]

   X    Y    Rx   Ry    d    d²
  78   84     4    3    1     1
  36   51     9    9    0     0
  98   91     1    1    0     0
  25   60    10    7    3     9
  75   68     5    4    1     1
  82   62     3    6   -3     9
  90   86     2    2    0     0
  62   58     7    8   -1     1
  65   63     6    5    1     1
  39   47     8   10   -2     4
  Total                      26

Now,
\[
\rho_{XY} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 26}{10(99)} = 1 - 0.1576 = 0.8424
\]
Example: 4. Find the Spearman's rank correlation co-efficient for the following data:

  X   35  23  47  17  10  43   9   6  28
  Y   30  33  45  23   8  49  12   4  11

\[
\rho_{XY} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
\]

   X    Y    Rx   Ry    d    d²
  35   30     3    4   -1     1
  23   33     5    3    2     4
  47   45     1    2   -1     1
  17   23     6    5    1     1
  10    8     7    8   -1     1
  43   49     2    1    1     1
   9   12     8    6    2     4
   6    4     9    9    0     0
  28   11     4    7   -3     9
  Total                      22

Now,
\[
\rho_{XY} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 22}{9(80)} = 1 - 0.1833 = 0.8167
\]
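The ranking and the rank-difference formula used in Examples 3 and 4 can be sketched in Python as below. The helper names `ranks_desc` and `spearman_rho` are hypothetical, and the sketch assumes there are no tied values (true for both examples), with rank 1 given to the largest value as in the tables above.

```python
def ranks_desc(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for untied data."""
    n = len(xs)
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Data of Example 4.
X = [35, 23, 47, 17, 10, 43, 9, 6, 28]
Y = [30, 33, 45, 23, 8, 49, 12, 4, 11]
rho = spearman_rho(X, Y)
print(round(rho, 4))
```

When ties are present the mean rank should be assigned to the tied values and the simple formula is only approximate; this sketch does not handle that case.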
3 Regression Analysis
Example: 5. Find the regression equations for the following data:

  X   27  28  29  30  32  32  33
  Y   17  18  19  19  21  20  21

Solution: The two regression equations are given by
The regression line (equation) of y on x
\[
y - \bar{y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x^2}\,(x - \bar{x}) = b_{yx}\,(x - \bar{x})
\]
The regression line (equation) of x on y
\[
x - \bar{x} = \frac{\mathrm{Cov}(X, Y)}{\sigma_y^2}\,(y - \bar{y}) = b_{xy}\,(y - \bar{y})
\]
with u = X - 30 and v = Y - 19

    X     Y    u = X - 30   v = Y - 19    u²    v²    uv
   27    17        -3           -2         9     4     6
   28    18        -2           -1         4     1     2
   29    19        -1            0         1     0     0
   30    19         0            0         0     0     0
   32    21         2            2         4     4     4
   32    20         2            1         4     1     2
   33    21         3            2         9     4     6
  211   135         1            2        31    14    20   (Total)

Now,
\[
\bar{x} = \frac{\sum X}{n} = \frac{211}{7} = 30.14, \qquad
\bar{y} = \frac{\sum Y}{n} = \frac{135}{7} = 19.286,
\]
\[
b_{xy} = b_{uv} = \frac{N\sum uv - \sum u \sum v}{N\sum v^2 - (\sum v)^2}
= \frac{7 \times 20 - 1 \times 2}{7 \times 14 - 4} = \frac{138}{94} = 1.468.
\]
\[
b_{yx} = b_{vu} = \frac{N\sum uv - \sum u \sum v}{N\sum u^2 - (\sum u)^2}
= \frac{7 \times 20 - 1 \times 2}{7 \times 31 - 1} = \frac{138}{216} = 0.6389.
\]
The regression line of y on x:
\[
y - \bar{y} = b_{yx}(x - \bar{x}) = 0.6389\,(x - \bar{x})
\]
\[
y - 0.6389x = 19.286 - 0.6389 \times 30.14 = 0.0296
\]
The regression line of x on y:
\[
x - \bar{x} = b_{xy}(y - \bar{y}) = 1.468\,(y - \bar{y})
\]
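The two regression coefficients of Example 5 can be reproduced with the following Python sketch. The function name `regression_coefficients` is hypothetical; it works directly with deviations from the means, which is algebraically equivalent to the u, v formulas used above.

```python
def regression_coefficients(xs, ys):
    """Return (x_mean, y_mean, b_yx, b_xy) for the two lines of regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    b_yx = s_xy / s_xx   # slope of the line of y on x
    b_xy = s_xy / s_yy   # slope of the line of x on y
    return x_mean, y_mean, b_yx, b_xy

# Data of Example 5.
X = [27, 28, 29, 30, 32, 32, 33]
Y = [17, 18, 19, 19, 21, 20, 21]
x_mean, y_mean, b_yx, b_xy = regression_coefficients(X, Y)
print(round(b_yx, 4), round(b_xy, 3))

# Line of y on x:  y = y_mean + b_yx * (x - x_mean)
# Line of x on y:  x = x_mean + b_xy * (y - y_mean)
```

A useful check is that the product `b_yx * b_xy` equals the square of the correlation co-efficient found in Example 1. Intercepts computed from unrounded means differ slightly from hand values obtained with rounded means.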
Example: 6. The two regression equations are given by x + 0.87y = 19.13 and 0.50x + y =
4 Analysis of Variance (ANOVA)
Example: 7. The following data give the number of mistakes made by 4 photographic laboratory technicians on 5 successive days. Is there any significant difference in performance among the technicians?

   6   14   10    9
  14    9   12   12
  10   12    7    8
   8   10   15   10
  11   14   11   11

(Columns: technicians; rows: days.)

Solution:
H0 : There is no significant difference between the technicians.
H1 : There is a significant difference between the technicians.

Subtracting 10 from each value gives the coded table:

           X1    X2    X3    X4   Total   X1²   X2²   X3²   X4²
           -4     4     0    -1     -1     16    16     0     1
            4    -1     2     2      7     16     1     4     4
            0     2    -3    -2     -3      0     4     9     4
           -2     0     5     0      3      4     0    25     0
            1     4     1     1      7      1    16     1     1
  Total    -1     9     5     0     13     37    37    39    10

Step 1: N = 20
Step 2: T = 13 (Sum of all the values)
Step 3: Calculate
\[
\frac{T^2}{N} = \frac{(13)^2}{20} = 8.45
\]
Step 4: TSS (Total Sum of Squares)
\[
TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N}
= 37 + 37 + 39 + 10 - 8.45 = 114.55
\]
Step 5: SSC (Sum of Squares between samples (Columns))
\[
SSC = \frac{(\sum X_1)^2}{5} + \frac{(\sum X_2)^2}{5} + \frac{(\sum X_3)^2}{5} + \frac{(\sum X_4)^2}{5} - \frac{T^2}{N}
= \frac{1 + 81 + 25 + 0}{5} - 8.45 = 12.95
\]
Step 6: SSE (Sum of Squares due to Error)
\[
SSE = TSS - SSC = 114.55 - 12.95 = 101.6
\]

ANOVA Table

  Source of      Sum of          d.f.                 Mean Squares                Variance       Table Value (F)
  Variation      Squares                                                          Ratio (F)
  Between        SSC = 12.95     c - 1 = 4 - 1 = 3    MSC = SSC/(c - 1)           Fc = 1.47      Ft(3, 16) at 1% LOS = 5.29
  Columns                                                 = 12.95/3 = 4.31
  Residual/      SSE = 101.6     N - c = 20 - 4 = 16  MSE = SSE/(N - c)
  Error                                                   = 101.6/16 = 6.35
  Total          SSC + SSE
                 = 114.55

Here the variance ratio is formed with the larger mean square in the numerator: Fc = 6.35/4.31 = 1.47.

Step 7: Conclusion: Since the calculated value of F is less than the table value of F, i.e. Fc < Ft, H0 is accepted.
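The sums of squares in Example 7 can be checked with the Python sketch below, which applies the same correction-factor method (T²/N) to the coded values of the worked table. The function name `one_way_anova` and the dictionary keys are hypothetical names chosen for this illustration.

```python
def one_way_anova(columns):
    """One-way ANOVA by the correction-factor method; each inner list is one column (sample)."""
    n_total = sum(len(col) for col in columns)
    grand_total = sum(sum(col) for col in columns)
    correction = grand_total ** 2 / n_total                      # T^2 / N
    tss = sum(x * x for col in columns for x in col) - correction
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    sse = tss - ssc
    msc = ssc / (len(columns) - 1)                                # d.f. = c - 1
    mse = sse / (n_total - len(columns))                          # d.f. = N - c
    return {"TSS": tss, "SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse}

# Coded values (original value minus 10), one list per technician.
coded = [
    [-4, 4, 0, -2, 1],   # X1
    [4, -1, 2, 0, 4],    # X2
    [0, 2, -3, 5, 1],    # X3
    [-1, 2, -2, 0, 1],   # X4
]
result = one_way_anova(coded)
for name, value in result.items():
    print(name, round(value, 3))
```

Because subtracting a constant from every observation leaves all sums of squares unchanged, the same function gives identical results on the uncoded data.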
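Example: 8. A company appoints 4 salesmen A, B, C and D and observes their sales in 3 seasons. The figures (in lakhs of Rs.) are given below. Carry out the analysis of variance.

                  Salesmen
  Season       A    B    C    D
  Summer      45   40   38   37
  Winter      43   41   45   38
  Monsoon     39   39   41   41

Solution:
H0 : There is no significant difference between the sales of the salesmen and also between the seasonal sales.
H1 : There is a significant difference between the sales of the salesmen or between the seasonal sales.

Subtracting 40 from each value gives the coded table:

                 Salesmen
  Seasons    X1    X2    X3    X4   Total   X1²   X2²   X3²   X4²
  Y1          5     0    -2    -3     0      25     0     4     9
  Y2          3     1     5    -2     7       9     1    25     4
  Y3         -1    -1     1     1     0       1     1     1     1
  Total       7     0     4    -4     7      35     2    30    14

Step 1: N = 12
Step 2: T = 7 (Sum of all the values)
Step 3: Calculate
\[
\frac{T^2}{N} = \frac{(7)^2}{12} = 4.083
\]
Step 4: TSS (Total Sum of Squares)
\[
TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N}
= 35 + 2 + 30 + 14 - 4.083 = 76.917
\]
Step 5: SSC (Sum of Squares between Columns)
\[
SSC = \frac{(\sum X_1)^2}{3} + \frac{(\sum X_2)^2}{3} + \frac{(\sum X_3)^2}{3} + \frac{(\sum X_4)^2}{3} - \frac{T^2}{N}
= \frac{49}{3} + 0 + \frac{16}{3} + \frac{16}{3} - 4.083 = 22.917
\]
Step 6: SSR (Sum of Squares between Rows)
\[
SSR = \frac{(\sum Y_1)^2}{4} + \frac{(\sum Y_2)^2}{4} + \frac{(\sum Y_3)^2}{4} - \frac{T^2}{N}
= 0 + \frac{49}{4} + 0 - 4.083 = 8.167
\]
Step 7: SSE (Sum of Squares due to Error)
\[
SSE = TSS - SSC - SSR = 76.917 - 22.917 - 8.167 = 45.833
\]

ANOVA Table

  Source of      Sum of           d.f.                      Mean Squares             Variance Ratio (F)        Table Value (F)
  Variation      Squares
  Between        SSC = 22.917     c - 1 = 3                 MSC = SSC/(c - 1)        FC = MSC/MSE              Ft(3, 6) at 5% LOS = 4.76
  Columns                                                       = 7.639                 = 7.639/7.639 = 1.000
  Between        SSR = 8.167      r - 1 = 2                 MSR = SSR/(r - 1)        FR = MSE/MSR              Ft(2, 6) at 5% LOS = 5.14
  Rows                                                          = 4.0835                = 7.639/4.0835 = 1.871
  Residual/      SSE = 45.833     N - (c + r - 1)           MSE = SSE/6
  Error                           = (c - 1)(r - 1) = 6          = 7.639
  Total          TSS = 76.917

Step 8: Conclusion: Since the calculated value of F is less than the table value of F in both cases, H0 is accepted.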
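The two-way classification computation of Example 8 can be sketched in Python as follows, using the coded values (sales minus 40). The function name `two_way_anova` and the dictionary keys are hypothetical; the sketch assumes one observation per cell, as in the example.

```python
def two_way_anova(table):
    """Two-way ANOVA, one observation per cell; table[i][j] is row i, column j."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n                              # T^2 / N
    tss = sum(x * x for row in table for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in table) / c - correction     # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - correction  # between columns
    sse = tss - ssc - ssr
    return {
        "TSS": tss, "SSC": ssc, "SSR": ssr, "SSE": sse,
        "MSC": ssc / (c - 1),
        "MSR": ssr / (r - 1),
        "MSE": sse / ((c - 1) * (r - 1)),
    }

# Coded sales: rows are seasons (Summer, Winter, Monsoon), columns are salesmen A to D.
coded = [
    [5, 0, -2, -3],
    [3, 1, 5, -2],
    [-1, -1, 1, 1],
]
result = two_way_anova(coded)
for name, value in result.items():
    print(name, round(value, 3))
```

The variance ratios are then formed from `MSC`, `MSR` and `MSE` and compared with the F table values for the corresponding degrees of freedom.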
The Wilcoxon signed rank test assumes that:
1. The R(d_i) are symmetrically distributed.
2. The R(d_i) are mutually independent.

This test
• can be applied to two types of sample: one sample or paired sample.
• For one sample, the method tests whether the sample could have been drawn from a population having a hypothesized value as its median.
• For paired samples, it tests whether the two populations from which the samples are drawn are identical.

Parameters used:
d_i - difference of paired samples
|d_i| - modulus (absolute value) of the difference of paired samples
R - ranks
R(d_i) - signed rank

  H0               median R(d) = m0        median R(d) = m0     median R(d) = m0
  H1               median R(d) ≠ m0        median R(d) > m0     median R(d) < m0
  Rejection area   min(T⁺, T⁻) ≤ a         T⁻ ≤ a               T⁺ ≤ a

Test Procedure
• For each of the observed values, find the difference between the value and the median, d_i = x_i - m0, where m0 is the median value that has been specified.
• Ignoring the observations where d_i = 0, rank the |d_i| values so that the smallest |d_i| has a rank of 1. Where two or more differences have the same value, find their mean rank and use this.
• For observations where x_i > m0, list the rank in the +R(d_i) column; for observations where x_i < m0, list the rank in the -R(d_i) column.
• Then sum the ranks of the positive differences, T⁺, and sum the ranks of the negative differences, T⁻:
\[
T^{+} = \sum {+R(d_i)}, \qquad T^{-} = \sum {-R(d_i)}
\]
  (each sum is taken over the entries of the corresponding column).
• For a one-tailed test where H1 : median R(d_i) > m0, the test statistic is W = T⁻.
• For a one-tailed test where H1 : median R(d_i) < m0, the test statistic is W = T⁺.
• Critical region: compare the test statistic W with the critical value in the tables; the null hypothesis is rejected if W ≤ critical value, a.
• Make a decision.
Example: 9. An environmental activist believes her community's drinking water contains at least the 40.0 parts per million (ppm) limit recommended by health officials for a certain metal. In response to her claim, the health department samples and analyzes drinking water from a sample of 11 households in the community. The results are given in the table below. At the 0.05 level of significance, using the Wilcoxon method, can we conclude that the community's drinking water might contain at least the 40.0 ppm recommended limit?

  Household                 A     B     C     D     E     F     G     H     I     J     K
  Observed concentration   39   20.2   40   32.2  30.5  26.5  42.1  45.6  42.1  29.9  40.9

Hints/Solution:
Here m0 = 40.

  Household   Observed         d_i = x_i - m0    |d_i|    Rank R(d_i)   +R(d_i)   -R(d_i)
              concentration
  A              39                 -1              1         2                       2
  B              20.2              -19.8           19.8      10                      10
  C              40                  0              0         -
  D              32.2               -7.8            7.8       6                       6
  E              30.5               -9.5            9.5       7                       7
  F              26.5              -13.5           13.5       9                       9
  G              42.1                2.1            2.1       3.5          3.5
  H              45.6                5.6            5.6       5            5
  I              42.1                2.1            2.1       3.5          3.5
  J              29.9              -10.1           10.1       8                       8
  K              40.9                0.9            0.9       1            1
  Total                                                                   13         42

Step: 1
Null Hypothesis H0 : median of R(d) ≥ 40, i.e., the community's drinking water might contain at least the 40.0 ppm recommended limit.
Alternative Hypothesis H1 : median of R(d) < 40.

Step: 2
Test statistic: Wc = T⁺ = Σ +R(d_i) = 13.

Step: 3
The table value (critical value) of the Wilcoxon signed rank test for a one-tailed test at 5% LOS and n = 11 - 1 = 10 (the observation with d_i = 0 is dropped) is Wt = 10.

Step: 4
H0 is rejected only if Wc ≤ Wt. Since Wc = 13 > Wt = 10, we fail to reject H0: the sample does not contradict the claim that the community's drinking water might contain at least the 40.0 ppm recommended limit.
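The ranking steps of the signed rank procedure, applied to the data of Example 9, can be sketched in Python as below. The function name `signed_rank_sums` is hypothetical. Differences are rounded to 6 decimal places before ranking, an assumption made here only so that floating-point noise does not break ties such as the two households with 42.1 ppm.

```python
def signed_rank_sums(sample, m0):
    """Return (T_plus, T_minus) for the one-sample Wilcoxon signed rank test."""
    # Drop zero differences; round to keep equal differences exactly equal.
    diffs = [round(x - m0, 6) for x in sample if x != m0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def mean_rank(a):
        first = abs_sorted.index(a) + 1      # smallest rank held by this value
        count = abs_sorted.count(a)          # number of tied values
        return first + (count - 1) / 2       # mean of the tied ranks

    t_plus = sum(mean_rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(mean_rank(abs(d)) for d in diffs if d < 0)
    return t_plus, t_minus

# Observed concentrations for households A to K, hypothesized median 40.
water = [39, 20.2, 40, 32.2, 30.5, 26.5, 42.1, 45.6, 42.1, 29.9, 40.9]
t_plus, t_minus = signed_rank_sums(water, 40)
print(t_plus, t_minus)
```

As a check, T⁺ + T⁻ must equal n(n + 1)/2, where n is the number of non-zero differences (here 10, giving 55). The critical value still has to be read from the Wilcoxon signed rank table.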
• Two independent random samples are required, one from each population. Let m1 and m2 be the random samples of sizes n1 and n2, where n1 ≤ n2, from populations X and Y respectively.
• cv = [a, b] = critical value, with a = lower critical value and b = upper critical value.

1. Hypotheses to be tested are:
H0 : The distributions for populations 1 and 2 are identical.
H1 : The distributions for populations 1 and 2 are different (two-tailed test).
H1 : The distribution for population 1 lies to the left of that for population 2 (left-tailed test).
H1 : The distribution for population 1 lies to the right of that for population 2 (right-tailed test).
2. Test Statistic:
i. Find T1, the rank sum for the observations in sample 1 (left-tailed test).
3. Rejection region: Reject H0 if the test statistic is less than the critical value. Critical value: obtained from the table of the Wilcoxon rank sum test.
4. Conclusion.
Example: 10. The data below show the marks obtained by electrical engineering students in an examination. Can we conclude that the achievements of male and female students are identical at significance level α = 0.1?

  Gender    Marks
  Male       60
  Male       62
  Male       78
  Male       83
  Female     40
  Female     65
  Female     70
  Female     88
  Female     92

Hints/Solution:

  Gender    Marks   Rank
  Male       60      2
  Male       62      3
  Male       78      6
  Male       83      7
  Female     40      1
  Female     65      4
  Female     70      5
  Female     88      8
  Female     92      9

Step: 1
H0 : The achievements of male and female students are identical.
H1 : The achievements of male and female students are different (two-tailed test).

Step: 2
The test statistic is given as follows: we have n1 = 4, n2 = 5 and W = ΣR1 = 2 + 3 + 6 + 7 = 18.

Step: 3
From the table of the Wilcoxon rank sum test for α = 0.1 (two-tailed test) with n1 = 4, n2 = 5, the critical value is Wt = [13, 27].

Step: 4
Reject H0 if W ∉ [a, b], i.e. W ∉ [13, 27].
Since 18 ∈ [13, 27], we fail to reject H0 and conclude that the achievements of male and female students are not significantly different.
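The rank-sum computation of Example 10 can be sketched in Python as follows. The function name `rank_sum` is hypothetical, and the critical values 13 and 27 are simply the table values quoted in the example, not something the code derives.

```python
def rank_sum(sample1, sample2):
    """Sum of the pooled ranks of sample1 (mean ranks are used for ties)."""
    pooled = sorted(sample1 + sample2)

    def mean_rank(v):
        first = pooled.index(v) + 1
        count = pooled.count(v)
        return first + (count - 1) / 2

    return sum(mean_rank(v) for v in sample1)

male = [60, 62, 78, 83]          # smaller sample, n1 = 4
female = [40, 65, 70, 88, 92]    # n2 = 5
w = rank_sum(male, female)

lower, upper = 13, 27            # table values quoted in Example 10 (alpha = 0.1, two-tailed)
reject_h0 = not (lower <= w <= upper)
print(w, reject_h0)
```

The decision rule mirrors Step 4: H0 is rejected only when W falls outside the interval [a, b].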
Practice/Exercise Problems
1. Find the Pearson's and Spearman's correlation co-efficients and the two lines of regression for the following data:

  Sales         15   18   25   27   30   35
  Expenditure   50   65   82   95  110  120

2. Find the Pearson's and Spearman's correlation co-efficients and the two lines of regression for the following data:

  X    1    3    5    7    8   10
  Y    8   12   15   17   18   20

3. Find the Pearson's and Spearman's correlation co-efficients and the two lines of regression for the following data:

  X   43  44  46  40  44  42  45  42  38  40  42  57
  Y   29  31  19  18  19  27  27  29  41  30  26  10
4. Find the Pearson's and Spearman's correlation co-efficients and the two lines of regression for the following data:

  Marks in Mathematics (columns): 47 52 57 62 67
  Marks in Statistics (rows):
  57 3 4 2
  62 4 8 8 2
  67 7 12 4 4
  72 3 10 8 5
  77 3 5 8

with u = X - 20 and v = Y - 35 or v = (Y - 35)/10
Ans: r_XY = 0.63

5. Find the Pearson's and Spearman's correlation co-efficients and the two lines of regression for the following data:

                    Age X
  Marks Y     18    19    20    21   Total
  10-20        4     2     2     -      8
  20-30        5     4     6     4     19
  30-40        6     8    10    11     35
  40-50        4     4     6     8     22
  50-60        -     2     4     4     10
  60-70        -     2     3     1      6
  Total       19    22    31    28    100

Ans: r_XY = 0.1897
6. Calculate the Pearson's correlation co-efficient and regression equations for the following data:

  Age X (columns): 16-17, 17-18, 18-19, 19-20
  Marks Y (rows):
  30-40 20 10 3 2
  40-50 4 28 6 4
  50-60 5 11
  60-70 2
  70-80 5

7. Calculate the Pearson's and Spearman's correlation co-efficients for the following data:

  X   43  44  46  40  44  42  45  42  38  40  42  57
  Y   29  31  19  18  19  27  27  29  41  30  26  10

  Sales         15   18   25   27   30   35
  Expenditure   50   65   82   95  110  120

8. The competitors in a musical contest were ranked by 3 judges. Which pair of judges have more or less the same taste in music?

  S.No.              1   2   3   4   5   6   7   8   9  10
  Rank by Judge A    6   5   3  10   2   4   9   7   8   1
  Rank by Judge B    5   8   4   7  10   2   1   6   9   3
  Rank by Judge C    4   9   8   1   2   3  10   5   7   6

Ans: r_AB = 0.0486, r_BC = -0.2970 and r_AC = 0.0424. A and B have the same taste or attitude towards ranking.
9. Ten participants of a competition were ranked according to their performance by 3 judges. Which pair of judges have nearly the same attitude on ranking?

  S.No.        1   2   3   4   5   6   7   8   9  10
  Rank by X    1   6   5  10   3   2   4   9   7   8
  Rank by Y    3   5   8   4   7  10   2   1   6   9
  Rank by Z    6   4   9   8   1   2   3  10   5   7

  X   1   2   3   4   5   6   7
  Y   9   8  10  12  11  13  14
11. The owner of a chain of ten stores wishes to forecast net profit with the help of next year's projected sales of food and non-food items. The data about the current year's sales of food items, sales of non-food items and net profit for all the ten stores are available as follows.

  Supermarket No.                      1     2     3     4     5     6     7     8     9    10
  Net profit in crores, Y             5.6   4.7   5.4   5.5   5.1   6.8   5.8   8.2   5.8   6.2
  Sales of food in crores, X1         20    15    18    20    16    25    22    30    24    25
  Sales of non-food in crores, X2      5     5     6     5     6     6     4     7     3     4

\[
Y = b_0 + b_1 X_1 + b_2 X_2
\]
\[
\sum Y = n b_0 + b_1 \sum X_1 + b_2 \sum X_2
\]
\[
\sum Y X_1 = b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2
\]
\[
\sum Y X_2 = b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2
\]
12. The annual food expenditure of a family depends on the net income of the family and the number of members in the family. The data from a sample survey of 6 families are given below. Find the regression equation.

  No. of credit cards        4    6    6    7    8    7    8   10
  Family size                2    2    4    4    5    5    6    6
  Family income in lakhs    14   16   14   17   18   21   17   25
14. Bivariate data often arises from the use of two different techniques to measure the same quantity. As an example, the accompanying observations on x = hydrogen concentration (ppm) using a gas chromatography method and y = concentration using a new sensor method were given.

  x    47   62   65   70   70   78   95  100  114  118
  y    38   62   53   67   84   79   93  106  117  116

  x   124  127  140  140  140  150  152  164  198  221
  y   127  114  134  139  142  170  149  154  200  215

Find the Pearson's and Spearman's correlation coefficients and find the lines of regression. Construct a scatter-plot. Does there appear to be a very strong relationship between the two types of concentration measurements? Do the two methods appear to be measuring roughly the same quantity? Explain your reasoning.

15. The accompanying data on y = ammonium concentration (mg/L) and x = transpiration (ml/h) is given. Find the Pearson's and Spearman's correlation coefficients and find the lines of regression. How would you describe the relationship between the variables, and does simple linear regression appear to be an appropriate modeling strategy?

Find the Pearson's and Spearman's correlation coefficients and find the lines of regression.
a. Construct a scatterplot in which the axes intersect at (0, 0). Mark 0, 20, 40, 60, 80, and 100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
b. Construct a scatterplot in which the axes intersect at (55, 100), as was done in the cited article. Does this plot seem preferable to the one in part (a)? Explain your reasoning.
c. What do the plots of parts (a) and (b) suggest about the nature of the relationship between the two variables?
1. The following table gives the number of refrigerators sold by 4 dealers in three months. Is there any significant difference between the sales made by the dealers and the sales made by them in different months?

               Dealer
  Month     A    B    C    D
  I        50   40   48   39
  II       46   48   50   45
  III      39   44   40   39
2. There are three main brands of a certain powder. A set of 120 sample values are examined and found to be allocated among four groups and three brands as shown below. Is there any significant difference in brand preference?

               Groups
  Brands    A    B    C    D
  I         0    4    8   15
  II        5    8   13    6
  III       8   19   11   13

3. The following data represent the number of units of production per day turned out by 5 different workers using 4 different types of machines. (a) Test whether the five men differ with respect to mean productivity. (b) Test whether the mean productivity is the same for the four different machine types.

                Machine Type
  Workers    A    B    C    D
  I         44   38   47   36
  II        46   40   52   43
  III       34   36   44   32
  IV        43   38   46   33
  V         38   42   49   39

4. Four doctors each test four treatments for a certain disease and observe the number of days each patient takes to recover. The results are as follows (recovery time in days). Discuss the difference between the doctors and treatment.

               Groups
  Brands    1    2    3    4
  A        10   14   19   20
  B        11   15   17   21
  C         9   12   16   19
  D         8   13   17   20
5. The following data is on total Fe for four types of iron formation (1 = carbonate, 2 = silicate, 3 = magnetite, 4 = hematite).

  1:  20.5  28.1  27.8  27.0  28.0  25.2  25.3  27.1  20.5  31.3
  2:  26.3  24.0  26.2  20.2  23.7  34.0  17.1  26.8  23.7  24.9
  3:  29.5  34.0  27.5  29.4  27.9  26.2  29.9  29.5  30.0  35.6
  4:  36.5  44.2  34.1  30.3  31.4  33.1  34.1  32.9  36.3  25.5

Carry out an analysis of variance F test at significance level 0.01, and summarize the results in an ANOVA table.
6. A consumer product-testing organization wished to compare the annual power consumption for five different brands of dehumidifier. Because power consumption depends on the prevailing humidity level, it was decided to monitor each brand at four different levels ranging from moderate to heavy humidity (thus blocking on humidity level). Within each level, brands were randomly assigned to the five selected locations. (a) Test whether the five brands differ with respect to treatment. (b) Test whether the brands are the same for the four different humidity levels.

                        Blocks (humidity level)
  Treatment (brands)     A     B     C      D
  I                     685   792   838    875
  II                    722   806   893    953
  III                   733   802   880    941
  IV                    811   888   952   1005
  V                     828   920   978   1023

7. Four different coatings are being considered for corrosion protection of metal pipe. The pipe will be buried in three different types of soil. To investigate whether the amount of corrosion depends either on the coating or on the type of soil, 12 pieces of pipe are selected. Each piece is coated with one of the four coatings and buried in one of the three types of soil for a fixed time, after which the amount of corrosion (depth of maximum pits, in .0001 in.) is determined. The data appears in the table. (a) Test whether the soil types differ with respect to treatment. (b) Test whether the soil types are the same for the four different coatings.

               Soil Type
  Coating    A    B    C
  I         64   49   50
  II        53   51   48
  III       47   45   50
  IV        51   43   52

8. In an experiment to see whether the amount of coverage of light-blue interior latex paint depends either on the brand of paint or on the brand of roller used, one gallon of each of four brands of paint was applied using each of three brands of roller, resulting in the following data (number of square feet covered). (a) Test whether the Roller Brands differ with respect to treatment. (b) Test whether the Paint Brands differ with respect to treatment. (c) Test whether the roller brands are the same for the different paint brands.

                  Roller Brand
  Paint Brand    A     B     C
  I             454   446   451
  II            446   444   447
  III           439   442   444
  IV            444   437   443
9. The following data is given on the effort required of a subject to arise from four different types of stools (Borg scale). Perform an analysis of variance using α = .05, and follow this with a multiple comparisons analysis if appropriate. (a) Test whether the Subjects differ with respect to treatment. (b) Test whether the Types of Stools differ with respect to treatment. (c) Test whether the Subject is the same for the different Types of Stools.

                                  Subject
  Type of Stool    1    2    3    4    5    6    7    8    9
  I               12   10    7    7    8    9    8    7    9
  II              15   14   14   11   11   11   12   11   13
  III             12   13   13   10    8   11   12    8   10
  IV              10   12    9    9    7   10   11    7    8

Problems based on Non-Parametric tests: WILCOXON AND MANN-WHITNEY TESTS

1. Student satisfaction surveys ask students to rate a particular course, on a scale of 1 (poor) to 10 (excellent). In previous years the replies have been symmetrically distributed about a median of 4. This year there has been a much greater on-line element to the course, and staff want to know how the rating of this version of the course compares with the previous one. 14 students, randomly selected, were asked to rate the new version of the course and
2. The following data represent the number of hours that a rechargeable hedge trimmer operates before a recharge is required: 1.5, 2.2, 0.9, 1.3, 2.0, 1.6, 1.8, 1.5, 2.0, 1.2, 1.7. Use the Wilcoxon signed rank test to test the hypothesis, at the 0.05 level of significance, that this particular trimmer operates a median of 1.8 hours before requiring a recharge.

3. The following data represent the time, in minutes, that a patient has to wait during 12 visits to a doctor's office before being seen by the doctor: 17, 15, 20, 20, 32, 28, 12, 26, 25, 25, 35 and 24. Use the Wilcoxon signed rank test at the 0.05 level of significance to test the doctor's claim that the median waiting time for her patients is not more than 20 minutes.

4. Using high school records, Johnson High school administrators selected a random sample of four high school students who attended Garfield Junior High and another random sample of five students who attended Mulbery Junior High. The ordinal class standings for the nine students are listed in the table below. Test using the Mann-Whitney test at the 0.05 level of significance.

  Garfield Junior High            Mulbery Junior High
  Student    Class standing       Student    Class standing
  Fields           8              Hart             70
  Clark           52              Phipps          202
  Jones          112              Kirwood         144
  Tibbs           21              Abbott          175
                                  Guest           146
5. The effectiveness of advertising for two rival products (Brand X and Brand Y) was compared. Market research at a local shopping centre was carried out, with the participants being shown adverts for two rival brands of coffee, which they then rated on the overall likelihood of them buying the product (out of 10, with 10 being definitely going to buy the product). Half of the participants gave ratings for one of the products, the other half gave ratings for the other product. Test using the Mann-Whitney test at the 0.05 level of significance.

  Brand X                    Brand Y
  Participant   Rating       Participant   Rating
  1               3          1               9
  2               4          2               7
  3               2          3               5
  4               6          4              10
  5               2          5               6
  6               5          6               8

6. The nicotine content of two brands of cigarettes, measured in milligrams, was found to be as follows:

  Brand-A   2.1  4.0  6.3  5.4  4.8  3.7  6.1  3.3
  Brand-B   4.1  0.6  3.1  2.5  4.0  6.2  1.6  2.2  1.9  5.4

Test the hypothesis using the Mann-Whitney test, at the 0.05 level of significance, that the median nicotine contents of the two brands are equal against the alternative that they are unequal.

7. To find out whether a new serum will arrest leukemia, nine patients, who have all reached an advanced stage of the disease, are selected. Five patients receive the treatment and four do not. The survival times, in years, from the time the experiment commenced are
Use the rank-sum test, at the 0.05 level of significance, to determine if the serum is effective.

8. A fishing line is being manufactured by two processes. To determine if there is a difference in the mean breaking strength of the lines, 10 pieces manufactured by each process are selected and then tested for breaking strength. The results are as follows:

  Process-1   10.4   9.8  11.5  10.0   9.9   9.6  10.9  11.8   9.3  10.7
  Process-2    8.7  11.2   9.8  10.1  10.8   9.5  11.0   9.8  10.5   9.9

Use the rank-sum test with α = 0.1 to determine if there is a difference between the mean breaking strengths of the lines manufactured by the two processes.
9. The urinary fluoride concentration (parts per million) was measured both for a sample of livestock grazing in an area previously exposed to fluoride pollution and for a similar sample grazing in an unpolluted region:

  Polluted     21.3  18.7  23.0  17.1  16.8  20.9  19.7
  Unpolluted   14.2  18.3  17.2  18.4  20.0

Does the data indicate strongly that the true average fluoride concentration for livestock grazing in the polluted region is larger than for the unpolluted region? Use the Wilcoxon rank-sum test at level α = 0.01.

10. A random sample of 15 automobile mechanics certified to work on a certain type of car was selected, and the time (in minutes) necessary for each one to diagnose a particular problem was determined, resulting in the following data: 30.6, 30.1, 15.6, 26.7, 27.1, 25.4, 35.0, 30.8, 31.9, 53.2, 12.5, 23.2, 8.8, 24.9, 30.2. Use the Wilcoxon test at significance level 0.10 to decide whether the data suggests that the true average diagnostic time is less than 30 minutes.
TAKE MORE PROBLEMS IN THE TEXTBOOK AND SOME REFERENCE BOOKS FOR YOUR PRACTICE.
Contact: (+91) 979 111 666 3 (or) athithan.s@ktr.srmuniv.ac.in
Visit: https://sites.google.com/site/lecturenotesofathithans/home