
15MA305 - STATISTICS FOR INFORMATION TECHNOLOGY

Name of the student :


Reg. Number :

Prepared by

Dr. N. BALAJI , M.Sc, M.Phil, MBA, Ph.D


Asst. Professor (SG)

FACULTY OF ENGINEERING AND TECHNOLOGY


Department of Mathematics
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

For 6th semester IT students


15MA305 STATISTICS FOR INFORMATION TECHNOLOGY
L T P C: 4 0 0 4
Co-requisite: NA
Prerequisite: 15MA203 & 15MA207
Data Book / Codes/Standards: Statistical tables and control chart constant values to be provided.
Course Category B CORE MATHEMATICS
Course designed by DEPARTMENT OF MATHEMATICS
Approval -- Academic Council Meeting -- 2016

PURPOSE: The purpose of this course is to teach students how statistical tools and techniques are applied in different fields.
INSTRUCTIONAL OBJECTIVES (with the STUDENT OUTCOMES addressed)
At the end of the course, the student will be able
1. To gain knowledge of measures of central tendency and dispersion (outcomes a, e)
2. To learn methods of studying correlation and regression (outcomes a, e)
3. To have knowledge of the analysis of time series (outcomes a, e)
4. To gain knowledge of ANOVA (outcomes a, e)
5. To understand the fundamentals of quality control and the methods used to control systems and processes (outcomes a, e)

Session  Description of Topic  Contact hours  C-D-I-O  IOs  Reference

UNIT I: INTRODUCTION TO STATISTICS (numerical problems only)  12
1. Introduction to uni-variate data  1  C,I  1  1-7
2. Measures of central tendency: Arithmetic mean, Median: Definition, Problems  2  C,I  1  1-7
3. Mode, Geometric Mean and Harmonic Mean: Definition, Problems  2  C,I  1  1-7
4. Measures of dispersion: Range, Quartile deviation, Mean deviation: Definition, Problems  2  C,I  1  1-7
5. Standard deviation and Co-efficient of variation: Definition, Problems  2  C,I  1  1-7
6. Skewness: Definition, Problems  1  C,I  1  1-7
7. Kurtosis and Moments: Definition, Problems  2  C,I  1  1-7

UNIT II: CORRELATION AND REGRESSION ANALYSIS  11
8. Introduction to Correlation analysis, Types of correlation  1  C,I  2  1-7
9. Methods of studying correlation - Karl Pearson’s coefficient of correlation  2  C,I  2  1-7
10. Rank correlation method  2  C,I  2  1-7
11. Partial and Multiple Correlation  2  C,I  2  1-7
12. Introduction to Regression analysis – Regression lines  1  C,I  2  1-7
13. Properties of Regression coefficients, Problems  2  C,I  2  1-7
14. Angle between two regression lines  1  C,I  2  1-7

UNIT III: ANALYSIS OF TIME SERIES  12
15. Components of time series – Problems of classifications – Methods of measuring trends  1  C,I  3  1,3,4
16. Freehand graphing method, semi average method  2  C,I  3  1,3,4
17. Moving average method  2  C,I  3  1,3,4
18. Method of least squares  2  C,I  3  1,3,4
19. Introduction to Measurement of seasonal variation  1  C,I  3  1,3,4
20. Method of simple averages (weekly, monthly and quarterly)  2  C,I  3  1,3,4
21. Ratio to trend method  2  C,I  3  1,3,4

UNIT IV: ANALYSIS OF VARIANCE  13
22. Introduction to Small sample tests based on t and F distribution  1  C,I  4  1-4
23. Test for single mean, difference between means  2  C,I  4  1-4
24. Paired t-test, Test for equality of variances  2  C,I  4  1-4
25. ANOVA - one-way classification  2  C,I  4  1-4
26. Two-way classification  2  C,I  4  1-4
27. Non-Parametric Test: The Mann Whitney test  2  C,I  4  1,3,6
28. The Kruskal-Wallis single-factor analysis of variance by ranks, Procedure and problems  2  C,I  4  1,3,6

UNIT V: STATISTICAL QUALITY CONTROL  12
29. Introduction - Process control  1  C,I  5  1,3,4
30. Control charts for variables - Mean and Range chart (X Bar and R)  2  C,I  5  1,3,4
31. Control charts for variables - Mean and Standard deviation chart (X Bar and s)  2  C,I  5  1,3,4
32. Introduction to Attributes Control charts  1  C,I  5  1,3,4
33. Control chart for the number of defectives (np-chart)  2  C,I  5  1,3,4
34. Control chart for the fraction of defectives (p-chart)  2  C,I  5  1,3,4
35. Control chart for the number of defects (c-chart)  2  C,I  5  1,3,4

Total contact hours  60


LEARNING RESOURCES
Sl. No. TEXT BOOKS
1. C. Chatfield, “Statistics for Technology - A Course in Applied Statistics”, Chapman and Hall, 2010.
REFERENCE BOOKS/OTHER READING MATERIAL
2. S.C. Gupta and V.K. Kapoor, “Fundamentals of Mathematical Statistics”, Sultan Chand and Sons, New Delhi, 11th edition, 2007.
3. S.P. Gupta, “Elements of Business Statistics”, Sultan Chand and Sons, New Delhi, 1993.
4. S.C. Gupta and V.K. Kapoor, “Fundamentals of Applied Statistics”, Sultan Chand and Sons, New Delhi, 2003.
5. R.S.N. Pillai and V. Bagavathi, “Statistics – Theory and Practice”, Sultan Chand & Sons, 2009.
6. Miller and Miller, “John E. Freund’s Mathematical Statistics with Applications”, Pearson Education, 2012.
7. V.K. Kapoor, “Statistics – Problems and Solutions”, 5th edition, Sultan Chand & Sons, 2007.

Course nature Theory


Assessment Method (Weightage 100%)
In-semester
Assessment tool   Cycle test I   Cycle test II   Cycle Test III   Surprise Test   Quiz   Total
Weightage         10%            15%             15%              5%              5%     50%
End semester examination Weightage: 50%
Unit 1
Introduction to Statistics

1.1 Measures of central tendency


1.1.1 Mean
Problem 1 Find the arithmetic mean of the following data.
10, 20, 30, 40, 50.

Solution:
Given data x: 10, 20, 30, 40, 50.
$$\text{A.M.} = \bar{x} = \frac{1}{N}\sum x = \frac{1}{5}[10 + 20 + 30 + 40 + 50] = \frac{150}{5} = 30$$

Problem 2 Find the arithmetic mean to the following data.


x 20 30 40 50 60 70
f 8 12 20 10 6 4
Solution:
Take the assumed mean A = 40, so that d = x − A = x − 40.

x     f     d = x − 40   fd
20    8     −20          −160
30    12    −10          −120
40    20    0            0
50    10    10           100
60    6     20           120
70    4     30           120
      N = 60             Σfd = 60

$$\bar{x} = A + \frac{\sum fd}{N} = 40 + \frac{60}{60} = 40 + 1 = 41$$
Problem 3 From the following data compute arithmetic mean by i) direct method and ii) step deviation method.
Marks             0 − 10   10 − 20   20 − 30   30 − 40   40 − 50   50 − 60
No. of students   5        10        25        30        20        10

Solution:

Take A = 35 and h = 10, so that d = (m − A)/h = (m − 35)/10, where m is the mid value of the class.

Marks (x)   f     Mid value (m)   d = (m − 35)/10   fm     fd
0 − 10      5     5               −3                25     −15
10 − 20     10    15              −2                150    −20
20 − 30     25    25              −1                625    −25
30 − 40     30    35              0                 1050   0
40 − 50     20    45              1                 900    20
50 − 60     10    55              2                 550    20
            N = 100                                 Σfm = 3300   Σfd = −20

i) Direct method
$$\bar{x} = \frac{\sum fm}{N} = \frac{3300}{100} = 33$$

ii) Step deviation method
$$\bar{x} = A + h \times \frac{\sum fd}{N} = 35 + 10 \times \frac{-20}{100} = 35 - 2 = 33$$
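
The two methods of Problem 3 can be checked with a short Python sketch. The function names `mean_direct` and `mean_step_deviation` are illustrative only and are not part of the course material.

```python
def mean_direct(midpoints, freqs):
    """Direct method: sum(f*m) / N."""
    return sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)


def mean_step_deviation(midpoints, freqs, assumed_mean, width):
    """Step deviation method: A + h * sum(f*d) / N with d = (m - A) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (m - assumed_mean) / width for m, f in zip(midpoints, freqs))
    return assumed_mean + width * sum_fd / n


# Data of Problem 3 (marks of 100 students)
midpoints = [5, 15, 25, 35, 45, 55]
freqs = [5, 10, 25, 30, 20, 10]

direct = mean_direct(midpoints, freqs)
step = mean_step_deviation(midpoints, freqs, assumed_mean=35, width=10)
print(direct, step)
```

Both calls return the same mean, which is the point of the step deviation method: it only simplifies the hand arithmetic.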

1.1.2 Median

Problem 1 Obtain the median of the following data:


391, 384, 591, 407, 672, 522, 777, 753, 2488, 1490.

Solution:
First, we rearrange the data in ascending order
Sl.No. 1 2 3 4 5 6 7 8 9 10
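Data 384 391 407 522 591 672 753 777 1490 2488

$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{10+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } 5.5^{\text{th}} \text{ item}$$
$$= \frac{5^{\text{th}} \text{ item} + 6^{\text{th}} \text{ item}}{2} = \frac{591 + 672}{2} = 631.5$$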

Problem 2 From the following data find the value of median.


Income(Rs.): 1000 1500 800 2000 2500 1800
No. of Persons: 24 26 16 20 6 30

Solution:
Income (Rs.) in ascending order   f     c.f.
800                               16    16
1000                              24    40
1500                              26    66
1800                              30    96
2000                              20    116
2500                              6     122
                                  N = 122
Here N/2 = 61.
∴ the c.f. just greater than 61 is 66.
∴ Median = 1500.

Problem 3 Find out Median from the following data:


Wages per week (Rs.)   No. of workers     Wages per week (Rs.)   No. of workers
50 − 59                15                 90 − 99                45
60 − 69                40                 100 − 109              40
70 − 79                50                 110 − 119              15
80 − 89                60
Solution:

Wages (Rs.)      f     c.f.
49.5 − 59.5      15    15
59.5 − 69.5      40    55
69.5 − 79.5      50    105
79.5 − 89.5      60    165
89.5 − 99.5      45    210
99.5 − 109.5     40    250
109.5 − 119.5    15    265
                 N = 265

Here N/2 = 132.5.
∴ the c.f. just greater than 132.5 is 165.
∴ the Median class is 79.5 − 89.5.
Here l = 79.5, h = 10, f = 60 and preceding c.f. = 105.

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f.\right) = 79.5 + \frac{10}{60}(132.5 - 105) = 79.5 + 4.58 = 84.08$$

Thus, median=Rs.84.08
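
As a cross-check, the grouped-median formula can be written as a small Python function. This is a minimal sketch; `grouped_median` is a hypothetical helper name, and the class boundaries are assumed to be continuous (as after the 0.5 correction above).

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median = l + (h / f) * (N/2 - c.f. of the preceding class)."""
    n = sum(freqs)
    half = n / 2
    cum_before = 0  # cumulative frequency of the classes already passed
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= half:  # first class whose c.f. reaches N/2
            return lower + (width / f) * (half - cum_before)
        cum_before += f
    raise ValueError("empty frequency distribution")


# Wages data of Problem 3, with class boundaries 49.5, 59.5, ...
lowers = [49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5]
freqs = [15, 40, 50, 60, 45, 40, 15]
median_wage = grouped_median(lowers, 10, freqs)
print(round(median_wage, 2))
```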

1.1.3 Mode
Problem 1 From the following data of the height of 100 persons in a commercial concern determine the modal
height:

x 58 60 61 62 63 64 65 66 68 70
f 4 6 5 10 20 22 24 6 2 1

Solution:
First, form the grouping table as follows:

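Column 1 holds the frequencies. Columns 2 and 3 add the frequencies in twos (starting from the first and the second frequency respectively); columns 4, 5 and 6 add them in threes (starting from the first, second and third frequency respectively). Each total is written against the first value of its group.

x     col. 1   col. 2   col. 3   col. 4   col. 5   col. 6
58    4        10                15
60    6                 11                21
61    5        15                                  35
62    10                30       52
63    20       42                         66
64    22                46                         52
65    24       30                32
66    6                 8                 9
68    2        3
70    1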

Analysis table:

col.   58   60   61   62   63   64   65   66   68   70
1                                    1
2                          1    1
3                               1    1
4                     1    1    1
5                          1    1    1
6                               1    1    1
Total                 1    3    5    4    1

The value 64 occurs the maximum number of times (5).
Thus the mode is 64 inches.

Problem 2 Calculate the mode for the following data:

Marks       No. of students     Marks        No. of students
Above 0     80                  Above 60     28
Above 10    77                  Above 70     16
Above 20    72                  Above 80     10
Above 30    65                  Above 90     8
Above 40    55                  Above 100    0
Above 50    43

Solution: Since this is a cumulative frequency distribution, we first convert it into a simple frequency distribution.

Marks       No. of students
0 − 10      3
10 − 20     5
20 − 30     7
30 − 40     10
40 − 50     12
50 − 60     15
60 − 70     12
70 − 80     6
80 − 90     2
90 − 100    8

By inspection the modal class is 50 − 60.

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

Where
l = lower limit of the modal class = 50
h = common width = 10
f1 = frequency of the modal class = 15
f0 = frequency of the class preceding the modal class = 12
f2 = frequency of the class succeeding the modal class = 12

$$\text{Mode} = 50 + \frac{10(15 - 12)}{2(15) - 12 - 12} = 50 + \frac{30}{6} = 50 + 5 = 55$$
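
The interpolation formula for the mode is easy to check in code. The sketch below picks the modal class by inspection (the class with the highest frequency); it does not implement the grouping method, and `grouped_mode` is an illustrative name.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + h*(f1 - f0) / (2*f1 - f0 - f2) for the highest-frequency class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # modal class by inspection
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # succeeding class
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)


# Simple frequency distribution of Problem 2
lowers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
mode_marks = grouped_mode(lowers, 10, freqs)
print(mode_marks)
```

For an irregular distribution the modal class should first be located with the grouping and analysis tables, and the formula applied to that class.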

1.1.4 Geometric Mean


Problem 1 Monthly income of ten families of a particular place is given below. Find out Geometric Mean(G.M.).
85, 70, 15, 75, 500, 8, 45, 250, 40, 36.

Solution

x       log x
85      1.9294
70      1.8451
15      1.1761
75      1.8751
500     2.6990
8       0.9031
45      1.6532
250     2.3979
40      1.6021
36      1.5563
Total   17.6373

$$\sum \log x = 17.6373$$

$$\text{G.M.} = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right) = \text{Antilog}\left(\frac{17.6373}{10}\right) = \text{Antilog}(1.76373) = 58.04$$

Problem 2 Compute the G.M from the following data.


Marks:             0 − 10   10 − 20   20 − 30   30 − 40   40 − 50
No. of Students:   5        7         15        25        8

Solution
Marks      Mid value (m)   f     f log m
0 − 10     5               5     3.4950
10 − 20    15              7     8.2327
20 − 30    25              15    20.9685
30 − 40    35              25    38.6025
40 − 50    45              8     13.2256
Total                      60    84.5243

$$\text{G.M.} = \text{Antilog}\left(\frac{\sum_{i=1}^{n} f \log m_i}{N}\right) = \text{Antilog}\left(\frac{84.5243}{60}\right) = \text{Antilog}(1.4087) = 25.63$$

1.1.5 Harmonic Mean


Problem 1 Find the Harmonic mean from the following data:
2574, 475, 75, 5, 0.8, 0.08, 0.005, 0.0009

Solution
x        1/x
2574     0.0004
475      0.0021
75       0.0133
5        0.2000
0.8      1.2500
0.08     12.5000
0.005    200.0000
0.0009   1111.1111
Total    1325.0769

$$\text{H.M.} = \frac{N}{\sum (1/x)} = \frac{8}{1325.0769} = 0.006$$
Problem 2 From the following data compute the Harmonic mean:
Marks:             10   20   25   40   50
No. of students:   20   30   50   15   5

Solution

x       f      f/x
10      20     2.000
20      30     1.500
25      50     2.000
40      15     0.375
50      5      0.100
Total   120    5.975

$$\text{H.M.} = \frac{N}{\sum (f/x)} = \frac{120}{5.975} = 20.08$$
Note: Relation among averages: A.M ≥ G.M. ≥ H.M.
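
The relation among the three averages can be verified numerically. The sketch below uses natural logarithms instead of log tables, which gives the same geometric mean; the variable names are illustrative.

```python
import math

# Income data of the Geometric Mean Problem 1
incomes = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]
n = len(incomes)

am = sum(incomes) / n                                   # arithmetic mean
gm = math.exp(sum(math.log(x) for x in incomes) / n)    # antilog of the mean log
hm = n / sum(1 / x for x in incomes)                    # harmonic mean
print(round(am, 2), round(gm, 2), round(hm, 2))

# Frequency data of the Harmonic Mean Problem 2: H.M. = N / sum(f/x)
marks = [10, 20, 25, 40, 50]
freqs = [20, 30, 50, 15, 5]
weighted_hm = sum(freqs) / sum(f / x for x, f in zip(marks, freqs))
print(round(weighted_hm, 2))
```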

Miscellaneous Problems

Problem 1 Calculate mean, median and mode from the following data:
x 10 − 20 20 − 30 30 − 40 40 − 50 50 − 60 60 − 70 70 − 80 80 − 90
f 4 12 40 41 27 13 9 4
Solution:

C.I.       m     f     d = (m − 45)/10   fd     c.f.
10 − 20    15    4     −3                −12    4
20 − 30    25    12    −2                −24    16
30 − 40    35    40    −1                −40    56
40 − 50    45    41    0                 0      97
50 − 60    55    27    1                 27     124
60 − 70    65    13    2                 26     137
70 − 80    75    9     3                 27     146
80 − 90    85    4     4                 16     150
Total            150                     20

Mean:
$$\bar{x} = A + h\,\frac{\sum fd}{N} = 45 + 10 \times \frac{20}{150} = 45 + 1.333 = 46.333$$

Median:
Here N/2 = 75, ∴ Median class = 40 − 50
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f.\right) = 40 + \frac{10}{41}(75 - 56) = 40 + 4.634 = 44.634$$

Mode:
Since the highest frequency is 41, the mode lies in the class 40 − 50. Here l = 40, h = 10, f1 = 41, f0 = 40, f2 = 27
$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 40 + \frac{10(41 - 40)}{2(41) - 40 - 27} = 40 + 0.67 = 40.67$$

Problem 2 Compute arithmetic mean, median and mode from the following data:
x (below)          10   20   30   40   50   60    70    80    90    100
Cumulative f       5    19   48   69   94   115   125   132   147   150
Solution:
This is a cumulative frequency distribution. Let us first convert it to a simple frequency distribution and then
calculate mean, median and mode.

C.I.        m     f     d = (m − 45)/10   fd     c.f.
0 − 10      5     5     −4                −20    5
10 − 20     15    14    −3                −42    19
20 − 30     25    29    −2                −58    48
30 − 40     35    21    −1                −21    69
40 − 50     45    25    0                 0      94
50 − 60     55    21    1                 21     115
60 − 70     65    10    2                 20     125
70 − 80     75    7     3                 21     132
80 − 90     85    15    4                 60     147
90 − 100    95    3     5                 15     150
Total             150                     −4

Mean:
$$\bar{x} = A + h\,\frac{\sum fd}{N} = 45 + 10 \times \frac{-4}{150} = 45 - 0.267 = 44.733$$

Median:
Here N/2 = 75, ∴ Median class = 40 − 50
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f.\right) = 40 + \frac{10}{25}(75 - 69) = 40 + 2.4 = 42.4$$

Mode:
Since it is an irregular distribution, we first form the grouping table as follows (each total is written against the first class of its group):

x         col. 1   col. 2   col. 3   col. 4   col. 5   col. 6
0 − 10    5        19                48
10 − 20   14                43                64
20 − 30   29       50                                  75
30 − 40   21                46       67
40 − 50   25       46                         56
50 − 60   21                31                         38
60 − 70   10       17                32
70 − 80   7                 22                25
80 − 90   15       18
90 − 100  3

Analysis table:

col.   0−10    10−20   20−30   30−40   40−50   50−60   60−70   70−80   80−90   90−100
1                      1
2                      1       1
3                              1       1
4                              1       1       1
5              1       1       1
6                      1       1       1
Total          1       4       5       3       1

∴ the modal class is 30 − 40.

Here l = 30, h = 10, f1 = 21, f0 = 29, f2 = 25

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 30 + \frac{10(21 - 29)}{2(21) - 29 - 25} = 30 + \frac{-80}{-12} = 30 + 6.67 = 36.67$$

1.2 Measures of dispersion


1.2.1 Range, Quartile deviation, Mean deviation and Standard deviation
Problem 1 The population in eighteen panchayat samitis of a district is as given below.
77, 76, 83, 68, 57, 107, 80, 75, 95, 100, 113, 119, 121, 83, 87, 46, 74
Find the range and its coefficient.

Solution:
Range: the greatest value is G = 121 and the smallest value is S = 46.

$$\text{Range} = G - S = 121 - 46 = 75$$

Coefficient of range:

$$\text{Coefficient of range} = \frac{G - S}{G + S} = \frac{121 - 46}{121 + 46} = \frac{75}{167} = 0.449$$
Problem 2 Compute the coefficient of quartile deviation (Q.D.) from the following data.

Marks:             10   20   30   40   50   80
No. of students:   4    7    15   8    7    2

Solution:
x       f     c.f.
10      4     4
20      7     11
30      15    26
40      8     34
50      7     41
80      2     43
Total   43

To find Q1:
Here Q1 = size of the (N/4)th item = size of the (10.75)th item = 20.
To find Q3:
Here Q3 = size of the (3N/4)th item = size of the (32.25)th item = 40.

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$$

and

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = 0.333$$

Problem 3 Calculate the mean deviation from mean for the following series. Also find out its coefficient.

x 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f 5 8 15 16 6

Solution:

C.I.       m     f     d = (m − 25)/10   fd     |D| = |m − x̄|   f|D|
0 − 10     5     5     −2                −10    22               110
10 − 20    15    8     −1                −8     12               96
20 − 30    25    15    0                 0      2                30
30 − 40    35    16    1                 16     8                128
40 − 50    45    6     2                 12     18               108
Total            50                      10                      472

Mean:
$$\bar{x} = A + h\,\frac{\sum fd}{N} = 25 + 10 \times \frac{10}{50} = 25 + 2 = 27$$

M.D. from mean:
$$\text{M.D.} = \frac{\sum f|D|}{N} = \frac{472}{50} = 9.44$$

and Coefficient of M.D. from mean:
$$\text{Coefficient of M.D. from mean} = \frac{\text{M.D.}}{\bar{x}} = \frac{9.44}{27} = 0.35$$
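
The mean deviation calculation of Problem 3 can be sketched in Python as follows; `mean_deviation` is an illustrative helper that accepts any central value (mean or median).

```python
def mean_deviation(midpoints, freqs, centre):
    """M.D. = sum(f * |m - centre|) / N."""
    n = sum(freqs)
    return sum(f * abs(m - centre) for m, f in zip(midpoints, freqs)) / n


midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]

mean = sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)
md_mean = mean_deviation(midpoints, freqs, mean)
coeff_md = md_mean / mean
print(mean, md_mean, round(coeff_md, 2))
```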
Problem 4 Calculate the mean deviation from median for the following series. Also find out its coefficient.

x 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f 5 8 15 16 6
Solution:
C.I.       m     f     c.f.   |D| = |m − Md.|   f|D|
0 − 10     5     5     5      23                115
10 − 20    15    8     13     13                104
20 − 30    25    15    28     3                 45
30 − 40    35    16    44     7                 112
40 − 50    45    6     50     17                102
Total            50                             478

Median:
Here N/2 = 25, ∴ Median class = 20 − 30
$$\text{Median} = Md. = l + \frac{h}{f}\left(\frac{N}{2} - c.f.\right) = 20 + \frac{10}{15}(25 - 13) = 20 + 8 = 28$$

M.D. from median:
$$\text{M.D.} = \frac{\sum f|D|}{N} = \frac{478}{50} = 9.56$$

and Coefficient of M.D. from median:
$$\text{Coefficient of M.D. from median} = \frac{\text{M.D.}}{Md.} = \frac{9.56}{28} = 0.34$$
Problem 5 Find the standard deviation and coefficient of variation for the following data: 69, 66, 67, 69, 64, 63, 65, 68, 72
Solution:
$$\sum x_i = 69 + 66 + 67 + 69 + 64 + 63 + 65 + 68 + 72 = 603$$
$$\bar{x} = \frac{\sum x_i}{n} = \frac{603}{9} = 67$$

x       (x − x̄)²
69      4
66      1
67      0
69      4
64      9
63      16
65      4
68      1
72      25
Total   64

With N = 9 observations,
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{64}{9} = 7.1111$$
$$\text{S.D.} = \sigma = \sqrt{7.1111} = 2.6667$$
Coefficient of Variation:
$$\text{C.V.} = 100 \times \frac{\sigma}{\bar{x}} = 100 \times \frac{2.6667}{67} = 3.9801$$
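
The divisor used for the variance matters for a sample this small. The sketch below computes the standard deviation both ways with Python's standard `statistics` module: `pstdev` divides by N (the population formula used in these notes) and `stdev` divides by N − 1.

```python
import statistics

data = [69, 66, 67, 69, 64, 63, 65, 68, 72]

mean = statistics.mean(data)
pop_sd = statistics.pstdev(data)      # sqrt(sum((x - mean)^2) / N)
sample_sd = statistics.stdev(data)    # sqrt(sum((x - mean)^2) / (N - 1))
cv = 100 * pop_sd / mean              # coefficient of variation, in percent

print(mean, round(pop_sd, 4), round(sample_sd, 4), round(cv, 4))
```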

1.3 Moments, Skewness and Kurtosis


Problem 1 The first four moments of a distribution about the value 4 of the variable are −1.5, 17, −30 and 108.
Find the moments about the mean, β1, β2 and comment on the nature of the distribution.

Solution:
Given A = 4, µ′1 = −1.5, µ′2 = 17, µ′3 = −30 and µ′4 = 108.
∴ The first four moments about the mean are given by

$$\mu_1 = 0$$

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 17 - (-1.5)^2 = 14.75$$

$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = -30 - 3(17)(-1.5) + 2(-1.5)^3 = 39.75$$

$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = 108 - 4(-30)(-1.5) + 6(17)(-1.5)^2 - 3(-1.5)^4 = 142.3125$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(39.75)^2}{(14.75)^3} = 0.4924$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{142.3125}{(14.75)^2} = 0.6541$$

and
$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{39.75}{(14.75)^{3/2}} = 0.7017$$

Nature: Since γ1 > 0, the given distribution is positively skewed and since β2 < 3, the given distribution is
platykurtic.
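
The conversion from moments about an arbitrary point to moments about the mean is mechanical, so it is a natural candidate for a short check in code. The function name `central_from_raw` is illustrative.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four moments about a point A into moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_from_raw(-1.5, 17, -30, 108)
beta1 = mu3 ** 2 / mu2 ** 3      # measure of skewness
beta2 = mu4 / mu2 ** 2           # measure of kurtosis
gamma1 = mu3 / mu2 ** 1.5        # signed skewness, equals sqrt(beta1) up to sign

print(mu2, mu3, mu4)
print(round(beta1, 4), round(beta2, 4), round(gamma1, 4))
```

A positive `gamma1` indicates positive skewness, and `beta2` below 3 indicates a platykurtic distribution.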
Problem 2 Compute the Bowley’s coefficient of skewness from the following data.
Marks:             10   20   30   40   50   80
No. of students:   4    7    15   8    7    2

Solution:
x       f     c.f.
10      4     4
20      7     11
30      15    26
40      8     34
50      7     41
80      2     43
Total   43

To find Median:
Here N/2 = 21.5, ∴ Median is Md. = 30

To find Q1:
Here Q1 = size of the (N/4)th item = size of the (10.75)th item = 20.

To find Q3:
Here Q3 = size of the (3N/4)th item = size of the (32.25)th item = 40.

Bowley’s coefficient of skewness:
$$sk = \frac{Q_3 + Q_1 - 2Md.}{Q_3 - Q_1} = \frac{40 + 20 - 2(30)}{40 - 20} = 0$$
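
The quartiles of a discrete series are read off the cumulative frequencies, which the following sketch mirrors. It follows the convention used in the solution above (positions N/4, N/2 and 3N/4); `item_value` is an illustrative helper name.

```python
def item_value(values, freqs, position):
    """Return the value whose cumulative frequency first reaches `position`."""
    cum = 0
    for x, f in zip(values, freqs):
        cum += f
        if cum >= position:
            return x
    raise ValueError("position exceeds total frequency")


marks = [10, 20, 30, 40, 50, 80]
freqs = [4, 7, 15, 8, 7, 2]
n = sum(freqs)

q1 = item_value(marks, freqs, n / 4)        # (10.75)th item
median = item_value(marks, freqs, n / 2)    # (21.5)th item
q3 = item_value(marks, freqs, 3 * n / 4)    # (32.25)th item

qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(q1, median, q3, qd, round(coeff_qd, 3), bowley)
```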

UNIT-2 Karl Pearson’s Correlation Co-efficient


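Correlation is the study of the relationship between two variables.
Karl Pearson’s correlation co-efficient is

$$r = r(x, y) = r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

where
$$\operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}, \qquad \sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2}$$
n is the number of pairs of observations, and
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$

Note:
1. The correlation co-efficient lies between −1 and 1, i.e., −1 ≤ r ≤ 1.
2. Equivalently,
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$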

Problem 1 Calculate Karl Pearson’s co-efficient of correlation for the following data.

x   65   66   67   67   68   69   70   72
y   67   68   65   68   72   72   69   71

Solution:

X       Y       X²      Y²      XY
65      67      4225    4489    4355
66      68      4356    4624    4488
67      65      4489    4225    4355
67      68      4489    4624    4556
68      72      4624    5184    4896
69      72      4761    5184    4968
70      69      4900    4761    4830
72      71      5184    5041    5112
544     552     37028   38132   37560

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

$$= \frac{(8 \times 37560) - (544)(552)}{\sqrt{(8 \times 37028) - (544)^2}\,\sqrt{(8 \times 38132) - (552)^2}} = \frac{192}{\sqrt{288}\,\sqrt{352}} = 0.6030$$
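
The sum-based formula for r translates directly into code. The following is a minimal sketch with an illustrative function name, `pearson_r`, applied to the data of Problem 1.

```python
import math


def pearson_r(xs, ys):
    """r = (N*Sxy - Sx*Sy) / (sqrt(N*Sxx - Sx^2) * sqrt(N*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(x, y)
print(round(r, 4))
```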

Rank correlation
Spearman’s rank correlation coefficient
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where di = xi − yi is the difference between the ranks of the i-th pair.
Note: If ranks are repeated, then
$$\rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + \cdots\right]}{n(n^2 - 1)}$$

where di = xi − yi.

The C.F’s are correction factors, one for each group of tied values, calculated by
$$C.F = \frac{m(m^2 - 1)}{12}$$
Here m is the number of times the value has been repeated.

Problem 1 Calculate Spearman’s rank correlation for the following data.

x   68   64   75   50   64   80   75   40   55   64
y   62   58   68   45   81   60   68   48   50   70

Solution:

X     Y     Rank of X (xi)   Rank of Y (yi)   di = xi − yi   di²
68    62    4                5                −1             1
64    58    6                7                −1             1
75    68    2.5              3.5              −1             1
50    45    9                10               −1             1
64    81    6                1                5              25
80    60    1                6                −5             25
75    68    2.5              3.5              −1             1
40    48    10               9                1              1
55    50    8                8                0              0
64    70    6                2                4              16
                                              Σdi² = 72

In the values of X,
75 is repeated 2 times and occupies the ranks 2 and 3. ∴ the rank of 75 = (2 + 3)/2 = 2.5 and
$$C.F_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$
64 is repeated 3 times and occupies the ranks 5, 6 and 7. ∴ the rank of 64 = (5 + 6 + 7)/3 = 6 and
$$C.F_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2$$
In the values of Y,
68 is repeated 2 times and occupies the ranks 3 and 4. ∴ the rank of 68 = (3 + 4)/2 = 3.5 and
$$C.F_3 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$

$$\rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + C.F_3\right]}{n(n^2 - 1)} = 1 - \frac{6[72 + 0.5 + 2 + 0.5]}{10(10^2 - 1)} = 1 - 0.4545 = 0.5455$$
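
The ranking with ties and the correction factors can be sketched in Python as below. The helper names (`average_ranks`, `tie_correction`, `spearman_rho`) are illustrative; ranks are assigned in descending order, with tied values sharing the average of the ranks they occupy, as in the solution above.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, v in enumerate(ordered, start=1):
        first_position.setdefault(v, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over groups of repeated values (0 when m = 1)."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_rho(x, y)
print(round(rho, 4))
```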
Exercise
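Problem 1 Ten competitors in a musical contest were ranked by 3 judges x, y and z. Find out which pair of judges
has the nearest approach to common tastes in music.
x   1    2    3   4   5   6   7   8   9   10
y   10   6    7   9   5   4   3   2   1   8
z   8    10   9   7   6   5   4   3   2   1
Ans.: For the ranks as given, Σd² = 262 for (x, y), 82 for (y, z) and 324 for (z, x), so ρxy = −0.588, ρyz = 0.503 and ρzx = −0.964. Since ρyz is the greatest, judges y and z have the nearest approach to common tastes in music.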

Regression
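Regression is the mathematical study of the average relationship between two variables x and y.
Line of regression of x on y:
$$(x - \bar{x}) = b_{xy}(y - \bar{y})$$
Line of regression of y on x:
$$(y - \bar{y}) = b_{yx}(x - \bar{x})$$
where bxy and byx are the regression co-efficients, given by
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \quad \text{and} \quad b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

Note:
$$r = \pm\sqrt{b_{xy}\, b_{yx}}$$
where r takes the sign common to bxy and byx (the two regression coefficients always have the same sign).

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

The point of intersection of the lines of regression of y on x and x on y is the point of the mean values of x and y.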
Problem 1 From the following data find
1. Two lines of regressions
2. Coefficient of correlation between the marks of economics and statistics
3. The most likely marks in statistics when the marks in economics is 30.
Marks in Economics    25   28   35   32   31   36   29   38   34   32
Marks in Statistics   43   46   49   41   36   32   31   30   33   39
Solution: Let x be the marks in Economics and y be the marks in Statistics.
$$\bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38$$
x      y      (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
25     43     −7         5          49          25          −35
28     46     −4         8          16          64          −32
35     49     3          11         9           121         33
32     41     0          3          0           9           0
31     36     −1         −2         1           4           2
36     32     4          −6         16          36          −24
29     31     −3         −7         9           49          21
38     30     6          −8         36          64          −48
34     33     2          −5         4           25          −10
32     39     0          1          0           1           0
320    380    0          0          140         398         −93
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-93}{398} = -0.2337$$
and
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-93}{140} = -0.6643$$
Since both regression coefficients are negative, the correlation co-efficient is negative:
$$r = -\sqrt{b_{xy}\, b_{yx}} = -\sqrt{(-0.2337)(-0.6643)} = -0.394$$
Line of regression of x on y is (x − x̄) = bxy (y − ȳ)
(x − 32) = −0.2337(y − 38)
x − 32 = −0.2337y + 8.8806
x = −0.2337y + 40.8806 − − − −(1)
Line of regression of y on x is (y − ȳ) = byx (x − x̄)
(y − 38) = −0.6643(x − 32)
y − 38 = −0.6643x + 21.2576
y = −0.6643x + 59.2576 − − − −(2)

Now, to find y when x = 30:

eqn.(2) ⇒ y = −0.6643(30) + 59.2576 = 39.3286

∴ Marks in Statistics = 39.33
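
The regression calculation above can be reproduced with a short Python sketch. The function name `regression_coefficients` is illustrative; the sign of r is taken from the regression coefficients, which always share the same sign.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, bxy, byx) from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / syy, sxy / sxx


economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

mx, my, bxy, byx = regression_coefficients(economics, statistics_marks)
r = math.copysign(math.sqrt(bxy * byx), byx)   # r carries the sign of byx and bxy
predicted = my + byx * (30 - mx)               # y on x line evaluated at x = 30

print(round(bxy, 4), round(byx, 4), round(r, 3), round(predicted, 2))
```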

Problem 2 Two variables x and y have the regression lines 3x + 2y − 26 = 0, 6x + y − 31 = 0 find the

1. mean value of x and y

2. correlation co-efficient between x and y

3. the variance of y when the variance of x is 25

Solution:

Given 3x + 2y − 26 = 0   (1)
      6x + y − 31 = 0    (2)

1. Mean values of x and y

The two regression lines intersect at the means. Solving (1) and (2), we get x = 4 and y = 7.
∴ x̄ = 4 and ȳ = 7

2. Correlation co-efficient between x and y

Let 3x + 2y − 26 = 0 be the line of regression of x on y. Then
$$3x + 2y - 26 = 0 \Rightarrow 3x = -2y + 26 \Rightarrow x = -\frac{2}{3}y + \frac{26}{3}, \qquad \therefore b_{xy} = -\frac{2}{3}$$
Let 6x + y − 31 = 0 be the line of regression of y on x. Then
$$6x + y - 31 = 0 \Rightarrow y = -6x + 31, \qquad \therefore b_{yx} = -6$$
$$|r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\left(-\frac{2}{3}\right)(-6)} = 2 > 1$$
Since the magnitude of the correlation coefficient cannot exceed 1, 3x + 2y − 26 = 0 cannot be the line of regression of x on
y and 6x + y − 31 = 0 cannot be the line of regression of y on x. ∴ we have to take 3x + 2y − 26 = 0 as the
line of regression of y on x:
$$3x + 2y - 26 = 0 \Rightarrow 2y = -3x + 26 \Rightarrow y = -\frac{3}{2}x + 13, \qquad \therefore b_{yx} = -\frac{3}{2}$$
and take 6x + y − 31 = 0 as the line of regression of x on y:
$$6x + y - 31 = 0 \Rightarrow 6x = -y + 31 \Rightarrow x = -\frac{1}{6}y + \frac{31}{6}, \qquad \therefore b_{xy} = -\frac{1}{6}$$
$$|r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\left(-\frac{3}{2}\right)\left(-\frac{1}{6}\right)} = 0.5 < 1$$
Since both regression coefficients are negative, r = −0.5.

3. The variance of y when the variance of x is 25 (σx² = 25)

i.e., σx = 5, and we have to find σy.
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \Rightarrow \sigma_y = r\,\frac{\sigma_x}{b_{xy}} = (-0.5) \times \frac{5}{-\frac{1}{6}} = 15$$
$$\sigma_y^2 = 225$$
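
The trial-and-check reasoning above (try one assignment of the two lines, reject it if the product of the regression coefficients exceeds 1) can be sketched in Python. The function name `analyse_lines` and the (a, b, c) encoding of a line a·x + b·y + c = 0 are assumptions made for this illustration.

```python
from fractions import Fraction
import math


def analyse_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    # The lines meet at the means (Cramer's rule).
    x_mean = Fraction(b1 * c2 - b2 * c1, det)
    y_mean = Fraction(a2 * c1 - a1 * c2, det)
    for x_on_y, y_on_x in ((line1, line2), (line2, line1)):
        bxy = Fraction(-x_on_y[1], x_on_y[0])   # x = -(b/a) y - c/a
        byx = Fraction(-y_on_x[0], y_on_x[1])   # y = -(a/b) x - c/b
        if 0 <= bxy * byx <= 1:                 # valid assignment: r^2 <= 1
            r = math.copysign(math.sqrt(bxy * byx), float(byx))
            return x_mean, y_mean, bxy, byx, r
    raise ValueError("no valid assignment of the regression lines")


x_mean, y_mean, bxy, byx, r = analyse_lines((3, 2, -26), (6, 1, -31))
sigma_x = 5
sigma_y = sigma_x * float(byx) / r   # from byx = r * sigma_y / sigma_x
print(x_mean, y_mean, bxy, byx, r, sigma_y)
```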

−−−−−−−
UNIT-2
CORRELATION AND REGRESSION
Correlation and regression are concerned with the investigation of two variables(Association of two
variables).

We might want to know:

• if a relationship exists between those variables;
• if so, how strong that relationship is;
• what form that relationship takes;
• whether we can make use of that relationship for predictive purposes, i.e. forecasting.

Correlation describes the strength of the relationship. It is not concerned with 'cause' and 'effect'.
If there appears to be a linear relationship, it can be quantified. A correlation coefficient is calculated as
the measure of the strength of this relationship. Its symbol is 'r' and its value lies between -1 and +1.

A positive correlation means that as values of one variable increase, values of the other variable also tend
to increase. A small or zero correlation coefficient tells us that there is little or no linear relationship
between the two variables. Finally, a negative correlation coefficient shows an inverse relationship between
the variables: as one goes up, the other tends to go down.
Properties of the Correlation Coefficient

Due to the standardization that takes place in the formula, there are a couple of interesting properties of r :

1. −1 ≤ r ≤ 1
2. If the values of either variable are converted to a different scale, r will be the same.
3. If the variables x and y are interchanged, r will be the same.
4. The correlation coefficient r will only measure the strength of a linear relationship. It says nothing about other
kinds of relationships, like the temperature data on the previous page.

Hypothesis test for a Pearson’s correlation coefficient

H0: There is no association between ice-cream sales and average monthly temperature.
H1: There is an association between them.

Critical Value:
5%, 10 degrees of freedom = 0.576

Test statistic: 0.983

Dr.N.BALAJI, Assistant Professor SG, Department of Mathematics, SRMIST 19


Conclusion: The test statistic exceeds the critical value, so we reject the null hypothesis and
conclude that there is a significant association between ice-cream sales and average monthly
temperature.

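The standard deviations of X and Y are

$$SD_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}, \qquad SD_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}}$$

Simple Correlation
The simple sample correlation coefficient is
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
or
$$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$
where
$$\operatorname{Cov}(x, y) = \frac{1}{N}\sum XY - \bar{x}\,\bar{y}$$
$$\sigma_x = \sqrt{\frac{1}{N}\sum X^2 - (\bar{X})^2}, \qquad \sigma_y = \sqrt{\frac{1}{N}\sum Y^2 - (\bar{Y})^2}$$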

1. Calculate the coefficient of correlation from the following data (X and Y can also be used directly).

Sales (X)         15   18   25   27   30    35
Expenditure (Y)   50   65   82   95   110   120

Take U = X − 25 and V = Y − 87.

X       Y       U = X − 25   U²     V = Y − 87   V²      UV
15      50      −10          100    −37          1369    370
18      65      −7           49     −22          484     154
25      82      0            0      −5           25      0
27      95      2            4      8            64      16
30      110     5            25     23           529     115
35      120     10           100    33           1089    330
Total           0            278    0            3560    985

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \frac{N\sum UV - \sum U \sum V}{\sqrt{N\sum U^2 - (\sum U)^2}\,\sqrt{N\sum V^2 - (\sum V)^2}}$$
$$= \frac{6(985) - 0}{\sqrt{6(278)}\,\sqrt{6(3560)}} = 0.99$$

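2. Find Karl Pearson’s coefficient of correlation from the following data.

X   43   44   46   40   44   42   45   42   38   40   42   57
Y   29   31   19   18   19   27   27   29   41   30   26   10

Take U = X − 40 and V = Y − 27.

X       Y       U = X − 40   U²     V = Y − 27   V²     UV
43      29      3            9      2            4      6
44      31      4            16     4            16     16
46      19      6            36     −8           64     −48
40      18      0            0      −9           81     0
44      19      4            16     −8           64     −32
42      27      2            4      0            0      0
45      27      5            25     0            0      0
42      29      2            4      2            4      4
38      41      −2           4      14           196    −28
40      30      0            0      3            9      0
42      26      2            4      −1           1      −2
57      10      17           289    −17          289    −289
523     306     43           407    −18          728    −373

The simple sample correlation coefficient is
$$r = \frac{N\sum UV - \sum U \sum V}{\sqrt{N\sum U^2 - (\sum U)^2}\,\sqrt{N\sum V^2 - (\sum V)^2}} = \frac{12(-373) - (43)(-18)}{\sqrt{12(407) - (43)^2}\,\sqrt{12(728) - (-18)^2}} = \frac{-3702}{\sqrt{3035}\,\sqrt{8412}} = -0.733$$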

Spearman's RANK Correlation Method

A nonparametric (distribution-free) rank statistic proposed by Spearman in 1904 as a measure of
the strength of the association between two variables (Lehmann and D'Abrera 1998). The
Spearman rank correlation coefficient can be used to give an R-estimate, and is a measure of
monotone association that is used when the distribution of the data makes Pearson's correlation
coefficient undesirable or misleading.

The Spearman rank correlation coefficient is defined by
$$R = 1 - \frac{6\sum d^2}{N^3 - N} \qquad (1)$$
where d is the difference between the two ranks of an item and N is the number of pairs.

Repeated rank correlation:
$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N}$$
where m is the number of times a value is repeated, with one correction term for each group of repeated values.



3. Example: 5 applicants for a job are rated by two officers, with the following results. Note that in this
example the ranks are given initially; usually the data must first be replaced by ranks. Test to see how well the ratings agree (α = .05).

Applicant   A   B   C   D   E
Rater 1     4   1   3   2   5
Rater 2     3   2   5   1   4

In this case, we have a 1-sided test: H0: ρs ≤ 0 against H1: ρs > 0. Arrange the data in columns.

Applicant   Rater 1 (rx)   Rater 2 (ry)   d = rx − ry   d²
A           4              3              1             1
B           1              2              −1            1
C           3              5              −2            4
D           2              1              1             1
E           5              4              1             1

Note that Σd = 0 and Σd² = 8. Since n = 5,
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(8)}{5(5^2 - 1)} = 1 - \frac{2}{5} = 0.600$$


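4. Using Spearman's rank correlation method, find the rank correlation for the following data (repeated ranks).

X                 68   64   75    50   64   80   75    40   55   64
Y                 62   58   68    45   81   60   68    48   50   70
R1                4    6    2.5   9    6    1    2.5   10   8    6
R2                5    7    3.5   10   1    6    3.5   9    8    2
D² = (R1 − R2)²   1    1    1     1    25   25   1     1    0    16

Here ΣD² = 72; in X, 75 is repeated twice (m = 2) and 64 three times (m = 3); in Y, 68 is repeated twice (m = 2).

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N} = 1 - \frac{6[72 + 0.5 + 2 + 0.5]}{990} = 0.545$$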

5. Ten competitors in a beauty contest are ranked by three judges as in the following data. Find which pair of judges has the nearest approach to common tastes.

                                                        Total
Judge 1       1    6   5    10   3    2    4   9    7   8
Judge 2       3    5   8    4    7    10   2   1    6   9
Judge 3       6    4   9    8    1    2    3   10   5   7
d² (J1, J2)   4    1   9    36   16   64   4   64   1   1    200
d² (J2, J3)   9    1   1    16   36   64   1   81   1   4    214
d² (J1, J3)   25   4   16   4    4    0    1   1    4   1    60

J1 and J2: R1 = 1 − 6(200)/990 = −0.212

J2 and J3: R2 = 1 − 6(214)/990 = −0.297 (negative approach)

J1 and J3: R3 = 1 − 6(60)/990 = 0.636 (J1 and J3 have the nearest approach)
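
With ranks already given and no ties, the pairwise comparison of the judges is a direct application of the basic formula. The sketch below computes all three coefficients; the dictionary layout and the name `spearman_from_ranks` are illustrative.

```python
from itertools import combinations


def spearman_from_ranks(r1, r2):
    """rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


judges = {
    "J1": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "J2": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "J3": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

rho = {
    (a, b): spearman_from_ranks(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
closest_pair = max(rho, key=rho.get)   # pair with the largest coefficient

for pair, value in rho.items():
    print(pair, round(value, 3))
print("nearest approach:", closest_pair)
```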

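Regression Equations: r = ±√(byx · bxy), where r takes the sign common to byx and bxy.

The regression equation of y on x

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \quad \text{where } b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$$

The regression equation of x on y

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \quad \text{where } b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$$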

6. The following data relate to the ages of husbands and wives.

Age of husbands (X): 26 29 31 33 35 34 38 39 41 45
Age of wives (Y):    22 26 27 31 38 19 29 36 35 46
Find the regression equations and find i) the age of the husband when the wife's age is 30, ii) the wife's age when the
husband's age is 32.

Take U = X − 35 and V = Y − 30.

X       Y       U = X − 35   U²     V = Y − 30   V²     UV
26      22      −9           81     −8           64     72
29      26      −6           36     −4           16     24
31      27      −4           16     −3           9      12
33      31      −2           4      1            1      −2
35      38      0            0      8            64     0
34      19      −1           1      −11          121    11
38      29      3            9      −1           1      −3
39      36      4            16     6            36     24
41      35      6            36     5            25     30
45      46      10           100    16           256    160
351     309     1            299    9            593    328

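Here N = 10, X̄ = 351/10 = 35.1, Ȳ = 309/10 = 30.9, ΣU = 1, ΣV = 9, ΣU² = 299, ΣV² = 593 and ΣUV = 328.

Linear equation x on y

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \quad \text{where } b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} = \frac{N\sum UV - \sum U \sum V}{N\sum V^2 - (\sum V)^2} = \frac{3271}{5849} = 0.559$$

X − 35.1 = 0.559(Y − 30.9)

Regression equation x on y is X = 0.559Y + 17.827

i) When the wife's age is Y = 30, X = 0.559(30) + 17.827 = 34.6, i.e. the husband's age is about 35 years.

Linear equation y on x

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \quad \text{where } b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{N\sum UV - \sum U \sum V}{N\sum U^2 - (\sum U)^2} = \frac{3271}{2989} = 1.09$$

Y − 30.9 = 1.09(X − 35.1)

Regression equation y on x is Y = 1.09X − 7.359

ii) When the husband's age is X = 32, Y = 1.09(32) − 7.359 = 27.5, i.e. the wife's age is about 28 years.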



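7. Calculate the correlation coefficient for the following data. Also find the regression equation of y on x.

ΣX = 12500     ΣX² = 1585000
ΣY = 8000      ΣY² = 648000
ΣXY = 1007425  N = 100

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \frac{742500}{\sqrt{2250000}\,\sqrt{800000}} = 0.55$$

Linear equation y on x

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \quad \text{where } b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{742500}{2250000} = 0.33$$

With X̄ = 12500/100 = 125 and Ȳ = 8000/100 = 80, the regression equation of y on x is

Y − 80 = 0.33(X − 125), i.e. Y = 0.33X + 38.75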

8. Find the regression equations for the following data.

Marks in maths (X):      39 65 62 90 82 75 25 98 36 78

Marks in statistics (Y): 47 53 58 86 62 68 60 91 51 84

Take U = X − 65 and V = Y − 66. The column totals are:

ΣX     ΣY     ΣU   ΣU²    ΣV   ΣV²    ΣUV
650    660    0    5398   0    2224   2704

byx = 2704/5398 = 0.5, bxy = 2704/2224 = 1.216

x on y equation: x − 65 = 1.216(y − 66), i.e. x = 1.216y − 15.256

y on x equation: y − 66 = 0.5(x − 65), i.e. y = 0.5x + 33.5

9. The regression equations are 8x − 10y + 66 = 0 and 40x − 18y = −214. Find the mean values of x and y,
find byx and bxy, and find the coefficient of correlation [r = ±√(bxy · byx)].

10.Find the regression equations and also the coefficient of correlation from the following data



Σx = 580 ΣX 2= 41658
ΣY = 370 ΣY2 = 17206
ΣXY = 11444 N= 12

11. In a partially destroyed laboratory record of an analysis of correlation data, only the following results
are legible. Variance of X = 1. The regression equations are 3X + 2Y = 26 and 6X + Y = 31. What
were i) the mean values of X and Y, ii) the standard deviations of X and Y, iii) the correlation between X and Y?

Solving the two equations, the mean is (X̄, Ȳ) = (4, 7).
Assume 3X + 2Y = 26 is the x on y equation: then bxy = −2/3.
Assume 6X + Y = 31 is the y on x equation: then byx = −6. Hence r² = bxy · byx = 4 > 1 (assumption wrong).

Instead, take 3X + 2Y = 26 as the y on x equation: then byx = −3/2.
Take 6X + Y = 31 as the x on y equation: then bxy = −1/6. Hence r² = bxy · byx = 1/4 ≤ 1 (assumption valid), and since both coefficients are negative, r = −0.5.

$$\frac{\sigma_y^2}{\sigma_x^2} = \frac{b_{yx}}{b_{xy}} = \frac{-3/2}{-1/6} = 9, \quad \text{so } \sigma_y^2 = 9\,\sigma_x^2$$
The variance of X is given as 1, hence σx = 1, σy² = 9(1) = 9 and σy = 3.

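12. Out of the two lines of regression given by x + 2y − 5 = 0 and 2x + 3y − 8 = 0, find which is the regression line of x on y.
Also find the means, bxy, byx and the correlation coefficient. [Ans. Taking x + 2y − 5 = 0 as x on y gives bxy · byx = (−2)(−2/3) = 4/3 > 1, which is not possible; hence the x on y line is 2x + 3y − 8 = 0 with bxy = −3/2, the y on x line is x + 2y − 5 = 0 with byx = −1/2, r = −0.866 and the mean is (1, 2).]
13. The equations of two regression lines are 3x + 12y = 19 and 3y + 9x = 46. Obtain the correlation coefficient
and the mean values of X and Y.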



MULTIPLE & PARTIAL CORRELATION
If the measure the degree of relation between the variable Y and one of the Variables X1 or X2
or X3 ......etc. then this measure is called partial correlation.

Partial correlation coefficient (Three variables)


It provides the measure of the relationship between the dependent variable `Y’ and any one of
the other variables X1,X2,.......Xn

If we denote r12.3 the partial differential coefficient between X1,X2 keeping X3 constant, then
Suppose we want to find the correlation between Y and X controlling W.
This is called the partial correlation and its symbol is r YX.W (If
x,y,z are three variables)

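$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}$$

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}$$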

1.Given r12 = 0.70, r13 = 0.61 , r23=0.40 Find i) r23.1 ii)r13.2 iii)r12.3

Ans. i) r23.1= - 0.048 ii)r13.2 =0.504 iii)r12.3=0.629
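The three partial correlation coefficients of Problem 1 can be checked with a short Python sketch. The helper name `partial_corr` is an assumption made for this illustration; it takes the correlation of the pair of interest first, followed by the two correlations involving the variable held constant.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

r12, r13, r23 = 0.70, 0.61, 0.40      # data of Problem 1
r12_3 = partial_corr(r12, r13, r23)   # X3 held constant
r13_2 = partial_corr(r13, r12, r23)   # X2 held constant
r23_1 = partial_corr(r23, r12, r13)   # X1 held constant
print(round(r23_1, 3), round(r13_2, 3), round(r12_3, 3))
```

A result whose magnitude exceeds 1 signals that the three given simple correlations are mutually inconsistent.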

2. Is it possible to obtain the following from a set of experimental data: r23 = 0.8, r13 = −0.5, r12 = 0.6? (Ans. r12.3 = 1.925 > 1, so these values are not possible.)

3. From the data relating to the yield of dry bark (X1), height (X2) and girth (X3) for 18 cinchona plants, the following correlation coefficients were obtained. (r12.3 = 0.62)

4. In a certain investigation, the following values are obtained: r12 = 0.6, r13 = −0.4 and r23 = 0.7. Are these values consistent? (Find r12.3: if its magnitude does not exceed one the values are consistent, otherwise inconsistent.) (Ans. r12.3 = 1.344, so the values are inconsistent.)

5. The simple correlation coefficients between temperature (X1), corn yield (X2) and rainfall (X3) are r12 = 0.59, r13 = 0.46 and r23 = 0.77. Calculate the partial correlation coefficients r12.3, r23.1 and r31.2. (Ans. r12.3 = 0.42, r23.1 = 0.69, r31.2 = 0.019)

Multiple Correlation (Definition)

If we measure the degree of relationship between the variable Y and all the variables X1, X2, ..., Xn taken together, then this measure is called multiple correlation.

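Coefficient of Multiple Correlation:

The coefficients of multiple correlation are denoted by R1.23 (= R1.32), R2.13 (= R2.31) and R3.12 (= R3.21):

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{13}^2}}$$

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}}$$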

1. The following correlation coefficients are given: r12 = 0.98, r13 = 0.44 and r23 = 0.54. Calculate the multiple correlation coefficient treating the first variable as dependent and the second and third variables as independent. (Ans. R1.23 = 0.986)

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$

2. If r12 = 0.5, r31 = 0.3, r23 = 0.45, find R3.12. (Ans. 0.46)

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}}$$

3. If r12 = 0.6, r13 = 0.7, r23 = 0.65, find R1.23, R3.12 and R2.13. (Ans. 0.73, 0.76, 0.68)

4. If r12 = 0.8, r13 = 0.5, r23 = 0.3, find R1.23. (Ans. R1.23 = 0.85)

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$

5. Given r12 = 0.77, r13 = 0.72, r23 = 0.52, calculate R1.23. (Ans. R1.23 = 0.86)
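As a check on the multiple correlation problems above, the following Python sketch evaluates the three coefficients for the data of Problem 3. The function name `multiple_corr` is an assumption made for this illustration: its first two arguments are the correlations of the dependent variable with each of the other two, and the third is the correlation between those two.

```python
import math

def multiple_corr(r_ab, r_ac, r_bc):
    """R_a.bc: multiple correlation of variable a on variables b and c."""
    numerator = r_ab**2 + r_ac**2 - 2 * r_ab * r_ac * r_bc
    return math.sqrt(numerator / (1 - r_bc**2))

r12, r13, r23 = 0.6, 0.7, 0.65        # data of Problem 3
R1_23 = multiple_corr(r12, r13, r23)  # X1 dependent
R2_13 = multiple_corr(r12, r23, r13)  # X2 dependent
R3_12 = multiple_corr(r13, r23, r12)  # X3 dependent
print(round(R1_23, 2), round(R3_12, 2), round(R2_13, 2))
```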


MULTIPLE REGRESSION

Multiple regression is a flexible method of data analysis that may be appropriate whenever a
quantitative variable (the dependent or criterion variable) is to be examined in relationship to any other
factors (expressed as independent or predictor variables). Relationships may be nonlinear, independent
variables may be quantitative or qualitative, and one can examine the effects of a single variable or
multiple variables with or without the effects of other variables taken into account.

Multiple Regression with Two Predictor Variables


Multiple regression is an extension of simple linear regression in which more than one independent
variable (X) is used to predict a single dependent variable (Y). The predicted value of Y is a linear
transformation of the X variables such that the sum of squared deviations of the observed and predicted Y
is a minimum. The computations are more complex, however, because the interrelationships among all
the variables must be taken into account in the weights assigned to the variables. The interpretation of the
results of a multiple regression analysis is also more complex for the same reason.

With two independent variables the prediction of Y is expressed by the following equation:

Y'i = b0 + b1X1i + b2X2i

Note that this transformation is similar to the linear transformation of two variables discussed in the
previous chapter except that the w's have been replaced with b's and the X'i has been replaced with a Y'i.

The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations between the observed and the predicted values of Y.

Multiple Regression Models


These models are the most widely used of all regression methods. There are two or more predictor variables, which may be measurement or qualitative (dummy) variables. Some multiple regression models may contain one measurement variable in multiple forms.
More often than not, the response variable is influenced by more than one predictor variable. For example, the timber volume or crown surface of a tree may be affected by its diameter, height, species, age, soil fertility, etc. The crop yield may be affected by the amount of irrigation as well as fertilizer.
Let Y be the dependent variable, depending on two independent variables X1 and X2. The regression equation of Y on X1 and X2 is given by

Y = b0 + b1X1 + b2X2

where b0, b1 and b2 are found by solving the normal equations
∑Y = n b0 + b1∑X1 + b2∑X2
∑YX1 = b0∑X1 + b1∑X1² + b2∑X1X2
∑YX2 = b0∑X2 + b1∑X1X2 + b2∑X2²

1. The owner of a chain of ten stores wishes to forecast net profit with the help of next year's projected sales of food and non-food items. The data on the current year's sales of food items, sales of non-food items and net profit for all the ten stores are as follows.

Supermarket No.                   1    2    3    4    5    6    7    8    9    10
Net profit in crores (Y)          5.6  4.7  5.4  5.5  5.1  6.8  5.8  8.2  5.8  6.2
Sales of food in crores (X1)      20   15   18   20   16   25   22   30   24   25
Sales of non-food in crores (X2)  5    5    6    5    6    6    4    7    3    4

Y      X1    X2   X1²    X2²   YX1      YX2     X1X2
5.6    20    5    400    25    112      28      100
4.7    15    5    225    25    70.5     23.5    75
5.4    18    6    324    36    97.2     32.4    108
5.5    20    5    400    25    110      27.5    100
5.1    16    6    256    36    81.6     30.6    96
6.8    25    6    625    36    170      40.8    150
5.8    22    4    484    16    127.6    23.2    88
8.2    30    7    900    49    246      57.4    210
5.8    24    3    576    9     139.2    17.4    72
6.2    25    4    625    16    155      24.8    100
59.1   215   51   4815   273   1309.1   305.6   1099   (totals)

Y = b0 + b1X1 + b2X2
where b0, b1 and b2 are found by solving the normal equations
∑Y = n b0 + b1∑X1 + b2∑X2
∑YX1 = b0∑X1 + b1∑X1² + b2∑X1X2
∑YX2 = b0∑X2 + b1∑X1X2 + b2∑X2²

10b0 + 215b1 + 51b2 = 59.1
215b0 + 4815b1 + 1099b2 = 1309.1
51b0 + 1099b1 + 273b2 = 305.6

Answer: b0 = 0.233, b1 = 0.196, b2 = 0.287, so Y = 0.233 + 0.196 X1 + 0.287 X2
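The normal equations of Problem 1 can be formed and solved programmatically. The sketch below uses only the Python standard library; the helper `solve_linear` is a small Gauss-Jordan routine written for this illustration (an assumption, not part of the notes), and the data are those of the ten stores.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

y  = [5.6, 4.7, 5.4, 5.5, 5.1, 6.8, 5.8, 8.2, 5.8, 6.2]
x1 = [20, 15, 18, 20, 16, 25, 22, 30, 24, 25]
x2 = [5, 5, 6, 5, 6, 6, 4, 7, 3, 4]
n = len(y)

def dot(u, v):
    """Sum of products, e.g. dot(x1, x2) is the sum of X1*X2."""
    return sum(p * q for p, q in zip(u, v))

# Coefficient matrix and right-hand side of the three normal equations
A = [[n,       sum(x1),     sum(x2)],
     [sum(x1), dot(x1, x1), dot(x1, x2)],
     [sum(x2), dot(x1, x2), dot(x2, x2)]]
rhs = [sum(y), dot(y, x1), dot(y, x2)]

b0, b1, b2 = solve_linear(A, rhs)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```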

2. The annual food expenditure of a family depends on the net income of the family and the number of members in the family. A sample survey of 6 families gave the following data.

Food expenditure in 1000s (Y)   10   12   14   15   10   11
Net income (X1)                 25   30   25   32   20   21
No. of members (X2)             5    6    3    6    2    2

Find the equation of multiple regression.

Ans) With ∑Y = 72, ∑X1 = 153, ∑X2 = 24, ∑X1² = 4015, ∑X1X2 = 654, ∑X2² = 114, ∑YX1 = 1871 and ∑YX2 = 296, the normal equations are
6b0 + 153b1 + 24b2 = 72
153b0 + 4015b1 + 654b2 = 1871
24b0 + 654b1 + 114b2 = 296
On solving, b0 = −6.81, b1 = 1.05, b2 = −2.01
Regression equation is Y = −6.81 + 1.05 X1 − 2.01 X2

3. Given the following data, fit a regression equation representing the dependence of the number of credit cards on family size and family income. Also show whether the addition of the `Family Income' variable has improved the relationship, by finding the sums of squares of errors and by calculating the simple and multiple correlation coefficients.

No. of credit cards       4    6    6    7    8    7    8    10
Family size               2    2    4    4    5    5    6    6
Family income in lakhs    14   16   14   17   18   21   17   25

Lecture Notes
15MA305-Statistics for Information Technology

Department of Mathematics
Faculty of Engineering and Technology
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
Kattankulathur-603203, Kancheepuram District.

UNIT-3
ANALYSIS OF TIME SERIES
UNIT-3 TOPICS:
⋆ Components of time series – Problems of classifications – Methods of measuring trends
⋆ Freehand graphing method, semi average method
⋆ Moving average method
⋆ Method of least squares
⋆ Introduction to Measurement of seasonal variation
⋆ Method of simple averages (weekly, monthly and quarterly)
⋆ Ratio to trend method
Contents
1 Concept of Time Series
2 Problems on Measuring Secular Trends
3 Problems on Measuring Seasonal Variation
4 Exercise/Practice/Assignment Problems

Dear all, here I have solved only a few problems, and some topics may be missed. Please follow the classwork to have all the topics for preparation. Take the exercise problems given at the end for your practice. Apart from the exercises, you can follow any reference book for your practice.

Some of the sections/topics in these units are preliminary ideas, which are the basics needed to do our regular course examples and exercises.

1 Concept of Time Series


Any data recorded together with its time of occurrence is called a time series. The five-yearly output of wheat recorded for the last fifteen years, the weekly average price of groceries recorded for the last 10 weeks, the monthly average sales of a company recorded for the last 25 months and the quarterly average profits recorded for the last 10 quarters are examples of time series data.
In the fields of business and economics, data such as income, imports, exports, production, consumption and prices depend on time. These data are also affected by seasonal changes as well as regular cyclical changes over a time period. The analysis of time series therefore plays an important role in evaluating changes in business and economics. It is necessary to associate time with the series, because time is one of the main and basic variables in time series analysis.

1.1 The Components of Time Series

The factors that are responsible for bringing about changes in a time series, also called the components of time series, are as follows:
# Secular Trend (or General Trend)
# Seasonal Movement/Variation
# Cyclical Movement/Variation
# Irregular Fluctuation/Variation

Secular Trend

The secular trend is the main component of a time series which results from long term effects
of socio-economic and political factors. This trend may show the growth or decline in a time
series over a long period. This is the type of tendency which continues to persist for a very long
period. Prices and export and import data, for example, reflect obviously increasing tendencies
over time.

Seasonal Variation

These are short term movements occurring in data due to seasonal factors. The short term is generally considered as a period in which changes occur in a time series with variations in weather or festivities. For example, it is commonly observed that the consumption of ice-cream during summer is generally high, and hence an ice-cream dealer's sales would be higher in some months of the year while relatively lower during winter months. Employment, output, exports, etc., are subject to change due to variations in weather. Similarly, the sales of garments, umbrellas, greeting cards and fireworks are subject to large variations during festivals like Valentine's Day, Eid, Christmas, New Year's, etc. These types of variations in a time series can be isolated only when the series is provided biannually, quarterly or monthly.

Cyclic Movement

These are long term oscillations occurring in a time series. These oscillations are mostly observed in economic data, and the periods of such oscillations generally extend from five to twelve years or more. These oscillations are associated with the well known business cycles. These cyclic movements can be studied provided a long series of measurements, free from irregular fluctuations, is available.

Irregular Fluctuation

These are sudden changes occurring in a time series which are unlikely to be repeated. They
are components of a time series which cannot be explained by trends, seasonal or cyclic move-
ments. These variations are sometimes called residual or random components. These variations,
though accidental in nature, can cause a continual change in the trends, seasonal and cyclical
oscillations during the forthcoming period. Floods, fires, earthquakes, revolutions, epidemics,
strikes etc., are the root causes of such irregularities.

Methods of Analyzing Trend

A number of different methods are available to estimate the trend; however, the suitability of
these methods largely depends on the nature of the data and the purpose of the analysis. To
measure a trend which can be represented as a straight line or some type of smooth curve, the
following are the commonly employed methods:
(a) Freehand smooth curves
(b) Semi-average method
(c) Moving average method
(d) Mathematical curve fitting

2 Problems on Measuring Secular Trends

2.1 Illustrative Examples on Free-Hand, Semi-Average and Moving Averages

EXAMPLE 2.1
Fit a trend line for the following data by freehand method.

Year x              1982  1983  1984  1985  1986  1987  1988  1989
No. of failures f   23    26    28    32    20    12    12    10

Year x              1990  1991  1992  1993  1994  1995  1996  1997
No. of failures f   9     13    11    14    12    9     3     1

Hints/Solution: See Figure 2.1.

EXAMPLE 2.2
Fit a trend line for the following data by semi-average method.

Year x    1991  1992  1993  1994  1995  1996  1997
Sales f   102   105   114   110   108   116   112

Hints/Solution:

Since seven years are given, the middle year (1994) can be left out. The average for the first 3 years (1991-1993) is 321/3 = 107 and the average for the last 3 years (1995-1997) is 336/3 = 112. To draw the trend line we use the points (1992, 107) and (1996, 112).
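The semi-average calculation of Example 2.2 can be sketched in Python as follows. The function name `semi_averages` is an assumption made for this illustration; for an odd number of years it drops the middle year, as done in the solution.

```python
def semi_averages(years, values):
    """Return the two (mid-year, average) points used to draw the semi-average trend line."""
    n = len(values)
    half = n // 2                      # for odd n the middle year is left out
    first = (sum(years[:half]) / half, sum(values[:half]) / half)
    last = (sum(years[n - half:]) / half, sum(values[n - half:]) / half)
    return first, last

years = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
sales = [102, 105, 114, 110, 108, 116, 112]

(x1, y1), (x2, y2) = semi_averages(years, sales)
slope = (y2 - y1) / (x2 - x1)          # yearly increase along the trend line
trend = [y1 + slope * (t - x1) for t in years]
print((x1, y1), (x2, y2), slope)
```

Joining the two points gives a straight line, so the trend value for any year follows from the slope.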

Figure 2.1: Trend by Free Hand Method.

Note 2.1.1. If an even number of years is given, one can use the first half and the second half, without leaving out any year, to find the semi-averages.

Figure 2.2: Trend by Semi-Average Method.


EXAMPLE 2.3
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot the actual and trend values on a graph.
Year x    1991  1992  1993  1994  1995  1996  1997
Sales f   102   105   114   110   108   116   112

Hints/Solution:

S.No.  Year  f    3-ymt  3-yma   4-ymt  4-yma   5-ymt  5-yma  7-ymt  7-yma
1      1991  102  −      −       −      −       −      −      −      −
2      1992  105  321    107     −      −       −      −      −      −
3      1993  114  329    109.67  431    107.75  539    107.8  −      −
4      1994  110  332    110.67  437    109.25  553    110.6  767    109.57
5      1995  108  334    111.33  448    112     560    112    −      −
6      1996  116  336    112     −      −       −      −      −      −
7      1997  112  −      −       −      −       −      −      −      −

ymt: yearly moving totals; yma: yearly moving average; −: not available
Note: In the 4-yma, one can find/modify the 4 yearly centered moving average for the better trend. Also som
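A minimal Python sketch of the moving totals, moving averages and the centered 4-yearly moving average is given below, using the sales data of Example 2.3. The function names are assumptions made for this illustration. The sketch lists every complete window, so it may return more entries for a given period than a hand-prepared table shows.

```python
def moving_totals(values, k):
    """Totals of every run of k consecutive values."""
    return [sum(values[i:i + k]) for i in range(len(values) - k + 1)]

def moving_averages(values, k):
    """k-period moving averages (moving totals divided by k)."""
    return [total / k for total in moving_totals(values, k)]

def centered_moving_averages(values, k):
    """For even k: average each pair of adjacent k-period averages so the
    result lines up with an actual time point."""
    ma = moving_averages(values, k)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]

sales = [102, 105, 114, 110, 108, 116, 112]   # 1991 to 1997
print(moving_totals(sales, 3))                # aligned with 1992 to 1996
print(moving_averages(sales, 4))              # fall between two years
print(centered_moving_averages(sales, 4))     # aligned with 1993 to 1995
```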
Figure 2.3: Trend by Moving Average Method.

EXAMPLE 2.4
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot the actual and trend values on a graph.
Year x              1982  1983  1984  1985  1986  1987  1988  1989
No. of failures f   23    26    28    32    20    12    12    10
Year x              1990  1991  1992  1993  1994  1995  1996  1997
No. of failures f   9     13    11    14    12    9     3     1

Hints/Solution:

S.No.  Year  f   3-ymt  3-yma  4-ymt  4-yma  5-ymt  5-yma  7-ymt  7-yma
1      1982  23  −      −      −      −      −      −      −      −
2      1983  26  77     25.67  −      −      −      −      −      −
3      1984  28  86     28.67  109    27.25  129    25.8   −      −
4      1985  32  80     26.67  106    26.5   118    23.6   153    21.86
5      1986  20  64     21.33  92     23     104    20.8   140    20
6      1987  12  44     14.67  76     19     86     17.2   123    17.57
7      1988  12  34     11.33  54     13.5   63     12.6   108    15.43
8      1989  10  31     10.33  43     10.75  56     11.2   87     12.43
9      1990  9   32     10.67  44     11     55     11     81     11.57
10     1991  13  33     11     43     10.75  57     11.4   81     11.57
11     1992  11  38     12.67  47     11.75  59     11.8   78     11.14
12     1993  14  37     12.33  50     12.5   59     11.8   71     10.14
13     1994  12  35     11.67  46     11.5   49     9.8    63     9
14     1995  9   24     8      38     9.5    39     7.8    −      −
15     1996  3   13     4.33   −      −      −      −      −      −
16     1997  1   −      −      −      −      −      −      −      −

ymt: yearly moving totals; yma: yearly moving average; −: not available

Figure 2.4: Trend by Moving Average Method.

EXAMPLE 2.5
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot the actual and trend values on a graph.
Year x   1985  1986  1987  1988  1989  1990  1991  1992
y        90    110   185   200   195   210   300   450

Hints/Solution:

S.No.  Year  y    3-ymt  3-yma   4-ymt  4-yma   5-ymt  5-yma  7-ymt  7-yma
1      1985  90   −      −       −      −       −      −      −      −
2      1986  110  385    128.33  −      −       −      −      −      −
3      1987  185  495    165     585    146.25  780    156    −      −
4      1988  200  580    193.33  690    172.5   900    180    1290   184.29
5      1989  195  605    201.67  790    197.5   1090   218    1650   235.71
6      1990  210  705    235     905    226.25  1355   271    −      −
7      1991  300  960    320     −      −       −      −      −      −
8      1992  450  −      −       −      −       −      −      −      −

ymt: yearly moving totals; yma: yearly moving average

Figure 2.5: Trend by Moving Average Method.

2.2 Illustrative Examples on Least squares fit

EXAMPLE 2.6
Fit a trend by straight line and a parabola, by the method of least squares, to the following data. Also find the short term fluctuations.
Year x    1991  1992  1993  1994  1995  1996  1997
Sales f   102   105   114   110   108   116   112

Hints/Solution: Take d = x − 1994. The normal equations give the straight line yl = 109.571 + 1.643d and the parabola yc = 111.286 + 1.643d − 0.429d². The last two columns list the differences between the trend values and the observed values.

Year x  y    d   d×y   d²  d²×y  d³   d⁴   yl       yc       yl − y  yc − y
1991    102  −3  −306  9   918   −27  81   104.642  102.5    2.642   0.5
1992    105  −2  −210  4   420   −8   16   106.285  106.285  1.285   1.285
1993    114  −1  −114  1   114   −1   1    107.928  109.214  −6.071  −4.785
1994    110  0   0     0   0     0    0    109.571  111.285  −0.428  1.285
1995    108  1   108   1   108   1    1    111.214  112.5    3.214   4.5
1996    116  2   232   4   464   8    16   112.857  112.857  −3.142  −3.142
1997    112  3   336   9   1008  27   81   114.5    112.357  2.5     0.357
Total   767  0   46    28  3032  0    196  767      767      ≅ 0     ≅ 0

Figure 2.6: Trend by Least Squares Method.
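The least squares fits of Example 2.6 can be reproduced with the following Python sketch, which builds the normal equations for a polynomial in the coded time d and solves them. The helpers `solve_linear` and `fit_polynomial` are assumptions written for this illustration, using only the standard library.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_polynomial(d, y, degree):
    """Least squares coefficients c0, c1, ... of y = c0 + c1*d + c2*d^2 + ...
    obtained from the normal equations (sums of powers of d)."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in d) for j in range(size)] for i in range(size)]
    b = [sum(v * x ** i for x, v in zip(d, y)) for i in range(size)]
    return solve_linear(a, b)

years = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
sales = [102, 105, 114, 110, 108, 116, 112]
d = [t - 1994 for t in years]                 # coded time, so that the sum of d is 0

line = fit_polynomial(d, sales, 1)            # [intercept, slope]
parabola = fit_polynomial(d, sales, 2)        # [c0, c1, c2]
yl = [line[0] + line[1] * x for x in d]
yc = [parabola[0] + parabola[1] * x + parabola[2] * x * x for x in d]
print([round(v, 3) for v in line], [round(v, 3) for v in parabola])
```

Subtracting `yl` or `yc` from the observed values then gives the short term fluctuations about each trend.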

EXAMPLE 2.7
Fit a trend by straight line and a parabola, by the method of least squares, to the following data. Also find the short term fluctuations.
Year x              1982  1983  1984  1985  1986  1987  1988  1989
No. of failures f   23    26    28    32    20    12    12    10
Year x              1990  1991  1992  1993  1994  1995  1996  1997
No. of failures f   9     13    11    14    12    9     3     1

Hints/Solution: Take d = x − 1989.

Year x  y  d  d×y  d²  d²×y  d³  d⁴  yl  yc  yl − y  yc − y
1982 23 −7 −161 49 1127 −343 2401 26.411765 27.716912 3.4117647 4.7169118
1983 26 −6 −156 36 936 −216 1296 24.848529 25.631618 −1.1514706 −0.3683824
1984 28 −5 −140 25 700 −125 625 23.285294 23.620903 −4.7147059 −4.3790966
1985 32 −4 −128 16 512 −64 256 21.722059 21.684769 −10.277941 −10.315231
1986 20 −3 −60 9 180 −27 81 20.158824 19.823214 0.1588235 −0.1767857
1987 12 −2 −24 4 48 −8 16 18.595588 18.036239 6.5955882 6.0362395
1988 12 −1 −12 1 12 −1 1 17.032353 16.323845 5.0323529 4.3238445
1989 10 0 0 0 0 0 0 15.469118 14.686029 5.4691176 4.6860294
1990 9 1 9 1 9 1 1 13.905882 13.122794 4.9058824 4.1227941
1991 13 2 26 4 52 8 16 12.342647 11.634139 −0.6573529 −1.3658613
1992 11 3 33 9 99 27 81 10.779412 10.220063 −0.2205882 −0.7799370
1993 14 4 56 16 224 64 256 9.2161765 8.8805672 −4.7838235 −5.1194328
1994 12 5 60 25 300 125 625 7.6529412 7.6156513 −4.3470588 −4.3843487
1995 9 6 54 36 324 216 1296 6.0897059 6.4253151 −2.9102941 −2.5746849
1996 3 7 21 49 147 343 2401 4.5264706 5.3095588 1.5264706 2.3095588
1997 1 8 8 64 64 512 4096 2.9632353 4.2683824 1.9632353 3.2683824
Total 235 8 −414 344 4734 512 13448 235 235 ≅ 0 ≅ 0

Figure 2.7: Trend by Least Squares Method.

EXAMPLE 2.8
Fit a trend by straight line and a parabola, by the method of least squares, to the following data. Also find the short term fluctuations.
Year x   1985  1986  1987  1988  1989  1990  1991  1992
y        90    110   185   200   195   210   300   450

Hints/Solution: Take d = x − 1988.

Year x  y  d  d×y  d²  d²×y  d³  d⁴  yl  yc  yl − y  yc − y
1985 90 −3 −270 9 810 −27 81 70 112.91667 −20 22.916667
1986 110 −2 −220 4 440 −8 16 112.14286 118.27381 2.1428571 8.2738095
1987 185 −1 −185 1 185 −1 1 154.28571 135.89286 −30.714286 −49.107143
1988 200 0 0 0 0 0 0 196.42857 165.77381 −3.5714286 −34.22619
1989 195 1 195 1 195 1 1 238.57143 207.91667 43.571429 12.916667
1990 210 2 420 4 840 8 16 280.71429 262.32143 70.714286 52.321429
1991 300 3 900 9 2700 27 81 322.85714 328.9881 22.857143 28.988095
1992 450 4 1800 16 7200 64 256 365 407.91667 −85 −42.083333
Total 1740 4 2640 44 12370 64 452 1740 1740 ≅ 0 ≅ 0

Figure 2.8: Trend by Least Squares Method.

EXAMPLE 2.9
Find a trend by line and a parabola, by the method of least squares, to the following data.
x   1996  1997  1998  1999  2000  2001  2002
y   352   356   357   358   360   361   361

Hints/Solution: Let u = x − 1999 and v = y − 357, and let v = au² + bu + c be the best fit. The normal equations are given by

Σv = aΣu² + bΣu + nc (2.1)
Σuv = aΣu³ + bΣu² + cΣu (2.2)
Σu²v = aΣu⁴ + bΣu³ + cΣu² (2.3)

From the given data, n = 7, Σu = 0, Σv = 6, Σu² = 28, Σu³ = 0, Σu⁴ = 196, Σuv = 40 and Σu²v = 6. Solving, we get

v = −0.21429u² + 1.4286u + 1.7143,

i.e.

y = −0.21429(x − 1999)² + 1.4286(x − 1999) + 358.7143.

Figure 2.9: Trend by Least Squares Method.

3 Problems on Measuring Seasonal Variation

EXAMPLE 3.1
The monthly consumption of rice (in kg) in a village during 2004 to 2008 is given below. Find the seasonal variation by the method of monthly averages.
YEAR  JAN  FEB  MAR  APR  MAY  JUNE  JULY  AUG  SEP  OCT  NOV  DEC
2004  318  281  278  250  231  216   223   245  269  302  325  347
2005  342  309  299  268  249  236   242   262  288  321  342  364
2006  367  328  320  287  269  251   259   284  309  345  367  394
2007  392  349  342  311  290  273   282   305  328  364  389  417
2008  420  378  370  334  314  296   305   330  356  396  422  452

Hints/Solution: Each monthly average is expressed as a percentage of the average of the monthly averages (316.7).

Months   2004  2005  2006  2007  2008  Total   Average  Percentage
JAN      318   342   367   392   420   1839    367.8    116.1351437
FEB      281   309   328   349   378   1645    329      103.8838017
MAR      278   299   320   342   370   1609    321.8    101.6103568
APR      250   268   287   311   334   1450    290      91.56930849
MAY      231   249   269   290   314   1353    270.6    85.44363751
JUNE     216   236   251   273   296   1272    254.4    80.32838649
JULY     223   242   259   282   305   1311    262.2    82.79128513
AUG      245   262   284   305   330   1426    285.2    90.05367856
SEP      269   288   309   328   356   1550    310      97.88443322
OCT      302   321   345   364   396   1728    345.6    109.1253552
NOV      325   342   367   389   422   1845    369      116.5140512
DEC      347   364   394   417   452   1974    394.8    124.660562
Total                                  19002   3800.4   1200
Average                                1583.5  316.7    100
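The method of monthly averages used in Example 3.1 can be sketched in Python as below. The variable names are assumptions made for this illustration; the data are the rice consumption figures of the example.

```python
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUNE",
          "JULY", "AUG", "SEP", "OCT", "NOV", "DEC"]
consumption = {
    2004: [318, 281, 278, 250, 231, 216, 223, 245, 269, 302, 325, 347],
    2005: [342, 309, 299, 268, 249, 236, 242, 262, 288, 321, 342, 364],
    2006: [367, 328, 320, 287, 269, 251, 259, 284, 309, 345, 367, 394],
    2007: [392, 349, 342, 311, 290, 273, 282, 305, 328, 364, 389, 417],
    2008: [420, 378, 370, 334, 314, 296, 305, 330, 356, 396, 422, 452],
}

# Average of each month over the years
monthly_avg = [sum(row[m] for row in consumption.values()) / len(consumption)
               for m in range(12)]
# Average of the twelve monthly averages
general_avg = sum(monthly_avg) / 12
# Seasonal index: monthly average as a percentage of the general average
seasonal_index = [100 * avg / general_avg for avg in monthly_avg]

for name, index in zip(months, seasonal_index):
    print(name, round(index, 2))
```

By construction the twelve indices add up to 1200, which is a convenient check on the arithmetic.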


EXAMPLE 3.2
Assuming that the trend is absent, determine if there is any seasonality in the data given below. What are the seasonal indices for the various quarters?
YEAR  1st Quarter  2nd Quarter  3rd Quarter  4th Quarter
2011  3.7          4.1          3.3          3.5
2012  3.7          3.9          3.6          3.6
2013  4            4.1          3.3          3.1
2014  3.3          4.4          4            4

Hints/Solution:

YEAR            1st Quarter  2nd Quarter  3rd Quarter  4th Quarter
2011            3.7          4.1          3.3          3.5
2012            3.7          3.9          3.6          3.6
2013            4            4.1          3.3          3.1
2014            3.3          4.4          4            4
Total           14.7         16.5         14.2         14.2
Average         3.675        4.125        3.55         3.55
Seasonal Index  98.65771812  110.738255   95.30201342  95.30201342

Calculation of Seasonal Index (S.I.):

The average of all the averages = 14.9/4 = 3.725.

Seasonal Index for first quarter = (Quarterly Average / General Average) × 100 = (3.675 / 3.725) × 100 = 98.6577

The seasonal indices of the other quarters are calculated similarly and presented in the table. Since the indices differ from 100, the data show seasonality, with the second quarter the highest.

EXAMPLE 3.3
Find the seasonal variation by the ratio-to-trend method for the following data.
YEAR  1st-Quarter  2nd-Quarter  3rd-Quarter  4th-Quarter
2005  30           40           35           35
2006  34           52           50           44
2007  40           58           54           48
2008  54           76           68           62
2009  80           92           86           82

Hints/Solution:
For determining the seasonal variation by the ratio-to-trend method, we first determine the trend for the yearly data by the least squares method and then convert it to quarterly data.
Year   Yearly Total  Yearly Avg. y  x = Year − 2007  xy   x²  yl
2005   140           35             −2               −70  4   32
2006   180           45             −1               −45  1   44
2007   200           50             0                0    0   56
2008   260           65             1                65   1   68
2009   340           85             2                170  4   80
Total  1120          280            0                120  10  280

The straight line trend equation is calculated as y = ax + b = 12x + 56.

The quarterly increment is 12/4 = 3.

Calculation of Quarterly Trend Values:

Consider the year 2005. The trend value 32 refers to the middle of the year, i.e. the point midway between the 2nd and 3rd quarters. The quarterly increment is 3, so the trend value of the 2nd quarter is 32 − 3/2 = 30.5 (the middle of the 2nd quarter lies half a quarter to the left of mid-year) and the trend value of the 3rd quarter is 32 + 3/2 = 33.5 (half a quarter to the right). The 1st quarter is one quarter to the left of the 2nd, giving 30.5 − 3 = 27.5, and the 4th quarter is one quarter to the right of the 3rd, giving 33.5 + 3 = 36.5. The values for the other years are calculated similarly and given in Table 1. The actual values expressed as percentages of the calculated trend values are given in Table 2.

Table 1: Quarterly Trend Values
YEAR     1st-Quarter  2nd-Quarter  3rd-Quarter  4th-Quarter
2005     27.5         30.5         33.5         36.5
2006     39.5         42.5         45.5         48.5
2007     51.5         54.5         57.5         60.5
2008     63.5         66.5         69.5         72.5
2009     75.5         78.5         81.5         84.5
Total    257.5        272.5        287.5        302.5
Average  51.5         54.5         57.5         60.5

Table 2: Actual Quarterly Values as % of Trend Values
YEAR           1st-Quarter  2nd-Quarter  3rd-Quarter  4th-Quarter
2005           109.0909091  131.147541   104.4776119  95.89041096
2006           86.07594937  122.3529412  109.8901099  90.72164948
2007           77.66990291  106.4220183  93.91304348  79.33884298
2008           85.03937008  114.2857143  97.84172662  85.51724138
2009           105.9602649  117.1974522  105.5214724  97.04142012
Total          463.8363964  591.405667   511.6439643  448.5095649
Average        92.76727927  118.2811334  102.3287929  89.70191298
S.I. Adjusted  92.05863068  117.3775847  101.5471039  89.01668071

In the percentage table, the total of all the averages is 403.079. Since the total is more than 400, an adjustment is made by multiplying each average by 400/403.079 to obtain the final indices.
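The whole ratio-to-trend calculation of Example 3.3 can be sketched in Python as follows. The variable names are assumptions made for this illustration. The sketch assumes, as in the worked solution, an odd number of consecutive years, so that the coded years x sum to zero and the trend line reduces to a = mean of the yearly averages and b = Σxy/Σx².

```python
quarterly = {
    2005: [30, 40, 35, 35],
    2006: [34, 52, 50, 44],
    2007: [40, 58, 54, 48],
    2008: [54, 76, 68, 62],
    2009: [80, 92, 86, 82],
}
years = sorted(quarterly)
yearly_avg = [sum(quarterly[y]) / 4 for y in years]

# Straight line trend y = a + b*x on the yearly averages, with x = year - middle year
middle = sum(years) / len(years)
x = [y - middle for y in years]
a = sum(yearly_avg) / len(years)
b = sum(xi * yi for xi, yi in zip(x, yearly_avg)) / sum(xi * xi for xi in x)

# Quarterly trend values: quarter q (0..3) lies (q - 1.5) quarters from mid-year
step = b / 4
trend = {y: [a + b * xi + (q - 1.5) * step for q in range(4)]
         for y, xi in zip(years, x)}

# Actual values as percentages of trend, averaged per quarter, then adjusted to total 400
ratios = {y: [100 * v / t for v, t in zip(quarterly[y], trend[y])] for y in years}
quarter_avg = [sum(ratios[y][q] for y in years) / len(years) for q in range(4)]
adjusted = [avg * 400 / sum(quarter_avg) for avg in quarter_avg]
print([round(v, 2) for v in adjusted])
```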

4 Exercise/Practice/Assignment Problems
1. Calculate the 3, 4, 5, 7 and 9-yearly moving average trend for the time series given below. Also use the weights 2,1,3 to find the 3-yearly weighted moving average, 2,1,2,2 to find the 4-yearly weighted moving average, and 2,2,1,3,2 to find the 5-yearly weighted moving average.
Year:      1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
Quantity:  239   242   238   252   257   250   273   270   268   288   284
Year:      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Quantity:  282   300   303   298   313   317   309   329   333   327

2. Fit a line and curve trend for the following data. Also find the short term fluctuations.
Year:      1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
Quantity:  239   242   238   252   257   250   273   270   268   288   284
Year:      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Quantity:  282   300   303   298   313   317   309   329   333   327

3. Find the seasonal variation by the simple average method and the ratio-to-trend method for the following data.
YEAR  1st-Quarter  2nd-Quarter  3rd-Quarter  4th-Quarter
2011  39           21           52           81
2012  45           23           63           76
2013  44           26           69           75
2014  53           23           64           84

Practice more problems from some of the reference books.

Acknowledgement:
Some portions of this material are taken from various available sources. I thank the authors who prepared the calculus books and related materials.
Contact: (+91) 979 111 666 3 (or) athithan.s@ktr.srmuniv.ac.in
Visit: https://sites.google.com/site/lecturenotesofathithans/home

LEAST SQUARES BEST FIT FOR STRAIGHT LINE AND PARABOLA
Principle of least squares: it gives a unique set of values of the constants for the curve of best fit.

To fit a straight line of the form y = ax + b, the normal equations are

a ∑X + nb = ∑Y
a ∑X² + b ∑X = ∑XY
To fit a parabola of the form y = aX² + bX + c, the normal equations are
a ∑X⁴ + b ∑X³ + c ∑X² = ∑X²Y
a ∑X³ + b ∑X² + c ∑X = ∑XY
a ∑X² + b ∑X + nc = ∑Y
1. Fit a straight line to the data given below using the least squares method.
x   0   1     2     3     4
y   1   1.8   3.3   4.5   6.3

2. The following determinations of the specific heat of ethyl alcohol were made in an investigation of the variation in specific heat with temperature:
Specific heat (y)       0.51   0.55   0.57   0.59   0.62   0.67
Temperature (x deg)     0      10     20     30     40     50

Calculate the constants of the line y = a + bx that provides the best fit to the data.

3. Using the principle of least squares, fit a curve of the form y = ax + b.

x   5    10   15   20   25
y   16   19   23   26   30

[Ans: normal equations 75a + 5b = 114, 1375a + 75b = 1885; y = 0.7x + 12.3]
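For a straight line, the two normal equations can be solved in closed form. The Python sketch below applies this to the data of Problem 1; the function name `fit_line` is an assumption made for this illustration, and it returns (a, b) for the form y = ax + b used in these notes.

```python
def fit_line(xs, ys):
    """Least squares line y = a*x + b from the normal equations
    a*Sx + n*b = Sy and a*Sxx + b*Sx = Sxy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

x = [0, 1, 2, 3, 4]
y = [1, 1.8, 3.3, 4.5, 6.3]        # data of Problem 1
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))
```

For this data the sums are Σx = 10, Σy = 16.9, Σx² = 30 and Σxy = 47.1.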


Working table columns: X, Y, X², Y², XY, with column totals ∑X, ∑Y, ∑X², ∑Y², ∑XY.

4. Fit a parabola to the following data using the method of least squares.
x   1   2   3   4   5
y   2   3   5   8   10

Working table columns: X, Y, X², X³, X⁴, X²Y, XY, with column totals ∑X, ∑Y, ∑X², ∑X³, ∑X⁴, ∑X²Y, ∑XY.

5. Given the following data

x   0   1   2    3    4
y   1   5   10   22   38

find the straight line and the parabola of best fit.

UNIT-4 : SAMPLING
Frequently used terms in sampling
Null Hypothesis H0
Alternative Hypothesis H1
One tailed (based on H1: if H1 contains > or <, the test is one tailed, i.e. right or left tailed)
Two tailed (based on H1: if H1 contains ≠, the test is two tailed)
Type 1 error, Type 2 error
Level of significance, degrees of freedom

Null Hypothesis:

A definite statement about the population parameter. Such a hypothesis is usually a hypothesis of no difference and is denoted by H0.

Alternative Hypothesis

Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis and is denoted by H1.

Standard Error

The standard deviation of the sampling distribution of a statistic is known as its standard error. It is denoted by SE.

Errors in sampling
Type 1 error: Reject H0 when it is true.

Type 2 error: Accept H0 when it is false.

Two tailed test:

Suppose the population has a specified mean µ0

Null Hypothesis H0: µ = µ0

Alternative Hypothesis H1: µ ≠ µ0

One tailed test:

Null Hypothesis H0: µ = µ0

Alternative Hypothesis H1: µ > µ0 (right tailed) or µ < µ0 (left tailed)

Critical values of z:

Name of the test      1% level       5% level
Two tailed (µ ≠ µ0)   |zα| = 2.58    |zα| = 1.96
Right tailed          zα = 2.33      zα = 1.645
Left tailed           zα = −2.33     zα = −1.645

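Sampling

SMALL SAMPLES
a) Student's t test b) Chi-square test c) F test

i) Test of significance of mean:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$

95% confidence limits for the population mean µ:

$$\bar{x} \pm t_{0.05}\,\frac{s}{\sqrt{n - 1}}, \quad \text{i.e.} \quad \bar{x} - t_{0.05}\,\frac{s}{\sqrt{n - 1}} \le \mu \le \bar{x} + t_{0.05}\,\frac{s}{\sqrt{n - 1}}$$

ii) Test of significance of difference of means:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

This form is used when the means and SDs are given; s1 and s2 are the SDs of sample 1 and sample 2.

If the data are given as discrete observations,

$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$

Level of significance (LOS) column to use in the t table:
5% LOS: one tailed, 5% × 2 = 10%, use 0.10; two tailed, 5%, use 0.05
1% LOS: one tailed, 1% × 2 = 2%, use 0.02; two tailed, 1%, use 0.01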

1. A sample of 26 bulbs gives a mean life of 990 hours with a SD of 20 hours. The manufacturer
claims that the mean life of bulbs is 1000 hours. Is the sample not upto the standard

Sample size n = 26

Sample mean x̄ = 990, Population mean µ = 1000, SD s = 20, df = n-1 = 25 (5% level)

H0 : µ = 1000
H1 : µ < 1000

Test of significance of single mean: $t = \frac{\bar{x} - \mu}{s / \sqrt{n-1}} = \frac{990 - 1000}{20 / \sqrt{25}} = -2.5$

Tabulated value at 5% level, df (n-1) = 25, versus calculated value:

1.71 < |t| = 2.5

H0 Rejected

Result : The sample is not up to the standard
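
The single-mean test above can be checked with a few lines of Python. This is only a sketch under the handout's convention (sample SD computed with divisor n, hence the s/√(n-1) term); the function name `t_single_mean` is an illustrative choice, and the critical value is simply the table value quoted above.

```python
import math

def t_single_mean(sample_mean, pop_mean, sample_sd, n):
    """t statistic for a single mean, using the handout's s / sqrt(n - 1) form."""
    return (sample_mean - pop_mean) / (sample_sd / math.sqrt(n - 1))

# Bulb problem: n = 26, sample mean 990, claimed mean 1000, SD 20
t_bulbs = t_single_mean(990, 1000, 20, 26)

# Table value quoted in the handout for 25 df at the 5% level (one-tailed)
critical_value = 1.71
reject_h0 = abs(t_bulbs) > critical_value

print(f"t = {t_bulbs:.2f}, reject H0: {reject_h0}")
```

The same function covers Problems 2 and 3 by changing the four arguments and the table value.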

2. A machine is designed to produce insulating washers for electrical devices of average thickness
of 0.025 cm. A random sample of 10 washers was found to have a thickness of 0.024 cm with a
SD of 0.002 cm. Test the significance of the deviation. Value of t for 9 degrees of freedom at 5 % level is
2.262.

Sample mean x =0.024 cm, Population mean µ = 0.025cm n= 10

SD s = 0.002cm , df =n-1=9

H0 : µ = 0.025

H1 : µ ≠ 0.025

Test of significance of single mean: $t = \frac{\bar{x} - \mu}{s / \sqrt{n-1}} = \frac{0.024 - 0.025}{0.002 / \sqrt{9}} = -1.5$

Tabulated value at 5% level versus calculated value:

2.26 > |t| = 1.5

H0 Accepted

Result : The difference between x̄ and µ is not significant

3. The mean weekly sales of soap bars in departmental stores is 145 bars per store. After an
advertising campaign, the mean weekly sales in 17 stores for typical week increased to 155 and
showed a standard deviation of 16. Was the advertising campaign successful?

s = 16, n = 17, population mean µ = 145, sample mean x̄ = 155

$t = \frac{155 - 145}{16 / \sqrt{16}} = 2.5$, |t| = 2.5, H0 rejected

4.Certain pesticide is packed into bags by a machine. A random sample of 10 bags drawn and their
contents are found to weigh in (in kg.) as follows.

50,49,52,44,45,48,46,45,49,45. Test if the average packing can be taken to be 50 kg.

X x-X(47.3) (x-X)2

50
49
52
44
45
48
46
45
49
45
473 64.1

Mean = (50+49+……….+45)/10 = 47.3

$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{64.1}{10} = 6.41$

s = 2.53

Null Hypothesis H0 : The average packing is 50 kg, µ = 50 kg

Alternative Hypothesis H1 : µ ≠ 50

Test of significance of single mean: $t = \frac{\bar{x} - \mu}{s / \sqrt{n-1}} = \frac{47.3 - 50}{2.53 / \sqrt{9}} = -3.20$

Tabulated value at 5% level versus calculated value:

2.26 < |t| = 3.20

H0 Rejected, Result : The average packing is not 50 kg
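
When the raw observations are given, the mean, the SD (divisor n) and t can be computed in one pass. The sketch below is an illustration only; `t_from_raw_data` is a hypothetical helper name, and it follows the handout's formulas rather than any library routine.

```python
import math

def t_from_raw_data(values, pop_mean):
    """Return (mean, s, t) with s^2 = sum((x - mean)^2) / n and t = (mean - mu) / (s / sqrt(n - 1))."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    t = (mean - pop_mean) / (s / math.sqrt(n - 1))
    return mean, s, t

# Pesticide bag weights (kg) from Problem 4, tested against mu = 50
weights = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]
mean_w, s_w, t_w = t_from_raw_data(weights, 50)
print(f"mean = {mean_w:.1f}, s = {s_w:.2f}, t = {t_w:.2f}")
```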

5. The heights of 10 males of a given locality are found to be 70, 67, 62, 68, 61, 68, 70, 64, 64, 66
inches. Is it reasonable to believe that the average height is greater than 64 inches? Test at 5 %
significance level.

X x-X(66) (x-X)2

70
67
62
68
61
68
70
64
64
66
660 90

Mean = (70+67+……….+66)/10 = 66

$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{90}{10} = 9$

s = 3

Null Hypothesis H0 : µ = 64 inches

Alternative Hypothesis H1 : µ > 64

Test of significance of single mean: $t = \frac{\bar{x} - \mu}{s / \sqrt{n-1}} = \frac{66 - 64}{3 / \sqrt{9}} = 2$

Tabulated value at 5% level versus calculated value:
1.833 < 2
H0 Rejected, Result : The average height is greater than 64 inches
STUDENT'S t TEST FOR DIFFERENCE OF MEANS (The samples are independent)
1. The average numbers of articles produced by two machines per day are 200 and 250 with standard
deviations 20 and 25 respectively, on the basis of records of 25 days' production. Can you regard both
the machines as equally efficient at 1 % level of significance?

n1 = 25          n2 = 25
x̄1 = 200        x̄2 = 250
s1 = 20          s2 = 25

$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ (when s1 and s2 are given)

$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2}$ (when the raw data are given)

Here $S^2 = \frac{25(20)^2 + 25(25)^2}{48} = 533.85$

S = 23.10

Null Hypothesis H0 : Both the machines are equally efficient, µ1 = µ2

H1 : µ1 ≠ µ2

Test of significance of difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{200 - 250}{\sqrt{533.85 \left( \frac{1}{25} + \frac{1}{25} \right)}} = -7.65$

Tabulated value at 1% level versus calculated value:

For df n1+n2-2 = 48 the table value is 2.58 < |t| = 7.65

H0 Rejected, Result : The two machines are not equally efficient at 1% level of significance
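
The pooled-variance calculation for two independent samples can be sketched as follows. This is an illustration under the handout's definition of S² (sample SDs computed with divisor n); `pooled_t` is a hypothetical name, not a library function.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (S^2, t) with S^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)."""
    pooled_var = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t

# Two machines: means 200 and 250, SDs 20 and 25, 25 days each
pooled_var_m, t_m = pooled_t(200, 20, 25, 250, 25, 25)
print(f"S^2 = {pooled_var_m:.2f}, t = {t_m:.2f}")
```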



2. Below are given the gains in weight (in lbs) of pigs fed on two diets A and B

Diet A 25 32 30 34 24 14 32 24 30 31 35 25

Diet B 44 34 22 10 47 31 40 30 32 35 18 21 35 29 22

Test if the two diets differ significantly as regards their effect on increase in weight

Null hypothesis H0 : µ1 = µ2. There is no significant difference
between the mean increase in weight due to diets A and B

Alternative Hypothesis H1 : µ1 ≠ µ2

X x-A(28) (X-x)2 Y Y-A(30) (Y-y )2

25 44

32 34

30 22

34 10

24 47

14 31

32 40

24 30

30 32

31 35

35 18

25 21

35

29

22

336 380 450 1410

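$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \frac{380 + 1410}{25} = 71.6$

(OR)

$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, where

$s_1^2 = \frac{\sum d_1^2}{n_1} - \left( \frac{\sum d_1}{n_1} \right)^2$, $s_2^2 = \frac{\sum d_2^2}{n_2} - \left( \frac{\sum d_2}{n_2} \right)^2$

Test of significance of difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{28 - 30}{\sqrt{71.6 \left( \frac{1}{12} + \frac{1}{15} \right)}} = -0.61$

Tabulated value at 5% level versus calculated value:

For df n1+n2-2 = 25 the table value is 2.06 > |t| = 0.61
H0 Accepted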
Result : There is no significant difference between the mean increase in
weight due to Diet A and Diet B
3.The horses A and B were tested according to the time (in seconds ) to run a particular track with the
following results.

Horse A 28 30 32 33 33 29 34

Horse B 29 30 30 24 27 27 -

Test whether you can discriminate between two horses. You can use the fact that 5 % value

Null hypothesis H0 : µ1 = µ2. There is no significant difference between the two horses


Alternative Hypothesis H1 : µ1 ≠ µ2
X x-X(31.3) (X-x)2 Y Y-y(27.8) (Y-y )2

219 31.41 167 26.84

$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \frac{31.41 + 26.84}{11} = 5.29$

(OR)

$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = 5.29$, where

$s_1^2 = \frac{\sum d_1^2}{n_1} - \left( \frac{\sum d_1}{n_1} \right)^2$, $s_2^2 = \frac{\sum d_2^2}{n_2} - \left( \frac{\sum d_2}{n_2} \right)^2$

Test of significance of difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = 2.70$

Tabulated value at 5% level versus calculated value:

For df n1+n2-2 = 11 the table value is 2.2 < 2.70
H0 Rejected. Result : There is a significant difference between the two horses and they can be
discriminated
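PAIRED t-test (i.e. the samples are dependent)
Note : Used when the pairs of values of X1 and X2 are associated in some way and n1 = n2 = n
Difference d = X1 – X2

$t = \frac{\bar{d}}{S / \sqrt{n-1}}$, where $\bar{d}$ is the mean value of the differences and S is their standard deviation, $S = \sqrt{\frac{\sum d^2}{n} - \left( \frac{\sum d}{n} \right)^2}$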

1. The following data relate to the marks obtained by 11 students in 2 tests, one held at the
beginning of the year and the other at the end of the year, after intensive coaching. Do
the data indicate that the students have benefited from the coaching?
Test1 : 19 23 16 24 17 18 20 18 21 19 20

Test2 : 17 24 20 24 20 22 20 20 18 22 19

The given data are associated with each other

Test1(x) Total
215

Test2(y) 226

d=x-y 2 -1 -4 0 -3 -4 0 -2 3 -3 1 -11

d2 69

$S = \sqrt{\frac{\sum d^2}{n} - \left( \frac{\sum d}{n} \right)^2} = \sqrt{\frac{69}{11} - \left( \frac{-11}{11} \right)^2} = 2.296$

H0 : µ1 = µ2

H1 : µ1 < µ2

d̄ = ∑d/n = -11/11 = -1

$t = \frac{\bar{d}}{S / \sqrt{n-1}} = \frac{-1}{2.296 / \sqrt{11-1}} = -1.38$

Calculated |t| = 1.38 < Table value (1.81)

H0 Accepted, Result : µ1 = µ2, coaching is not effective
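
The paired calculation can be reproduced directly from the two rows of marks. The following is a minimal sketch of the handout's formulas (SD of the differences with divisor n, then √(n-1) in the test statistic); `paired_t` is an illustrative name.

```python
import math

def paired_t(first, second):
    """Return (d_bar, S, t) for paired samples, with d = first - second."""
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    sd = math.sqrt(sum(v * v for v in d) / n - d_bar ** 2)
    t = d_bar / (sd / math.sqrt(n - 1))
    return d_bar, sd, t

test1 = [19, 23, 16, 24, 17, 18, 20, 18, 21, 19, 20]
test2 = [17, 24, 20, 24, 20, 22, 20, 20, 18, 22, 19]
d_bar, sd_d, t_paired = paired_t(test1, test2)
print(f"d_bar = {d_bar:.0f}, S = {sd_d:.3f}, t = {t_paired:.2f}")
```

A negative t here reflects the choice d = Test1 - Test2; the comparison with the table value uses |t|.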

2. To verify whether a course in accounting improved performance, a similar test was given to 12
participants both before and after the course. The marks are
Before : 44 40 61 52 32 44 70 41 67 72 53 72

After : 53 38 69 57 46 39 73 48 73 74 60 78

Was the course useful?

Null hypothesis H0 : µ1 = µ2 . There is no significant difference in the performance

Test1(x) 72 Total

Test2(y) 78

d=x-y -6

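d2 36 578

$S = \sqrt{\frac{\sum d^2}{n} - \left( \frac{\sum d}{n} \right)^2} = \sqrt{\frac{578}{12} - \left( \frac{-60}{12} \right)^2} = 4.81$, $t = \frac{\bar{d}}{S / \sqrt{n-1}} = \frac{-5}{4.81 / \sqrt{11}} = -3.44$, |t| = 3.44 (Table value 1.8)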

3.A company is testing two machines. A random sample of 8 employees is selected and each
employee uses each machine for one hour. The number of components produced is shown
in the following table.



Employee 1 2 3 4 5 6 7 8 Total

Machine1 99 107 84 99 102 87 93 101

Machine2 99 112 90 97 108 97 94 98

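d=x-y 0 -5 -6 2 -6 -10 -1 3 -23

d2 0 25 36 4 36 100 1 9 211

$S = \sqrt{\frac{\sum d^2}{n} - \left( \frac{\sum d}{n} \right)^2} = \sqrt{\frac{211}{8} - \left( \frac{-23}{8} \right)^2} = 4.255$

H0 : µ1 = µ2
H1 : µ1 < µ2
d̄ = ∑d/n = -23/8 = -2.875

$t = \frac{\bar{d}}{S / \sqrt{n-1}} = \frac{-2.875}{4.255 / \sqrt{8-1}} = -1.787$
Calculated |t| (1.787) < Table value at 5% level (1.90)

H0 Accepted, Result : µ1 = µ2, there is no evidence of a difference between the machines in the mean
number of components produced.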

4. A company arranged an intensive training course for its team of salesmen. A random sample
of 10 salesmen was selected and the values (in '000) of their sales made in the weeks
immediately before and after the course are shown in the following data.

Salesman 1 2 3 4 5 6 7 8 9 10 Total

Sales 12 23 5 18 10 21 19 15 8 14
before

Salesafter 18 22 15 21 13 22 17 19 12 16

d=x-y -6 1 -10 -3 -3 -1 2 -4 -4 -2 -30

d2 36 1 100 9 9 1 4 16 16 4 196

d̄ = -3, S = 3.26, t = -2.76 (|t| = 2.764), H0 rejected: there is evidence of an increase in sales after training.



F-distribution [the F ratio is arranged so that its value is at least 1]

(Used when, in general, n1 ≠ n2, or when the sum of squares of deviations is given)

The F-test is used to test whether two independent samples have been drawn from normal populations
with the same variance (whether the difference in variances is significant or not)

Or

whether two independent estimates of the population variance are homogeneous (equality of
variances) or not.

$F = S_1^2 / S_2^2$ [greater variance / smaller variance], where $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1}$ and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1}$ are the estimates of the population variance and $s_1^2$, $s_2^2$ are the sample variances:

$s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1}$, $s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2}$

1. A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave
an estimate of 2.5. Could both samples be from populations with the same variance?

H0 : The two samples have come from populations with the same variance

n1 = 13, n2 = 15, s1² = 3.0, s2² = 2.5 (the variances of the 2 samples)

$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{(13)(3)}{12} = 3.25$, $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{(15)(2.5)}{14} = 2.68$

$F = S_1^2 / S_2^2 = 3.25 / 2.68 = 1.21$

Table value at df (n1-1, n2-1) = (12, 14) is 2.53 > calculated value 1.21

H0 Accepted.

Result : Both the samples come from the populations with the same variance



2.Two random samples gave the following results

Sample   Size   Mean   Sum of squares of deviations from the mean

1        10     15     90

2        12     14     108

Test whether the samples come from the same normal population

H0 : The two samples have been drawn from the same normal population

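H0 : µ1 = µ2 and σ1² = σ2²

We have to use two tests: i) equality of variances ii) equality of means

n1 = 10, n2 = 12

Mean of sample 1 = 15, mean of sample 2 = 14

$\sum (x - \bar{x})^2 = 90$, $\sum (y - \bar{y})^2 = 108$

$s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1} = \frac{90}{10} = 9$

$s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2} = \frac{108}{12} = 9$

$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{90}{9} = 10$, $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{12(9)}{11} = 9.818$

$F = S_1^2 / S_2^2 = 10 / 9.818 = 1.018$

Table value at df (n1-1, n2-1) = (9, 11) is 3.07 > calculated value 1.018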

H0 Accepted.
Result : Both samples come from the same populations

ii) t-test
H0 : µ1 = µ2
where (big) $S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{90 + 108}{20} = 9.9$
Big S = 3.15
Test of significance of difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{15 - 14}{\sqrt{9.9 \left( \frac{1}{10} + \frac{1}{12} \right)}} = 0.74$

Table value for df n1+n2-2 = 20 at 5% level = 2.086 > 0.74

H0 Accepted

Result : The given samples are drawn from the same normal population

3.The nicotine contents in two random samples of tobacco are given below

Sample1 21 24 25 26 27
Sample2 22 27 28 30 31 36
Can you say that the two samples come from the same population? [ xbar = 24.6, y bar =29]

                                  Total
Sample 1   21 24 25 26 27         123

$\sum (x - \bar{x})^2 = 21.2$

Sample 2   22 27 28 30 31 36      174

$\sum (y - \bar{y})^2 = 108$

Mean x̄ = 123/5 = 24.6, Mean ȳ = 174/6 = 29

$s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1} = \frac{21.2}{5} = 4.24$, $s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2} = \frac{108}{6} = 18$

$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = 5.3$, $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = 21.6$, $F = S_2^2 / S_1^2 = 21.6 / 5.3 = 4.07$
F0.05 (5,4) = 6.26, H0 is accepted. Result : the variances are equal

t-test: $S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = -1.92$, |t| = 1.92 < 2.26 (table value)
H0 Accepted. The samples could have been drawn from the same normal population
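
The variance-ratio step of the nicotine problem can be sketched as below. The helper names are hypothetical; `unbiased_variance` divides the sum of squared deviations by n-1, which equals the handout's n·s²/(n-1).

```python
def unbiased_variance(values):
    """Estimate of the population variance: sum of squared deviations / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def f_ratio(sample_a, sample_b):
    """F = greater variance estimate / smaller variance estimate."""
    var_a, var_b = unbiased_variance(sample_a), unbiased_variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)

sample1 = [21, 24, 25, 26, 27]
sample2 = [22, 27, 28, 30, 31, 36]
f_nicotine = f_ratio(sample1, sample2)
print(f"S1^2 = {unbiased_variance(sample1):.1f}, S2^2 = {unbiased_variance(sample2):.1f}, F = {f_nicotine:.2f}")
```

Because the larger estimate goes in the numerator, the degrees of freedom are quoted in the same order (here 5 for sample 2 and 4 for sample 1).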

4. Two independent samples of eight and seven items respectively had the following values of the
variable
Sample1 9 11 13 11 15 9 12 14
Sample 2 10 12 10 14 9 8 10

Do the two estimates of populations variance differ significantly at 5% level of significance?

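H0 : σ1² = σ2²

H1 : σ1² ≠ σ2²

We have to use the F-test (equality of variances)

n1 = 8, n2 = 7

Mean of sample 1 = 94/8, mean of sample 2 = 73/7

∑x² = 1138, so ∑(x - x̄)² = 1138 - 94²/8 = 33.5; ∑y² = 785, so ∑(y - ȳ)² = 785 - 73²/7 = 23.71

$S_1^2 = \frac{33.5}{7} = 4.79$

$S_2^2 = \frac{23.71}{6} = 3.95$

$F = S_1^2 / S_2^2 = 4.79 / 3.95 = 1.21$

Table value at df (n1-1, n2-1) = (7, 6) is 4.21 > calculated value 1.21
H0 Accepted.
Result : σ1² and σ2² do not differ significantly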



Chi Square test
A powerful test for testing the significance of the discrepancy between theoretical and experimental
values was given by Karl Pearson. It enables us to find whether the deviation of the experiment from theory is just by
chance or is really due to the inadequacy of the theory to fit the observed data.

Applications
To test whether the hypothetical value of the population variance is σ²
To test the homogeneity of independent estimates of the population correlation coefficient
i) Goodness of fit: $\chi^2 = \sum \frac{(O - E)^2}{E}$, df = n-1

ii) Independence of attributes: df = (r-1)(c-1)



Chi Square test

1.The company keeps records of accidents during a recent safety review. A random
sample of 60 accidents were selected and classified by day of the week when they
occurred. Test whether the accidents are uniformly distributed over the week

Day Mon Tue Wed Thurs fri

No.of accidents 8 12 9 14 17

H0 : Accidents occur uniformly distributed over the week

Expected frequency for each day E = (8+12+9+14+17)/5 = 60/5 = 12

O 8 12 9 14 17 Total

E 60

(O-E)2 54

|O-E|2/E 1.33 0 0.75 4.5

$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.5$

Table value n-1 =4 at 5% level is 9.49


calculated value
H0
Result

2. The following table shows the frequencies of the digits observed in a telephone directory

Number 0 1 2 3 4 5 6 7 8 9

Frequency 115 118 120 140 135 137 139 142 144 150

Test whether all digits are equally distributed

Expected frequency for each digit E = 1340/10 = 134

O 115 118 120 140 135 137 139 142 144 150 Total

(O-E)2

|O-E|2/E 9.73



$\chi^2 = \sum \frac{(O - E)^2}{E} = 9.73$

Table value for df n-1 = 9 at 5% level is 16.92 > calculated value 9.73

H0 Accepted

Result : All the digits are uniformly distributed
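
For a uniform hypothesis every class has the same expected frequency (total / number of classes), so the goodness-of-fit statistic reduces to a short loop. A minimal sketch, with `chi_square_uniform` as an illustrative name:

```python
def chi_square_uniform(observed):
    """Chi-square statistic when every class has the same expected frequency."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

digit_freq = [115, 118, 120, 140, 135, 137, 139, 142, 144, 150]
accident_freq = [8, 12, 9, 14, 17]

chi_digits = chi_square_uniform(digit_freq)
chi_accidents = chi_square_uniform(accident_freq)
print(f"digits: {chi_digits:.2f}, accidents: {chi_accidents:.2f}")
```

Each value is then compared with the table value for n-1 degrees of freedom (16.92 for 9 df and 9.49 for 4 df at the 5% level, as quoted in the problems).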

3. A sample analysis of exam results of 500 students was made. It was found that 200 students had
failed, 170 students secured a 3rd class, 90 secured a 2nd class and the rest a first class. Do these
figures support the general belief that the above categories are in the ratio 4:3:2:1 respectively?

Exp freq ; 200,150,100,50

O Total

E 500

(O-E)2 600

|O-E|2/E 5.66

$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.66$

Table value for df n-1 = 3 at 5% level is 7.82 > calculated value 5.66

H0 Accepted

Result : The results support the ratio

4. The following table gives the number of aircraft accidents that occurred during the various days of the
week. Test whether the accidents are uniformly distributed over the week. [Ans: chi-square value = 2.143, tv = 11.07]

Days Mon Tue Wed Thu Fri Sat

No.ofAccidents 14 18 12 11 15 14



5.The following figures show the distribution of digits in numbers chosen at random from a telephone
directory.

Digits 0 1 2 3 4 5 6 7 8 9

Frequency 1026 1107 997 966 1075 933 1107 972 964 853

[Ans: chi-square value = 58.542, tv = 16.92]

6.Fit a binomial distribution for the following data and also test the goodness of fit

X 0 1 2 3 4 5 6 Total

F 5 18 28 12 7 6 4 80

ANSWER

Fx 0 18 56 36 28 30 24 192

Mean(np) =∑fx/∑f 192/80= 2.4

The expected frequencies are the terms of 80(0.6 + 0.4)^6, with n = 6, p = 0.4, q = 0.6, np = 2.4

$P(X = x) = \binom{n}{x} p^x q^{n-x}$, where x = 0, 1, 2, …, 6

X=0 : 0.0467 × 80 = 3.73
X=1 : 0.1866 × 80 = 14.93
X=2 : 0.3110 × 80 = 24.88
X=3 : 0.2765 × 80 = 22.12
X=4 : 0.1382 × 80 = 11.06
X=5 : 0.0369 × 80 = 2.95
X=6 : 0.0041 × 80 = 0.33
E 3.73 14.93 24.88 22.12 11.06 2.95 0.33 Total

O 5 18 28 12 7 6 4

(O-E)2

|O-E|2/E 51.538

Goodness of fit: $\chi^2 = \sum \frac{(O - E)^2}{E} = 51.54$

Table value for df n-1 = 6 at 5% level versus calculated value:

12.59 < 51.54
H0 Rejected
Result
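
The binomial fit can be sketched in code as follows. This is an illustration only: the names are hypothetical, p is estimated from the sample mean as in the working above, and the chi-square is computed from unrounded expected frequencies, so it may differ slightly from a hand calculation that rounds E to two decimals.

```python
from math import comb

def binomial_expected(total, n, p):
    """Expected frequencies total * C(n, x) * p^x * q^(n - x) for x = 0..n."""
    q = 1 - p
    return [total * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

observed = [5, 18, 28, 12, 7, 6, 4]           # frequencies for x = 0..6
total = sum(observed)                          # 80
mean = sum(x * f for x, f in enumerate(observed)) / total
p_hat = mean / 6                               # mean = n * p with n = 6

expected = binomial_expected(total, 6, p_hat)
chi_binomial = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 2) for e in expected], round(chi_binomial, 2))
```

Most of the statistic comes from the last class, where the expected frequency is well below 1; pooling small classes is a common refinement that this sketch does not attempt.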
7.Fit a poisson distribution for the following data and also test the goodness of fit.

X 0 1 2 3 4 5 Total
F 142 156 69 27 5 1 400

ANSWER

fx 0 156 138 81 20 5 400

Mean (λ) = ∑fx/∑f = 400/400 = 1

The expected frequencies are computed with N = 400 and mean λ = 1

$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$, expected frequency = N P(X = x), where x = 0, 1, 2, 3, …

X=0 : 147.15
X=1 : 147.15
X=2 : 73.58
X=3 : 24.53
X=4 : 6.13
X=5 : 1.23
E 147 147 74 25 6 1 Total

O 142 156 69 27 5 1

(O-E)2

|O-E|2/E 1.39

Goodness of fit: $\chi^2 = \sum \frac{(O - E)^2}{E} = 1.39$

Table value for df n-1 = 5 at 5% level (11.07) versus calculated value (1.39)

H0
Result



Independent Attributes
Chi Square test

To test the significance of the discrepancy (difference) between experimental (practical) values and
theoretical values.

Degrees of freedom : (r-1)(c-1)

GOODNESS OF FIT

Test of significance : $\chi^2 = \sum \frac{(O - E)^2}{E}$

1. The table given below shows the data obtained during an epidemic of cholera. Test the
effectiveness of inoculation in preventing the attack of cholera.

                  Attacked   Not Attacked

Inoculated        31         469

Not inoculated    185        1315

Exp freq ; 54,446,162,1338

H0:

E 54 446 162 1338

(O-E)2

|O-E|2/E 9.796 1.186 3.263 0.395

$\chi^2 = \sum \frac{(O - E)^2}{E} = 14.64$

Table value for df (r-1)(c-1) = 1 at 5% level versus calculated value:

3.84 < 14.64

H0 Rejected

Result : Inoculation is effective in preventing the attack of cholera
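
For a contingency table the expected frequency of each cell is (row total × column total) / grand total. The sketch below applies this to the cholera data; `chi_square_independence` is an illustrative name and the code assumes a rectangular table of observed counts.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi, df

# Rows: inoculated, not inoculated; columns: attacked, not attacked
cholera = [[31, 469], [185, 1315]]
chi_cholera, df_cholera = chi_square_independence(cholera)
print(f"chi-square = {chi_cholera:.2f}, df = {df_cholera}")
```

The same function handles the 3 × 3 hair colour and eye colour table, where df = (3-1)(3-1) = 4.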



2. In a certain sample of 2000 families, 1400 families are consumers of tea. Out of 1800 Hindu
families, 1236 families consume tea. Use the chi-square test and state whether there is any
significant difference in the consumption of tea between Hindu and non-Hindu families.
Exp freq ; 1260,140,540,60
H0:
O 1236 164 564 36

(O-E)2

|O-E|2/E 0.457 4.114 1.066 9.6

$\chi^2 = \sum \frac{(O - E)^2}{E} = 15.237$

Table value for df (r-1)(c-1) = 1 at 5% level versus calculated value:

3.84 < 15.24
H0 is rejected. Result : There is a significant difference in the consumption of tea between Hindu and non-Hindu families
3. Examine whether the nature of the area is related to voting preference in the election for which the data are
tabulated below
Group A B Total

Rural 620 480 1100

Urban 380 520 900

Total 1000 1000 2000

$\chi^2 = \sum \frac{(O - E)^2}{E} = 39.59$

Table value for df (r-1)(c-1) = 1 at 5% level versus calculated value:

3.84 < 39.59

4.Given the following contingency table for hair colour and eye colour. Find the value of chi square. Is
there good association between the two

                        Hair colour

                 Fair   Brown   Black   Total

         BLUE    15     5       20      40

EYE      GREY    20     10      20      50
COLOUR
         BROWN   25     15      20      60

         Total   60     30      60      150



Expected table

16 8 16

20 10 20

24 12 24

[Ans: chisquare value = 3.647, tv =9.49]

SAMPLING ASSIGNMENT AT THE END OF THE UNIT-3


1.A random sample of 500 tags was taken from a large consignment and 65 were found to be defective.
Find the fiducial limits of defective tags in the consignment.

2.In a sample of 1000 people in Maharashtra 540 are rice eaters and the rest are wheat eaters can we
assume that both rice and wheat are equally popular in this state at 1 % level of significance.

3. In a city a sample of 1000 people was taken, and out of them 540 are vegetarians and the rest
non-vegetarians. Can we say that both habits of eating (veg and non-veg) are equally popular in the
city at i) 1% LOS ii) 5% LOS?
4. Twenty people were attacked by a disease and only 18 survived. Will you reject the hypothesis that
the survival rate, if attacked by this disease, is 85% in favour of the hypothesis that it is more, at 5% level?

5. A machine is producing bolts of which a certain fraction is defective. A random sample of 400 is
taken from a large batch and is found to contain 30 defective bolts. Does this indicate that the
proportion of defectives is larger than that claimed by the manufacturer at 5% LOS?

6. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it
puts out 3 imperfect articles in a batch of 100. Has the machine improved?

7.A cigarette manufacturing firm claims that its Brand A line of cigarettes outsells its Brand B by 8%.
If its found that 42 out of a sample of 200 smokers prefer Brand A and 18 out of another sample of
100 smokers prefer Brand B. Test whether 8% difference is a valid claim.

8.In a random sample of 400 students of the University teaching departments, it was found that 300
students failed in the examination. In another random sample of 500 students of the affiliated
colleges, the number of failures in the same examination was found to be 300. Find out the
proportion of failures in the university teaching departments and affiliated colleges taken together.

9. A survey is proposed to be conducted to know the annual earnings of the old statistics graduates of
Delhi University. How large should the sample be in order to estimate the mean annual
earnings within Rs. 1000 at 95% confidence level? The SD of the annual earnings of the
population is known to be Rs. 3000.



UNIT-4
ANALYSIS OF VARIANCE
&
NON PARAMETRIC TEST

ANALYSIS OF VARIANCE (ANOVA)


ANOVA is a statistical technique specially designed to test whether the means of more than 2
populations are equal. It consists of classifying the statistical data and testing whether the
means of a specified classification differ significantly. ANOVA is classified in two ways:

i) one-way classification ii) two-way classification

One-way classification : In one-way classification the data are classified
according to only one criterion (based on only one factor).
CF = GT^2 / N
Q = ΣΣ Xij^2 – CF
Q1 = Σ [CT^2 / R] – CF
Q2 = Q – Q1

Source of Variance          Sum of Squares   Degrees of Freedom   Mean Squares       F Ratio

Between Samples (Columns)   Q1 (SSC)         C-1                  MSC = Q1/(C-1)     F = MSC/MSE (Greater/smaller)

Within Samples (Rows)       Q2 (SSE)         N-C                  MSE = Q2/(N-C)

c-no.of columns
n-given no.of observations
MSC – Mean squares columns
MSE - Mean squares error
CF – correction factor
GT – Grand total
CT - column total
SSC-Sum of squares columns
SSE-Sum of squares Error



1) Set up the ANOVA for the following per-hectare yields for three varieties of wheat, each grown in five plots.
Test whether there is a significant difference among the average yields of the 3 varieties of wheat. Test the
hypothesis that the population means are equal at 5% level of significance

PLOT

Variety of 1 2 3 4 5 Total
Wheat
A1 6 8 5 12 9 40

A2 5 3 8 7 7 30

A3 10 7 11 10 12 50

H0 : There is no significance difference between the samples

X1 X12 X2 X22 X3 X32

6 36 5 25 10 100

8 64 3 9 7 49

5 25 8 64 11 121

12 144 7 49 10 100

9 81 7 49 12 144

40 350 30 196 50 514

CF = GT^2 / N = (120)^2 / 15 = 960

Q = ΣΣ Xij^2 – CF = (350+196+514) - 960 = 100

Q1 = Σ [CT^2 / R] – CF = [40^2/5 + 30^2/5 + 50^2/5] - 960 = (5000/5) - 960 = 40

Q2 = Q - Q1 = 100 - 40 = 60

Source of Variance          Sum of Squares   Degrees of Freedom    Mean Squares                   F Ratio

Between Samples (Columns)   Q1 = 40 (SSC)    c-1 = 3-1 = 2         MSC = Q1/(c-1) = 40/2 = 20     F = MSC/MSE (Greater/smaller) = 20/5 = 4

Within Samples (Rows)       Q2 = 60 (SSE)    n-c = 15-3 = 12       MSE = Q2/(n-c) = 60/12 = 5

Calculated value = 4 > 3.88, where the table F value for (2,12) df at 5% level is 3.88

H0 rejected.

Result : There is a significant difference between the samples (columns)
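
The one-way ANOVA steps (CF, Q, Q1, Q2, then the mean squares) translate directly into code. The sketch below is an illustration with a hypothetical function name; it returns MSC/MSE, whereas the handout's convention is to put the greater mean square in the numerator, so the ratio should be inverted by hand when MSC < MSE.

```python
def one_way_anova(groups):
    """Return (Q1, Q2, F) using CF = GT^2/N, Q = sum(x^2) - CF, Q1 = sum(CT^2/R) - CF, Q2 = Q - Q1."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    cf = sum(all_values) ** 2 / n
    q = sum(v * v for v in all_values) - cf
    q1 = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    q2 = q - q1
    msc = q1 / (len(groups) - 1)
    mse = q2 / (n - len(groups))
    return q1, q2, msc / mse

wheat = [[6, 8, 5, 12, 9], [5, 3, 8, 7, 7], [10, 7, 11, 10, 12]]
q1_wheat, q2_wheat, f_wheat = one_way_anova(wheat)
print(f"Q1 = {q1_wheat:.0f}, Q2 = {q2_wheat:.0f}, F = {f_wheat:.2f}")
```

Groups of unequal size (as in the three-process problem that follows) work unchanged, because each column total is divided by its own number of observations.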


2)Three processes A,B, and C are tested to see whether their outputs are equivalent. The following
observation of output are made.

A 10 12 13 11 10 14 15 13
B 9 11 10 12 13
C 11 10 15 14 12 13
H0 : There is no significance difference between samples
H1:
CF = GT^2 / N = (228)^2 / 19 =
Q = ΣΣ Xij^2 – CF = (1224+615+955) - 2736 =
Q1 = Σ [CT^2 / R] – CF = [98^2/8 + 55^2/5 + 75^2/6] - 2736 =
Q2 = Q - Q1 = 51

X1 X12 X2 X22 X3 X32


10 9 11
12 11 10
13 10 15
11 12 14
10 13 12
14 13
15
13
98 1224 55 615 75 955

Source of Variance          Sum of Squares   Degrees of freedom   Mean Squares                     F Ratio

Between Samples (Columns)   Q1 = 7 (SSC)     c-1 = 2              MSC = Q1/(c-1) = 7/2 = 3.5       F = MSC/MSE (Greater/smaller) = 3.5/3.187 = 1.098

Within Samples (Rows)       Q2 = 51 (SSE)    n-c = 16             MSE = Q2/(n-c) = 51/16 = 3.187

Table value (2,16) at 5% versus calculated value:

3.63 > 1.098

H0 Accepted, RESULT



3) Four salesmen who served four different areas sold the units as follows. Is there any
significant difference in their performance?

A 25 19 21 15
B 18 35 28 23
C 21 30 32 25
D 29 28 23 20
H0: There is no significance difference between 4 salesman in their performance of sales

H1:
X1 X12 X2 X22 X3 X32 X4 X42
25
19
21
15
80 1652 104 2862 108 2990 100 2554
CF = GT^2 / N = 392^2 / 16 = 9604
Q = ΣΣ Xij^2 – CF = (1652+2862+2990+2554) - 9604 = 454
Q1 = Σ [CT^2 / R] – CF = 116
Q2 = Q - Q1 = 338
MSC = 116/3 = 38.67
MSE = 338/12 = 28.17

Source of Variance          Sum of Squares   Degrees of freedom   Mean Squares        F Ratio

Between Samples (Columns)   Q1 = 116 (SSC)   C-1 = 3              MSC = Q1/(c-1)      F = MSC/MSE (Greater/smaller) = 38.67/28.17 = 1.372

Within Samples (Rows)       Q2 = 338 (SSE)   N-C = 12             MSE = Q2/(n-c)

Table value (3,12) is 3.49 at 5% level of significance

Table value versus calculated value:

3.49 > 1.37
H0 Accepted

RESULT



4)The following table shows the lives in hours of four brands of electric lamps.

Brands A 1610 1610 1650 1680 1700 1720 1800

Brands B 1580 1640 1640 1700 1750

Brands C 1460 1550 1600 1620 1640 1660 1740 1820

Brands D 1510 1520 1530 1570 1600 1680

Perform an analysis of variance test homogeneity of the mean lives of four brands of lamps

Consider A (Assume any value b/w 1460 to 1820) - x value

X1 X2 X3 X4
1460
1550
1600
1620
1640
1660
1740
1820
11770 19817500 8310 13828100 13090 21503700 9410 14778700
CF = GT^2 / N = 69732938.46
Q = ΣΣ Xij^2 – CF =
Q1 = Σ [CT^2 / R] – CF =
Q2 = Q - Q1 =

Source of Variance          Sum of Squares   Degrees of Freedom    Mean Squares         F Ratio

Between Samples (Columns)   Q1 =  (SSC)      C-1 = 4-1 = 3         MSC = Q1/(C-1) =     F = MSC/MSE (Greater/smaller) = 2.21

Within Samples (Rows)       Q2 =  (SSE)      N-C = 26-4 = 22       MSE = Q2/(N-C) =

F = 2.21

Table Value (3,22) is 3.05 > Calculated value 2.21

H0 Accepted

Result



Two way Classification
CF = GT^2 / N

Q = ΣΣ Xij^2 – CF

Q1 = Σ [CT^2 / R] – CF

Q2 = Σ [RT^2 / C] – CF

Q3 = Q - Q1 - Q2

Source of Variance          Sum of Squares   Degrees of freedom   Mean Squares                F Ratio

Between Samples (columns)   Q1 (SSC)         c-1                  MSC = Q1/(c-1)              F = MSC/MSE (Greater/smaller)

Between Samples (rows)      Q2 (SSR)         r-1                  MSR = Q2/(r-1)              F = MSR/MSE (Greater/smaller)

Error                       Q3 (SSE)         (c-1)(r-1)           MSE = Q3/((c-1)(r-1))
1) A tea company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer,
winter and monsoon. The sales (in 1000s of units) are given below

Seasons A B C D
Summer 38 40 41 39
Winter 45 42 49 36
Monsoon 40 38 42 42
H0 (columns) : There is no significant difference between the 4 salesmen

H0 (rows) : There is no significant difference between the seasons with respect to sales

X1 X2 X3 X4 RT RT2/4 X12 X22 X32 X42

38 40 41 39 158 24964/4 1444 1600 1681 1521


=6241
45 42 49 36 172 7396 2025 1764 2401 1296

40 38 42 42 162 6561 1600 1444 1764 1764

CT=123 120 132 117 5069 4808 5846 4581

15129/3 14400/3 17424/3 13689/3

5043 4800 5808 4563



CF = GT^2 / N = (492)^2 / 12 = 20172
Q = ΣΣ Xij^2 – CF = (5069+4808+5846+4581) - 20172 = 132
Q1 = Σ [CT^2 / R] – CF = (5043+4800+5808+4563) - 20172 = 20214 - 20172 = 42
Q2 = Σ [RT^2 / C] – CF = (6241+7396+6561) - 20172 = 26
Q3 = Q - Q1 - Q2 = 64
MSC = 42/3 = 14
MSR = 26/2 = 13
MSE (Residual) = 64/6 = 10.67

Source of Variance          Sum of Squares   Degrees of freedom     Mean Squares                F Ratio

Between Samples (columns)   Q1 (SSC)         c-1 = 3                MSC = Q1/(c-1)              F = MSC/MSE (Greater/smaller) = 14/10.67 = 1.313

Between Samples (rows)      Q2 (SSR)         r-1 = 2                MSR = Q2/(r-1)              F = MSR/MSE (Greater/smaller) = 13/10.67 = 1.219

Error                       Q3 (SSE)         (c-1)(r-1) = 6         MSE = Q3/((c-1)(r-1))

(Calculated F1 = 1.313, F2 = 1.219)

F table value at (3,6) is 4.76 > 1.313, H0 (columns) accepted

F table value at (2,6) is 5.14 > 1.219, H0 (rows) accepted

Result1 : The sales performance of all the salesmen is the same (no significant difference between salesmen)

Result2 : All the seasons are the same in sales (no significant difference between seasons)
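
The two-way classification adds a row sum of squares and leaves the remainder as error. A minimal sketch of the formulas above follows; the function name and the dictionary keys are illustrative, and the F ratios are returned as MSC/MSE and MSR/MSE without applying the greater/smaller convention.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) for a table given as a list of rows."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n
    q = sum(v * v for row in table for v in row) - cf
    q1 = sum(sum(col) ** 2 for col in zip(*table)) / r - cf   # between columns
    q2 = sum(sum(row) ** 2 for row in table) / c - cf         # between rows
    q3 = q - q1 - q2                                          # error
    msc, msr, mse = q1 / (c - 1), q2 / (r - 1), q3 / ((c - 1) * (r - 1))
    return {"Q1": q1, "Q2": q2, "Q3": q3, "F_columns": msc / mse, "F_rows": msr / mse}

# Rows: summer, winter, monsoon; columns: salesmen A, B, C, D
tea_sales = [[38, 40, 41, 39], [45, 42, 49, 36], [40, 38, 42, 42]]
tea = two_way_anova(tea_sales)
print({k: round(v, 3) for k, v in tea.items()})
```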

2) An experiment was designed to study the performance of 4 different detergents for cleaning fuel
injectors. The following cleanness readings were obtained with specially designed equipment for 12
tanks of gas distributed over 3 different models of engines.

Engine1 Engine2 Engine3

DetergentA 45 43 51

DetergentB 47 46 52

DetergentC 48 50 55

DetergentD 42 37 49

Looking on the detergents as treatments and the engines as blocks, obtain the appropriate ANOVA table
and test at 1% level of significance

H0:



H1:

X1 X2 X3 RT RT2/3 X12 X22 X32


45 43 51 139 19321/3
6440.33
47 46 52 145 7008.3
48 50 55 153 7803
42 37 49 128 5461.33

CT=182 176 207 8302 7834 10731

8281 7744 10712.25

CF = GT^2 / N = 565^2 / 12 = 26602.08

Q = ΣΣ Xij^2 – CF = 264.92
Q1 = Σ [CT^2 / R] – CF = 135.17
Q2 = Σ [RT^2 / C] – CF = 110.92
Q3 = Q - Q1 - Q2 = 18.83

MSE (Residual) = 18.83/6 = 3.14

Source of Variance          Sum of Squares       Degrees of freedom   Mean Squares                        F Ratio

Between Samples (columns)   Q1 = 135.17 (SSC)    c-1 = 2              MSC = Q1/(c-1) = 67.59              F = MSC/MSE (Greater/smaller)

Between Samples (rows)      Q2 = 110.92 (SSR)    r-1 = 3              MSR = Q2/(r-1) = 36.97              F = MSR/MSE

Error                       Q3 = 18.83 (SSE)     (c-1)(r-1) = 6       MSE = Q3/((c-1)(r-1)) = 3.14

Calculated value F1=21.53, F2=11.77

Table Value@(2,6)1%los( 10.92) Calculated value

Table Value@(3,6)1%los (9.78) Calculated value

RESULT1

RESULT2



3)Perform the two way classification for the following

Plots of            Treatment (Seasons)
land        A     B     C     D     Total

Land 1      36    36    21    35    128

Land 2      28    29    31    32    120

Land 3      26    28    29    29    112

Total       90    93    81    96

H0 :

H1 :

CF = GT2 / N = (360 )2 /12

Q = ΣΣ Xij2 – CF =210

Q1= Σ [CT2/ R] – CF =42

Q2= Σ [RT2/ C] – CF =32

Q3=Q-Q1-Q2 = 210-42-32 =136

X1 X2 X3 X4 RT RT2/4 X12 X22 X32 X42

36 128

28 120

26 112

CT=90 93 81 96 2756 2921 2243 3090

CT2/R=2700 2883 2187 3072

MSC = 42/3 = 14

MSR = 32/2 = 16

MSE (Residual) = 136/6 = 22.67



Source of Variance          Sum of Squares   Degrees of freedom   Mean Squares                       F Ratio

Between Samples (columns)   Q1 =  (SSC)      c-1                  MSC = Q1/(c-1)                     F (Greater/smaller) = MSE/MSC = 22.67/14 = 1.619

Between Samples (rows)      Q2 =  (SSR)      r-1                  MSR = Q2/(r-1)                     F (Greater/smaller) = MSE/MSR = 22.67/16 = 1.417

Error                       Q3 =  (SSE)      (c-1)(r-1)           MSE = Q3/((c-1)(r-1)) = 22.67

[Calculated values F1 = 1.619, F2 = 1.417]

Table Value (6,3) at 5% LOS is 8.94     Calculated value

Table Value (6,2) is 19.32     Calculated value

RESULT1

RESULT2

NON PARAMETRIC TEST


The t, F and chi-square tests are based on the assumption that the parent population (from which the sample is drawn) has a
specific distribution, such as the normal. These distributions are usually defined through some parameters.

Non-parametric tests do not require such assumptions. Hence, non-parametric tests are known as distribution-free
tests. Non-parametric test statistics utilise some simple aspects of sample data such as the signs of
measurements, order relationships or category frequencies. Therefore stretching or compressing the
scale does not alter them.

Two-sample rank test: THE MANN–WHITNEY U TEST

This test enables us to determine whether the 2 populations are identical under the null
hypothesis H0 : µ1 = µ2 (the 2 populations are identical)

The test statistic to be used is

U = n1 n2 + [n1(n1+1)/2] – R1 (for sample 1) or

U = n1 n2 + [n2(n2+1)/2] – R2 (for sample 2)

The mean and variance of the sampling distribution of U are

Mean $E[U] = \frac{n_1 n_2}{2}$

Variance $V(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$

The standard normal variate of U is $z = \frac{U - E[U]}{\sqrt{V(U)}} \sim N(0,1)$

Note :

I. Combine all the given sample values (from smallest to largest) and assign ranks to all
these values.

II. Assign the average of the ranks if sample values are the same (tied).

III. Find the sum of the ranks for each of the samples. Denote these sums by R1 and R2.

IV. n1 and n2 are the respective sample sizes.

V. For convenience choose n1 ≤ n2.

1) The nicotine contents of 2 brands of cigarettes (in mg) were found to be as follows:

Brand A   2.1  4    6.3  5.4  4.8  3.7  6.1  3.3
Brand B   4.1  0.6  3.1  2.5  4    6.2  1.6  2.2  1.9  5.4

Test whether there is any significant difference between the 2 brands.

H0 : µ1 = µ2 (The average nicotine contents of the 2 brands are equal)

H1 : µ1 ≠ µ2 (The average nicotine contents of the 2 brands are not equal)

Write all the values of the combined sample in ascending order and assign ranks:

Rank    1    2    3    4    5    6    7    8    9    10.5
Value   0.6  1.6  1.9  2.1  2.2  2.5  3.1  3.3  3.7  4

Rank    10.5  12   13   14.5  14.5  16   17   18
Value   4     4.1  4.8  5.4   5.4   6.1  6.2  6.3

n1 = 8, n2 = 10

R1 = sum of the ranks of sample 1 (Brand A) = 4 + 10.5 + 18 + 14.5 + 13 + 9 + 16 + 8 = 93

U = n1n2 + [n1(n1+1)/2] – R1 (sample 1)

U = 80 + (72/2) – 93 = 23

The mean and variance of the sampling distribution of U are

Mean = E[U] = n1n2/2 = 80/2 = 40

Variance = V(U) = n1n2(n1 + n2 + 1)/12 = [80 (19)]/12 = 126.67

The standard normal variate of U is

$|z| = \left|\dfrac{U - E[U]}{\sqrt{V(U)}}\right| = \dfrac{|23 - 40|}{\sqrt{126.67}} = 1.51$

Calculated value < table value at 5% LOS: 1.51 < 1.96

H0 is accepted.

Result: The average nicotine contents of the 2 brands are equal.
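The ranking and the U statistic in Example 1 can be cross-checked with a short script. This is a minimal sketch in plain Python, not part of the original notes; the helper names `average_ranks` and `mann_whitney_z` are hypothetical, and the normal approximation is the one used in the worked solution above.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_z(sample1, sample2):
    """Return (R1, U, z) using U = n1*n2 + n1(n1+1)/2 - R1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                      # rank sum of sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean_u) / sqrt(var_u)
    return r1, u, z

brand_a = [2.1, 4, 6.3, 5.4, 4.8, 3.7, 6.1, 3.3]
brand_b = [4.1, 0.6, 3.1, 2.5, 4, 6.2, 1.6, 2.2, 1.9, 5.4]
r1, u, z = mann_whitney_z(brand_a, brand_b)
print(r1, u, round(abs(z), 2))  # 93.0 23.0 1.51
```

Since |z| = 1.51 is below 1.96, the script agrees with the decision to accept H0 at the 5% level.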

2. From the following data, test the hypothesis that there is no difference between Mine I and Mine II,
using the Mann-Whitney U test. Use α = 0.05.

Value of 31 25 38 33 42 40 44 26 43 35
mine1
Value of 44 30 34 47 35 32 35 47 48 34
Mine II 46
H0 : µ1 = µ2 (There is no significant difference between Mine I and Mine II)

H1 : µ1 ≠ µ2 (There is a significant difference between Mine I and Mine II)

Write all the values of the combined sample in ascending order and assign ranks:

Rank    1   2   3   4   5   6   7.5  7.5  10  10
Value   25  26  30  31  32  33  34   34   35  35

Rank    10  12  13  14  15  16.5  16.5  18.5  18.5  20
Value   35  38  40  42  43  44    44    47    47    48

n1 = 10, n2 = 10

R1 = sum of the ranks of sample 1 (Mine I) = 4 + 1 + 12 + 6 + 14 + 13 + 16.5 + 2 + 15 + 10 = 93.5

U = n1n2 + [n1(n1+1)/2] – R1 (sample 1)

U = 100 + (110/2) – 93.5 = 61.5

The mean and variance of the sampling distribution of U are

Mean = E[U] = n1n2/2 = 100/2 = 50

Variance = V(U) = n1n2(n1 + n2 + 1)/12 = [100 (21)]/12 = 175

The standard normal variate of U is

$|z| = \left|\dfrac{U - E[U]}{\sqrt{V(U)}}\right| = \dfrac{|61.5 - 50|}{\sqrt{175}} = 0.87$

Calculated value < table value at 5% LOS: 0.87 < 1.96

H0 is accepted.

Result: There is no significant difference between Mine I and Mine II.


UNIT -V

STATISTICAL QUALITY CONTROL


SQC is a simple statistical method for determining the extent to which quality goals are being met without necessarily checking every item produced, and for indicating whether or not the variations which occur exceed normal expectations. It enables us to decide whether to accept or reject the process.

Uses of SQC

1. Improvement in quality

2. Helps in the identification and correction of many production troubles

3. Reduces waste of time and material to an absolute minimum

4. Reduction in cost per unit, leading to more profit

Control charts

1. Define the goal to be achieved.

2. Determine whether the goal set is being achieved by finding out whether the production is in control or not.

3. A control chart (CC) is a device which helps in the attainment of the specified goals by pointing out whether the variations at a particular point are due to assignable causes.

Control limits

The central line (CL) represents the quality standard to be achieved and is plotted as a dark line. The UCL (upper control limit) and LCL (lower control limit) are usually plotted as dotted lines.

Process Control

The main objective of any production process is to control and maintain the quality of the manufactured product so that it conforms to specified quality standards. Process control means controlling the quality of goods while they are in the process of production. To achieve process control, repeated random samples are taken from the population of items.

Product control

By product control we mean controlling the quality of the product by critical examination at strategic (important or critical) points. This is achieved through sampling inspection plans.


Tolerance limits

Tolerance limits of a quality characteristic are defined as those values between which nearly all the manufactured items will lie.

In control charts, the 2 control limits are set at a distance of 3𝜎 on either side of the mean. If the measurable quality characteristic X is assumed to be normally distributed with mean µ and SD 𝜎, the probability that a random observation falls within µ ± 3𝜎 is 0.9973. It means that the probability of an observation falling outside these limits is 0.0027 (0.27%). These control limits are also known as tolerance limits.
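The 0.9973 figure can be verified numerically. The sketch below is an illustration only; it uses the standard identity P(|X − µ| ≤ kσ) = erf(k/√2) for a normal variable, and the function name `prob_within` is a hypothetical helper.

```python
from math import erf, sqrt

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X."""
    return erf(k / sqrt(2))

p_in = prob_within(3)
print(round(p_in, 4), round(1 - p_in, 4))  # 0.9973 0.0027
```

With k = 2 the same function gives about 0.9545, which shows why 3𝜎 limits are preferred when false alarms must be rare.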

Types of Control charts

1. Control charts for variables (measurements)

i) Mean chart and Range chart (X-bar and R chart)

ii) Mean chart and SD chart (X-bar and S chart)

2. Control charts for attributes: the sampled units are divided into 2 categories, defective and non-defective.

i) Control chart for proportion of defectives - p chart

Fraction defective (proportion) = Total defectives / Total no. of units

ii) Control chart for number of defectives - np chart

iii) Control chart for number of defects per unit - c chart

(The c chart is used when the number of defects per unit is counted instead of classifying the item as defective or non-defective, or when the denominator of the proportion is not given.)

Types of Process Data

Two types of process data:

1. Variable: continuous data. Things we can measure. Examples include length, weight, time, temperature, diameter, etc.
2. Attribute: discrete data. Things we count. Examples include the number or percentage of defective items in a lot, the number of defects per item, etc.

In quality control, a variable is a characteristic that can be measured; an attribute is a characteristic that can be counted.

All variable control charts must track only one quality characteristic of one product on the same chart.

Attribute charts can only provide nonconformance information on characteristics outside of specifications.

Variable charts can show patterns within the specification limits.


Variables chart (when to use):

• Installing a new process or product, or changing an old process or product.
• The process is obviously in trouble; it cannot produce to the tolerances on a consistent basis.
• Destructive or expensive testing is being used.
• Sampling further along the manufacturing process can be reduced by a more positive control at an earlier stage.
• Attributes control charts have shown a problem to exist, but the solution is difficult or unknown.
• There are difficult problem processes with tight specifications, overlapping assembly tolerances, expensive materials, etc.
• When large subgroup sizes are desired (greater than 8) and a variables chart is indicated, use X-bar, s. When a variables chart is indicated but the characteristic is not critical enough to warrant a large subsample size, use X-bar, R.
• The critical characteristic is measurable.
• Customer or contract requirements.

Attributes chart (when to use):

• Operators have a high degree of control over assignable causes.
• Assembly operations are complex.
• Quality can only be measured in terms of good or bad.
• Historical information is needed for management review.
• Many characteristics must be measured at one time.
• Cost of measurement is high.
• Production runs are large.


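CONTROL CHART FORMULAS (by chart name)

N = number of samples, n = number of items in each sample

X-bar and R chart

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N}$
$LCL = \bar{\bar{x}} - A_2 \bar{R}$
$UCL = \bar{\bar{x}} + A_2 \bar{R}$

R chart control limits:
$CL = \bar{R} = \dfrac{R_1 + R_2 + R_3 + \dots + R_N}{N}$
$LCL = D_3 \bar{R}$
$UCL = D_4 \bar{R}$

X-bar and S chart

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N}$
$LCL = \bar{\bar{x}} - A_1 \bar{s} \sqrt{\dfrac{n}{n-1}}$
$UCL = \bar{\bar{x}} + A_1 \bar{s} \sqrt{\dfrac{n}{n-1}}$

S chart control limits:
$CL = \bar{s} = \dfrac{s_1 + s_2 + s_3 + \dots + s_N}{N}$
$LCL = B_3 \bar{s}$
$UCL = B_4 \bar{s}$

np chart (number of defectives) and p chart (proportion or fraction of defectives)

$n\bar{p} = \dfrac{\sum np}{N}$, $\bar{p} = \dfrac{1}{n}\, n\bar{p}$

np chart control limits:
$CL = n\bar{p}$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\right)$

p chart control limits:
$CL = \bar{p}$
$UCL = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}}$
$LCL = \max\left(0,\; \bar{p} - 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}}\right)$

C chart (the number of defects is given directly; no need of tables)

$CL = \bar{c} = \dfrac{\sum C_i}{N}$, where N = number of units
$UCL = \bar{c} + 3\sqrt{\bar{c}}$
$LCL = \max\left(0,\; \bar{c} - 3\sqrt{\bar{c}}\right)$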


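Xbar and R chart

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N}$, $LCL = \bar{\bar{x}} - A_2 \bar{R}$, $UCL = \bar{\bar{x}} + A_2 \bar{R}$

R chart control limits:
$CL = \bar{R} = \dfrac{R_1 + R_2 + R_3 + \dots + R_N}{N}$, $LCL = D_3 \bar{R}$, $UCL = D_4 \bar{R}$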

1. You are given the values of the sample mean and range for 10 samples of size 5 each. Draw the X-bar and R charts and comment on the state of control of the process.

Sample  1   2   3   4   5   6   7   8   9   10
Mean    43  49  37  44  45  37  51  46  43  47
Range   5   6   5   7   7   4   8   6   4   6

A2 = 0.577, D3 = 0, D4 = 2.115, N = number of samples, n = size of each sample

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_N}{N} = \dfrac{442}{10} = 44.2$
$LCL = \bar{\bar{x}} - A_2 \bar{R} = 44.2 - (0.577)(5.8) = 40.85$
$UCL = \bar{\bar{x}} + A_2 \bar{R} = 44.2 + (0.577)(5.8) = 47.55$

R chart control limits:
$CL = \bar{R} = \dfrac{R_1 + R_2 + \dots + R_N}{N} = \dfrac{58}{10} = 5.8$
$LCL = D_3 \bar{R} = 0$
$UCL = D_4 \bar{R} = (2.115)(5.8) = 12.27$
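The limits for problem 1 can be reproduced, and the out-of-control samples listed, with a few lines of Python. This is a sketch under the constants quoted in the problem (A2 = 0.577, D3 = 0, D4 = 2.115 for samples of size 5); the function name `xbar_r_limits` is a hypothetical helper, not something defined in these notes.

```python
A2, D3, D4 = 0.577, 0.0, 2.115  # constants for n = 5, as quoted in the problem

means = [43, 49, 37, 44, 45, 37, 51, 46, 43, 47]
ranges = [5, 6, 5, 7, 7, 4, 8, 6, 4, 6]

def xbar_r_limits(means, ranges, a2, d3, d4):
    """Return ((LCL, CL, UCL) for the mean chart, (LCL, CL, UCL) for the range chart)."""
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    x_limits = (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar)
    r_limits = (d3 * r_bar, r_bar, d4 * r_bar)
    return x_limits, r_limits

(x_lcl, x_cl, x_ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(means, ranges, A2, D3, D4)

# 1-based sample numbers whose plotted point falls outside the limits
out_x = [i + 1 for i, m in enumerate(means) if not x_lcl <= m <= x_ucl]
out_r = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]

print(f"x-bar chart: LCL={x_lcl:.2f} CL={x_cl:.2f} UCL={x_ucl:.2f} outside={out_x}")
print(f"R chart:     LCL={r_lcl:.2f} CL={r_cl:.2f} UCL={r_ucl:.2f} outside={out_r}")
```

Under these limits the means of samples 2, 3, 6 and 7 fall outside the x̄ chart while every range stays inside the R chart, so the process average is not in control although the variability is.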

2. The following data relate to an automobile part: 5 samples of 4 items each were taken on a random sample basis. Draw the mean chart and R chart and state whether the production process is in control.

Sample       1   2   3   4   5
Production   10  10  10  11  12
             12  12  10  10  12
             10  13  9   9   12
             12  13  11  14  12

Sample  1   2   3   4   5
Mean
Range

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N} =$
$LCL = \bar{\bar{x}} - A_2 \bar{R} =$
$UCL = \bar{\bar{x}} + A_2 \bar{R} =$

R chart control limits:
$CL = \bar{R} = \dfrac{R_1 + R_2 + R_3 + \dots + R_N}{N} =$
$LCL = D_3 \bar{R} =$
$UCL = D_4 \bar{R} =$

Example 3 Given below are the values of sample mean X and sample range R for 10 samples, each of size 5.
Draw the appropriate mean and range charts and comment on the state of control of the process.

Sample 1 2 3 4 5 6 7 8 9 10
Mean 43 49 37 44 45 37 51 46 43 47
Range 5 6 5 7 7 4 8 6 4 6

Example 4 A machine fills boxes with dry cereal. 15 samples of 4 boxes are drawn randomly. The weights of the sampled boxes are shown below. Draw the control charts for the sample mean and sample range and determine whether the process is in a state of control.

Sample Number          1     2     3     4     5     6     7     8
Weights of boxes (X)   10.0  10.3  11.5  11.0  11.3  10.7  11.3  12.3
                       10.2  10.9  10.7  11.1  11.6  11.4  11.4  12.1
                       11.3  10.7  11.4  10.7  11.9  10.7  11.1  12.7
                       12.4  11.7  12.4  11.4  12.1  11.0  10.3  10.7

Sample Number          9     10    11    12    13    14    15
Weights of boxes (X)   11.0  11.3  12.5  11.9  12.1  11.9  10.6
                       13.1  12.1  11.9  12.1  11.1  12.1  11.9
                       13.1  10.7  11.8  11.6  12.1  13.1  11.7
                       12.4  11.5  11.3  11.4  11.7  12.0  12.1

Sample   1   2   ...   15
X-bar
R

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N} =$
$LCL = \bar{\bar{x}} - A_2 \bar{R} =$
$UCL = \bar{\bar{x}} + A_2 \bar{R} =$

R chart control limits:
$CL = \bar{R} = \dfrac{R_1 + R_2 + R_3 + \dots + R_N}{N} =$
$LCL = D_3 \bar{R} =$
$UCL = D_4 \bar{R} =$

Example 5 (HW) The following data give the average life in hours and range in hours of 12 samples, each of 5 lamps. Construct the control charts for X-bar and R and comment on the state of control.
X-bar: 120 127 152 157 160 134 137 123 140 144 120 127
R:     30  44  60  34  38  35  45  62  39  50  35  41

Formulas: X-bar and S chart

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N}$
$LCL = \bar{\bar{x}} - A_1 \bar{s} \sqrt{\dfrac{n}{n-1}}$
$UCL = \bar{\bar{x}} + A_1 \bar{s} \sqrt{\dfrac{n}{n-1}}$

S chart control limits:
$CL = \bar{s} = \dfrac{s_1 + s_2 + s_3 + \dots + s_N}{N}$
$LCL = B_3 \bar{s}$
$UCL = B_4 \bar{s}$
Example 6 The following data give the mean and S.D. values of 10 samples, each of size 5, drawn from a production process at intervals of one hour. Construct the mean and S.D. charts and comment on the state of control.
Sample No.:  1    2    3    4    5    6    7    8    9    10
X-bar:       54   51   54   49   52   47   51   50   50   52
s:           3.3  2.4  3.8  3.3  3.4  4.6  1.9  2.5  2.5  2.9

Formulas: X-bar and S chart

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N} =$
$LCL = \bar{\bar{x}} - A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} =$
$UCL = \bar{\bar{x}} + A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} =$

S chart control limits:
$CL = \bar{s} = \dfrac{s_1 + s_2 + s_3 + \dots + s_N}{N} =$
$LCL = B_3 \bar{s} =$
$UCL = B_4 \bar{s} =$


Example 7: The values of the sample mean X-bar and sample standard deviation s for 15 samples, each of size 4, drawn from a production process are given below. Draw the appropriate control charts for the process average and process variability. Comment on the state of control.

Sample No.  1     2     3     4     5     6     7     8     9     10
Mean        15.0  10.0  12.5  13.0  12.5  13.0  13.5  11.5  13.5  13.0
S.D.        3.1   2.4   3.6   2.3   5.2   5.4   6.2   4.3   3.4   4.1

Sample No.  11    12   13    14    15
Mean        14.5  9.5  12.0  10.5  11.5
S.D.        3.9   5.1  4.7   3.3   3.3

Formulas: X-bar and S chart (for n = 4: A1 = 1.880, B3 = 0, B4 = 2.266)

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_N}{N} = \dfrac{185.5}{15} = 12.36$
$LCL = \bar{\bar{x}} - A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} = 12.36 - (1.88)(4.02)\sqrt{4/3} = 3.63$
$UCL = \bar{\bar{x}} + A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} = 12.36 + (1.88)(4.02)\sqrt{4/3} = 21.09$

S chart control limits:
$CL = \bar{s} = \dfrac{s_1 + s_2 + \dots + s_N}{N} = \dfrac{60.3}{15} = 4.02$
$LCL = B_3 \bar{s} = 0$
$UCL = B_4 \bar{s} = (2.266)(4.02) = 9.109$
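The Example 7 limits can be checked with the sketch below. It assumes the tabulated constants for samples of size 4 (A1 = 1.880, B3 = 0, B4 = 2.266, taken from standard control chart tables) and uses the X-bar and S formulas given above; the variable names are illustrative only. Because the script does not round the grand mean before use, its limits can differ from the hand calculation in the second decimal place.

```python
from math import sqrt

A1, B3, B4 = 1.880, 0.0, 2.266  # assumed table constants for n = 4
n = 4                            # size of each sample

means = [15.0, 10.0, 12.5, 13.0, 12.5, 13.0, 13.5, 11.5, 13.5, 13.0,
         14.5, 9.5, 12.0, 10.5, 11.5]
sds = [3.1, 2.4, 3.6, 2.3, 5.2, 5.4, 6.2, 4.3, 3.4, 4.1,
       3.9, 5.1, 4.7, 3.3, 3.3]

x_cl = sum(means) / len(means)            # grand mean
s_cl = sum(sds) / len(sds)                # average sample S.D.
half_width = A1 * s_cl * sqrt(n / (n - 1))

x_lcl, x_ucl = x_cl - half_width, x_cl + half_width
s_lcl, s_ucl = B3 * s_cl, B4 * s_cl

# 1-based sample numbers falling outside each chart
out_x = [i + 1 for i, m in enumerate(means) if not x_lcl <= m <= x_ucl]
out_s = [i + 1 for i, s in enumerate(sds) if not s_lcl <= s <= s_ucl]

print(f"x-bar chart: LCL={x_lcl:.2f} CL={x_cl:.2f} UCL={x_ucl:.2f} outside={out_x}")
print(f"S chart:     LCL={s_lcl:.2f} CL={s_cl:.2f} UCL={s_ucl:.3f} outside={out_s}")
```

Both lists of outside points come back empty for this data, which suggests that the process average and the process variability are in control.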
Example 8 The following data give the coded measurements of 10 samples, each of size 5, drawn from a production process at intervals of 1 hour. Calculate the sample means and S.D.'s and draw the control charts for X-bar and s.

Sample Number            1    2    3    4    5    6    7    8    9    10
Coded measurements (X)   9    10   10   8    7    12   9    15   10   16
                         15   11   13   13   9    15   9    15   13   14
                         14   13   8    11   10   7    9    10   14   12
                         9    6    12   10   4    16   13   13   7    14
                         13   10   7    13   5    10   5    17   11   14
Avg. (x̄)                 12   10   10   11   7    12   9    14   11   14
x − x̄
∑(x − x̄)²
∑(x − x̄)²/n
s = sqrt[∑(x − x̄)²/n]    2.5  2.3  2.3  1.9  2.3  3.3  2.5  2.4  2.4  1.3

Formulas: X-bar and S chart (for n = 5: A1 = 1.596, B3 = 0, B4 = 2.089)

x̄ chart control limits:
$CL = \bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_N}{N} = \dfrac{110}{10} = 11$
$LCL = \bar{\bar{x}} - A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} = 11 - (1.596)(2.32)\sqrt{5/4} = 6.86$
$UCL = \bar{\bar{x}} + A_1 \bar{s} \sqrt{\dfrac{n}{n-1}} = 11 + (1.596)(2.32)\sqrt{5/4} = 15.14$

S chart control limits:
$CL = \bar{s} = \dfrac{s_1 + s_2 + \dots + s_N}{N} = \dfrac{23.2}{10} = 2.32$
$LCL = B_3 \bar{s} = 0$
$UCL = B_4 \bar{s} = (2.089)(2.32) = 4.846$


Limits for Attributes Control Charts

Control chart for proportion of defectives (p chart)

$p = \dfrac{\text{No. of defectives in a sample}}{\text{No. of items inspected in the sample}}$

$CL = \bar{p} = \dfrac{\text{Total no. of defectives in all samples}}{\text{Total number of items inspected in all samples}}$

$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}}$

$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}}$

np-chart

The use of attribute control charts arises when items are compared with some standard and are then classified as to whether they meet that standard or not. The np control chart is used to determine whether the rate of nonconforming product is stable, and it will detect when a deviation from stability has occurred. Some argue that there should only be an Upper Control Limit (UCL), and NOT a Lower Control Limit (LCL), since a rate of nonconforming product below the LCL is actually a good thing. However, if we treat LCL violations as another search for an assignable cause, we could learn what produces the lower nonconformity rates and perhaps reduce them further.

• Collect the data, recording the number inspected (n) and the number of defective products (np). Divide the data into subgroups. Usually, the data is grouped by date or by lot number. The subgroup size (n) should be over 50, and it is strongly recommended that you stick with a constant sample size of 100 for subgroups.

np chart control limits (N = number of samples, n = number of items in each sample):

$n\bar{p} = \dfrac{\sum np}{N}$, $\bar{p} = \dfrac{1}{n}\, n\bar{p}$

$CL = n\bar{p}$

$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\right)$

$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$


9. In a factory producing a type of TV transistor, lots of 200 items are inspected at a time. The number of defectives in each of 10 lots is shown in the table below. Draw a suitable control chart and comment on the production process.
No. of defectives: 16 37 13 26 14 24 30 23 35 14

np chart:
$n\bar{p} = \dfrac{\sum np}{N} =$ , $\bar{p} = \dfrac{1}{n}\, n\bar{p} =$
$CL = n\bar{p} =$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\right) =$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})} =$

p chart:
$p = \dfrac{\text{No. of defectives in a sample}}{\text{No. of items inspected in the sample}}$
$CL = \bar{p} = \dfrac{\text{Total no. of defectives in all samples}}{\text{Total number of items inspected in all samples}} =$
$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$
$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$
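For the TV transistor data, the np chart limits can be worked out as sketched below. This is an illustrative check only; `np_chart_limits` is a hypothetical helper that applies the np chart formulas above with a constant lot size of 200.

```python
from math import sqrt

def np_chart_limits(defectives, n):
    """Return (LCL, CL, UCL) for an np chart with constant sample size n."""
    np_bar = sum(defectives) / len(defectives)  # average defectives per lot
    p_bar = np_bar / n                          # average fraction defective
    spread = 3 * sqrt(np_bar * (1 - p_bar))
    return max(0.0, np_bar - spread), np_bar, np_bar + spread

defectives = [16, 37, 13, 26, 14, 24, 30, 23, 35, 14]
lcl, cl, ucl = np_chart_limits(defectives, n=200)

# 1-based lot numbers whose defective count falls outside the limits
out_of_control = [i + 1 for i, d in enumerate(defectives) if not lcl <= d <= ucl]
print(f"CL={cl:.2f} LCL={lcl:.2f} UCL={ucl:.2f} outside={out_of_control}")
```

With these figures the centre line is 23.2 and the upper limit is about 36.79, so lot 2 (37 defectives) lies just above the UCL and the process cannot be regarded as in control.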

Example 10: 15 samples of 200 items each were drawn from the output of a process. The numbers of defective items in the samples are given below. Prepare a control chart for the fraction defective and comment on the state of control.

Sample No. (i):          1   2   3   4   5   6   7   8   9   10
No. of defectives (np):  12  15  10  8   19  15  17  11  13  20

Sample No. (i):          11  12  13  14  15
No. of defectives (np):  10  8   9   5   8

np chart:
$n\bar{p} = \dfrac{\sum np}{N} =$ , $\bar{p} = \dfrac{1}{n}\, n\bar{p} =$
$CL = n\bar{p} =$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\right) =$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})} =$

p chart:
$p = \dfrac{\text{No. of defectives in a sample}}{\text{No. of items inspected in the sample}}$
$CL = \bar{p} = \dfrac{\text{Total no. of defectives in all samples}}{\text{Total number of items inspected in all samples}} =$
$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$
$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$
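A fraction-defective (p) chart for the 15 samples of 200 items can be sketched as follows. The helper name `p_chart_limits` is hypothetical; the lower limit is floored at zero, as in the formula summary for attribute charts.

```python
from math import sqrt

def p_chart_limits(defectives, n):
    """Return (LCL, CL, UCL) for a p chart with constant sample size n."""
    p_bar = sum(defectives) / (n * len(defectives))  # overall fraction defective
    spread = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - spread), p_bar, p_bar + spread

n = 200
defectives = [12, 15, 10, 8, 19, 15, 17, 11, 13, 20, 10, 8, 9, 5, 8]
lcl, cl, ucl = p_chart_limits(defectives, n)

fractions = [d / n for d in defectives]  # plotted points
out_of_control = [i + 1 for i, p in enumerate(fractions) if not lcl <= p <= ucl]
print(f"CL={cl:.4f} LCL={lcl:.4f} UCL={ucl:.4f} outside={out_of_control}")
```

The centre line works out to 0.06 and no sample fraction lies outside roughly 0.0096 to 0.1104, so on this calculation the process appears to be in control.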

Example 11: 10 samples, each of size 50, were inspected and the numbers of defectives found were: 2, 1, 1, 2, 3, 5, 5, 1, 2, 3. Draw the appropriate control chart for defectives.

Example 12: Construct a control chart for defectives for the following data:
Sample No.:         1   2   3   4   5   6   7   8   9   10
No. inspected:      90  65  85  70  80  80  70  95  90  75
No. of defectives:  9   7   3   2   9   5   3   9   6   7


Example 13. In a factory producing a type of TV transistor, lots of 200 items are inspected at a time. The number of defectives in each of 10 lots is shown in the table below. Draw a suitable control chart and comment on the production process.
No. of defectives: 16 37 13 26 14 24 30 23 35 14

Example 14 (HW) On inspection of 10 samples, each of size 400, the numbers of defective articles were:
19, 4, 9, 12, 9, 15, 26, 14, 15, 17.
Draw the np-chart and p-chart and comment on the state of control.

np chart:
$n\bar{p} = \dfrac{\sum np}{N} =$ , $\bar{p} = \dfrac{1}{n}\, n\bar{p} =$
$CL = n\bar{p} =$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\right) =$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})} =$

p chart:
$p = \dfrac{\text{No. of defectives in a sample}}{\text{No. of items inspected in the sample}}$
$CL = \bar{p} = \dfrac{\text{Total no. of defectives in all samples}}{\text{Total number of items inspected in all samples}} =$
$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$
$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} =$

Example 15 (HW) Draw the appropriate control chart for the following data and comment on the state of control:
Day:                1    2    3    4    5    6    7    8    9    10
No. inspected:      150  184  181  196  180  174  210  210  195  210
No. of defectives:  25   10   3    14   6    15   43   28   39   25


C chart

N = number of samples (units inspected)

$CL = \bar{c} = \dfrac{\sum C_i}{N}$

$UCL = \bar{c} + 3\sqrt{\bar{c}}$

$LCL = \max\left(0,\; \bar{c} - 3\sqrt{\bar{c}}\right)$

Example 16: 15 tape-recorders were examined in a quality control test. The number of defects in each tape-recorder is recorded below. Draw the appropriate control chart and comment on the state of control.

Unit no. (i):        1  2  3  4  5  6  7  8  9  10  11
No. of defects (c):  2  4  3  1  1  2  5  3  6  7   3

Unit no. (i):        12  13  14  15
No. of defects (c):  1   4   2   1

C chart:
$CL = \bar{c} = \dfrac{\sum C_i}{N} =$
$LCL = \max\left(0,\; \bar{c} - 3\sqrt{\bar{c}}\right) =$
$UCL = \bar{c} + 3\sqrt{\bar{c}} =$

Example 17: A plant produces paper for newsprint, and rolls of paper are inspected for defects. The results of the inspection of 20 rolls of paper are given below. Draw the c-chart and comment on the state of control.

Roll No. (i):        1   2   3   4   5   6   7   8   9   10
No. of defects (c):  19  10  8   12  15  22  7   13  18  13

Roll No. (i):        11  12  13  14  15  16  17  18  19  20
No. of defects (c):  16  14  8   7   6   4   5   6   8   9

$CL = \bar{c} = \dfrac{\sum C_i}{N} = \dfrac{220}{20} = 11$

$LCL = \bar{c} - 3\sqrt{\bar{c}} = 1.05$

$UCL = \bar{c} + 3\sqrt{\bar{c}} = 20.95$
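The c-chart limits for the newsprint rolls, and the rolls that breach them, can be computed as in this sketch. The function name `c_chart_limits` is a hypothetical helper applying the c chart formulas given above.

```python
from math import sqrt

def c_chart_limits(defects):
    """Return (LCL, CL, UCL) for a c chart of defects per unit."""
    c_bar = sum(defects) / len(defects)  # average number of defects per unit
    spread = 3 * sqrt(c_bar)
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

defects = [19, 10, 8, 12, 15, 22, 7, 13, 18, 13,
           16, 14, 8, 7, 6, 4, 5, 6, 8, 9]
lcl, cl, ucl = c_chart_limits(defects)

# 1-based roll numbers whose defect count falls outside the limits
out_of_control = [i + 1 for i, c in enumerate(defects) if not lcl <= c <= ucl]
print(f"CL={cl:.2f} LCL={lcl:.2f} UCL={ucl:.2f} outside={out_of_control}")
```

Roll 6, with 22 defects, lies above the UCL of 20.95, so this calculation indicates that the process is not in a state of control.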



15MA305 - STATISTICS FOR INFORMATION
TECHNOLOGY
(Multiple Choice Questions)
UNIT-I INTRODUCTION TO STATISTICS

1. What is the mean of the following numbers 1, 2, 2, 8, 9, 14?


A.6 B.13 C.5 D.2

2. The following are scores made on a math test 80, 90, 90, 85, 60, 70, 75, 85,

90, 60, 80. What is the median of these scores?

A.70 B.90 C.85 D.80


3. Which set of data has a mean of 15, a range of 22, a median of 14 and a
mode of 14?

A.14 22 15 15 9 B.14 22 14 15 4
C.3 14 19 25 14 D.25 15 14 3 7

4. The harmonic mean, arithmetic mean and geometric mean are all
considered as

A. Mathematical averages B. Population averages


C. Sample averages D. extended measures

5. If the arithmetic mean is 25 and the harmonic mean is 15 then the


geometric mean is

A.11.36 B.12.36 C.10.36 D.19.36

6. In which of the following manner the geometric mean, harmonic mean and
arithmetic mean are related?
A. AM < GM < HM B. AM > GM < HM
C. AM > GM > HM D. AM < GM > HM

7. For the individual observations, the reciprocal of arithmetic mean of the


reciprocal of individual observations is called

A. geometric mean B. harmonic mean


C. deviation square mean D. paired mean

8. If the quartile range is 24 then the quartile deviation is given by


A.48 B.12 C.24 D.72

9. If the arithmetic mean is multiplied to coefficient of variation then the


resulting value is classified as
A. coefficient of deviation B. coefficient of mean
C. standard deviation D. variance

10. If the variance of a set of observations is 100, then the SD of the set is

A. 1/10 B. -10 C.±10 D. 10

11. Coefficient of Range is equal to

A. (L-S)/(L+S) B. (L+S)/(L-S)

C. (LS)/(L+S) D. (LS)/(L-S)

12. In a symmetric distribution

A. Median=Mode B. Mean=Mode.
C. Mean = Median = Mode D. Mean=Median +Mode

13. If the mode is not well defined then Pearson’s coefficient of skewness is
given by

A. 2(median-mean)/standard deviation
B. 3(median-mean)/standard deviation
C. 2(mean-median)/standard deviation
D. 3(mean-median)/standard deviation

14. The mean deviation about the mean is equal to

f x  x f x  x
A. B.
N 2N
2 2
f x  x
2
f x 2  x
C. D.
2N N

15. If the beta one is 9, beta two is 11 then Coefficient of Skewness is

A. 0.589 B. -2.625 C.0.489 D.0.889

16. The method of calculating skewness which is based on the positions of
quartiles and median in a distribution is called

A. Gary's coefficient of skewness B. Sharma's coefficient of skewness

C. Bowley's coefficient of skewness D. Jack Karl's coefficient of skewness

17. In kurtosis, if β2 is greater than three, then the frequency distribution is
referred to as

A. mesokurtic distribution B. mega curve distribution

C. leptokurtic distribution D. platykurtic distribution

18. The three times of difference between mean and median is divided by
standard deviation to calculate coefficient of skewness by method of

A. Karl Pearson B. Professor Keller

C. Professor Bowley D. Professor Kelly

19. The variability which is defined as the difference between third and first
quartile is considered as

A. quartile range B. deciles range

C. percentile range D. inter quartile range

20. If the distribution is moderately asymmetrical, the mean, median and mode
obey the empirical relationship by Karl Pearson as

A. Mode = 3Median - 2Mean B. Median = 2Mode - 3Mean

C. Mean = Median = Mode D. Mean = 2Mode - 3Median

21. In a frequency curve of scores, the mode is found to be higher than the
mean, this shows that the distribution is

A. symmetric B. negatively skewed

C. positively skewed D. normal

22. Coefficient of variation is 60 and standard deviation is 20 what is


arithmetic mean

A. 40.33 B. 30.33
C. 33.33 D. 13.33

23. 10 is the mean of a set of 7 observations and 5 is the mean of a set of 3


observations. The mean of a combined set is given by

A. 15 B. 10

C. 8.5 D. 7.5

24. The accurate measure of Dispersion is

A. Range B. Standard deviation

C. Quartile Deviation D. Mean Deviation

25. The measure of central tendency which does not give more weightage to
smaller values is

A. Arithmetic mean B. Geometric mean

C. Harmonic mean D. Standard Deviation


UNIT-II CORRELATION AND REGRESSION

1. A correlation coefficient is computed to be -0.95 means that

(i) The relationship between two variables is weak

(ii) The relationship between two variables is strong and positive

(iii) The relationship between two variables is strong and negative

(iv) Correlation coefficient cannot have this value

2. In regression, the equation that describes how the response variable (y) is

related to the explanatory variable (x) is

(i) The correlation model (ii) The regression model

(iii) Used to compute the correlation (iv) Used to compute variation

3. If b1 and b2 are regression coefficients, then the correlation coefficient is

(i) $\dfrac{b_1}{b_2}$ , (ii) $\dfrac{b_1 + b_2}{2}$ , (iii) $\sqrt{b_1 b_2}$ , (iv) $b_1 b_2$

4. The two lines of regression are given as x+2y-5 = 0 , 2x+3y = 8. Then the mean values
of x and y are respectively given by

(i) (2, 1) (ii) (1, 2) (iii) (2, 5) (iv) (2, 3)

5. The tangent of the angle between two regression lines is given as 0.6 and SD of y
is known to be twice that of x .Then r(x,y) is

(i) - ½ (ii) ½ (iii) 0.7 (iv) 0.3

6.When the correlation coefficient r = ±1 then regression lines

(i) are perpendicular to each other (ii) coincide

(iii) are parallel to each other (iv) Do not exist.


7. If one regression coefficient is greater than unity, then the other must be

(i) greater than the first one (ii) equal to unity

(iii) less than unity (iv) equal to zero

8. For calculating rank correlation, the correction factor for repeated ranks is

m(m 2  1) m 2 (m  1) m(m  2) m
(i) , (ii) , (iii) , (iv)
12 12 6 2

9. If the correlation coefficient between X and Y is zero , then

(i) X and Y are independent variables (ii) X and Y are dependent variables

(iii) X and Y are negatively correlated (iv) X and Y are positively correlated

10.The correlation coefficient (X-independent, Y-dependent) will have the sign when

(i) X is increasing, Y is decreasing (ii) both X and Y are increasing

(iii) X is decreasing, Y is increasing (iv) there is no change in X and Y.

11.Correlation coefficient

(i) can take any value in between -1 and 1 (ii) is always less than 1

(iii) is always greater than 1 (iv) cannot be zero.

12. The correlation coefficient between x and y is 0.6. Their covariance is 4.8. The
variance of x is 9. Then the S.D. of y is

(i) $\dfrac{4.8}{3 \times 0.6}$ , (ii) $\dfrac{0.6}{4.8 \times 3}$ , (iii) $\dfrac{3}{4.8 \times 0.6}$ , (iv) $\dfrac{4.8}{9 \times 0.6}$

13.Correlation coefficient is independent of

(i) change of scale only (ii) change of origin only

(iii) change of origin and scale

(iv) neither change of origin nor change of scale


14. If two regression lines are x+3y-5= 0, 4x+3y-8= 0 then byx and bxy are
respectively

(i) (-1/3 , -3/4) (ii) (-3/4 , -1/3) (iii) (1, 1/3) (iv) (-3/4,1)

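15. If θ is the angle between two regression lines then

(i) $\tan\theta = \left(\dfrac{1 - r^2}{r}\right)\left(\dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$ , (ii) $\tan 2\theta = \left(\dfrac{1 - r^2}{r}\right)\left(\dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$ , (iii) $\tan\theta = \left(\dfrac{1 - r^2}{r}\right)$ ,

(iv) $\tan\theta = \left(\dfrac{1 - r^2}{r}\right)\left(\dfrac{\sigma_x + \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$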

16. If U= (X-a)/h , V = (Y-b)/k then byx =

(i) (h/k) bvu (ii) (h2/k)bvu (iii) (k/h)bvu (iv) bvu

17. If the lines of regressions are y = x/4 and x = ( y/9) +1 then r(x,y) is

(i) 1/3 (ii) -1/3 (iii) 1/6 (iv) -1/6

18. when the relationship between more than two variables are studied , the correlation is
known as

(i) Simple (ii) Partial (iii) Multiple (iv) Linear

19. The study of correlation between two variables excluding some other variables is
called

(i) total correlation (ii) non-linear correlation

(iii) partial correlation (iv) multiple correlation

20.which of the following is the highest range of r?

(i) 0 and 1 (ii) -1 and 0 (iii) -1 and 1 (iv) –0.5 and 0.5
21. The value of r² for a particular situation is 0.81. What is the coefficient of
correlation?

(i) 0.81 (ii) 0.9 (iii) 0.09 (iv) 0.18

22. When the no.of items are greater than 30 and the ranks are given, the co-
efficient of correlation of the following method is used

(i) Spearman’s method (ii) Karl pearson’s method

(iii) Concurrent deviation method (iv) Method of least squares

23.The regression analysis measures the ----------------- between X and Y.

(i) dependence (ii) degree of relationship (iii) direction (iv) angle

24.The regression lines cut each other at the point of

(i) average of X and Y (ii) average of X only

(iii) average of Y only (iv) zero

25. when the two regression lines coincide then r is:

(i) 0 (ii) -1 (iii) 1 (iv) 0.5


Unit4 –SAMPLING
MULTIPLE CHOICE QUESTIONS

1. If a researcher takes a large enough sample, he/she will almost always obtain:
a. virtually significant results
b. practically significant results
c. consequentially significant results
d. statistically significant results
ANSWER: d

2. The null and alternative hypotheses divide all possibilities into:


a. two sets that overlap
b. two non-overlapping sets
c. two sets that may or may not overlap
d. as many sets as necessary to cover all possibilities
ANSWER: b

3. Which of the following is true of the null and alternative hypotheses?

a. Exactly one hypothesis must be true


b. both hypotheses must be true
c. It is possible for both hypotheses to be true
d. It is possible for neither hypothesis to be true
ANSWER: a

4. The chi-square goodness-of-fit test can be used to test for:


a. significance of sample statistics
b. difference between population means
c. normality
d. probability
ANSWER: c

5. A type II error occurs when:


a. the null hypothesis is incorrectly accepted when it is false
b. the null hypothesis is incorrectly rejected when it is true
c. the sample mean differs from the population mean
d. the test is biased
ANSWER: a

6. The form of the alternative hypothesis can be:


a. one-tailed
b. two-tailed
c. neither one nor two-tailed
d. one or two-tailed
ANSWER: d



7. A two-tailed test is one where:
a. results in only one direction can lead to rejection of the null hypothesis
b. negative sample means lead to rejection of the null hypothesis
c. results in either of two directions can lead to rejection of the null hypothesis
d. no results lead to the rejection of the null hypothesis
ANSWER: c

8. The value set for α is known as:


a. the rejection level
b. the acceptance level
c. the significance level
d. the error in the hypothesis test
ANSWER: c

9. Which of the following values is not typically used for α?


a. 0.01
b. 0.05
c. 0.10
d. 0.25
ANSWER: d

10. The hypothesis that an analyst is trying to prove is called the:


a. elective hypothesis
b. alternative hypothesis
c. optional hypothesis
d. null hypothesis
ANSWER: b

11. The chi-square test is not very effective if the sample is:
a. small
b. large
c. irregular
d. heterogeneous
ANSWER: a

12. A type I error occurs when:


a. the null hypothesis is incorrectly accepted when it is false
b. the null hypothesis is incorrectly rejected when it is true
c. the sample mean differs from the population mean
d. the test is biased
ANSWER: b



13. What is the standard deviation of a sampling distribution called?
a. Sampling error
b. Sample error
c. Standard error
d. Simple error
ANSWER: c

14. A ______ is a subset of a _________.


a. Sample, population
b. Population, sample
c. Statistic, parameter
d. Parameter, statistic
ANSWER: a

15. A _______ is a numerical characteristic of a sample and a ______ is a numerical


characteristic of a population.
a. Sample, population
b. Population, sample
c. Statistic, parameter
d. Parameter, statistic
ANSWER: c

16. _________ is the values that mark the boundaries of the confidence interval.
a. Confidence intervals
b. Confidence limits
c. Levels of confidence
d. Margin of error
ANSWER: b

17. _____ results if you fail to reject the null hypothesis when the null hypothesis is
actually false.
a. Type I error
b. Type II error
c. Type III error
d. Type IV error
ANSWER: b

18. When the researcher rejects a true null hypothesis, a ____ error occurs.
a. Type I
b. Type A
c. Type II
d. Type B
ANSWER: a



19. __________ is the failure to reject a false null hypothesis.

a. Type I error
b. Type II error
c. Type A error
d. Type B error
ANSWER: b

20. Which of the following statements is/are true according to the logic of hypothesis
testing?
a. When the null hypothesis is true, it should be rejected
b. When the null hypothesis is true, it should not be rejected
c. When the null hypothesis is false, it should be rejected
d. When the null hypothesis is false, it should not be rejected
e. Both b and c are true
ANSWER: e

21. A failing student is passed by an examiner, it is an example of


(a) Type I error (b) Type II error (c) Unbiased decision (d) Difficult to tell
ANSWER: b

22. A passing student being failed by an examiner is an example of


(a) Type I error (b) Type II error (c) Best decision (d) All of the above
ANSWER: a

23. Area of the rejection region depends on


(a) Size of α (b) Size of β (c) Test-statistic (d) Number of values
ANSWER: a
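As a rough illustration of question 23, the sketch below uses Python's standard-library `statistics.NormalDist` to show that the total area of the rejection region equals the chosen α. The two-tailed z-test setting and the helper name `rejection_region` are assumptions made only for this example.

```python
from statistics import NormalDist


def rejection_region(alpha):
    """Return (critical z, total tail area) for a two-tailed z-test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Area to the right of +z_crit plus area to the left of -z_crit
    area = (1 - std_normal.cdf(z_crit)) + std_normal.cdf(-z_crit)
    return z_crit, area


for alpha in (0.10, 0.05, 0.01):
    z_crit, area = rejection_region(alpha)
    print(f"alpha={alpha:.2f}  critical z={z_crit:.3f}  rejection area={area:.4f}")
```

A smaller α pushes the critical value further into the tails and shrinks the rejection region accordingly.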

24. Which hypothesis is always in an inequality form?


(a) Null hypothesis (b) Alternative hypothesis (c) Simple hypothesis (d) Composite hypothesis
ANSWER: b

25. The number of degrees of freedom for a t-test based on n observations is


(a) 2n -1 (b) n -2 (c) 2(n -1) (d) n -1
ANSWER: d

26. Student’s t-distribution has (n-1) d.f. when all the n observations in the sample are
(a) Dependent (b) Independent (c) Maximum (d) Minimum
ANSWER: b

27. The number of independent values in a set of values is called


(a) Test-statistic (b) Degree of freedom (c) Level of significance (d) Level of confidence
ANSWER: b

28. For sufficiently large value of n, the t-distribution tends to the standard --- distribution.
a) Binomial b) Poisson c) normal d) exponential

29. The range of F- distribution is ---


a) 0 to ∞ b) -∞ to ∞ c) -1 to ∞ d) 1 to ∞

30. t- test and F- test are used only for --- samples.
a) Large b) 90 c) small d) 80

31. t- test is a --- distribution.


a) Bimodal b) normal c) unimodal d) exponential

32. The t - statistic is defined by ---


x x c) t  x   d) x
a) t b) t t
s s n s
n n
33. The F - statistic is defined by ---
3
a) F  s1 b) s12 c) F  s1 d) F  s1
2
F
s2 s22 s2 2
s2
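
For reference alongside questions 32 and 33, the usual textbook forms of the two statistics are sketched below. The notation is assumed rather than taken from the options: x̄ is the sample mean, μ the hypothesised population mean, s the sample standard deviation computed with divisor n - 1, n the sample size, and s1², s2² the two sample variance estimates, with the larger conventionally placed in the numerator. Some textbooks instead compute s with divisor n and write the denominator of t as s/√(n - 1); the two forms are equivalent.

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{n}},
\qquad
F = \frac{s_1^{2}}{s_2^{2}} \quad (s_1^{2} \ge s_2^{2})
```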

UNIT-II &IV
ANOVA, CORRELATION AND REGRESSION

1. The range of simple correlation co-efficient is


(a) 0 to ∞ (b) -∞ to ∞ (c) 0 to 1 (d) -1 to +1
2. The correlation between the two variables is unity, there is
(a) Perfect correlation (b) Perfect positive correlation
(c) Perfect negative correlation (d) No correlation

3. Regression co-efficient is independent of the change of:


(a) Scale (b) Origin
(c) Both origin and scale (d) Neither origin nor scale

4. If the two lines of regression are x + 2y - 5 = 0 and 2x + 3y - 8 = 0, the means of x and y are
(a) x̄ = -3, ȳ = 4 (b) x̄ = 2, ȳ = 4 (c) x̄ = 1, ȳ = 2 (d) none of the above
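
Both regression lines pass through the point of means (x̄, ȳ), so question 4 reduces to finding where the two lines intersect. The sketch below assumes the lines are x + 2y - 5 = 0 and 2x + 3y - 8 = 0 (the signs are an assumption) and solves them by Cramer's rule; the helper name `intersection` is hypothetical.

```python
from fractions import Fraction


def intersection(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = Fraction(b1 * c2 - b2 * c1, det)
    y = Fraction(a2 * c1 - a1 * c2, det)
    return x, y


# Assumed lines: x + 2y - 5 = 0 and 2x + 3y - 8 = 0
mean_x, mean_y = intersection(1, 2, -5, 2, 3, -8)
print(f"mean of x = {mean_x}, mean of y = {mean_y}")
```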

5. If b_yx and b_xy are two regression co-efficients, they have


(a) Same sign (b) Opposite sign
(c) Either same or opposite signs (d) nothing can be said

6. If b yx  1 , then b xy is
(a) Less than 1 (b) Greater than 1 (c) Equal to 1 (d) Equal to 0

7. The correlation co-efficient is used to determine:


(a) a specific value of y-variable given a specific value of x-variable
(b) a specific value of x-variable given a specific value of y-variable
(c) the strength of the relationship between the x and y variables
(d) none of these

8. In regression analysis, the variable that is being predicted is the


(a) response or dependent variable (b) Independent variable
(c) intervening variable (d) is usually x

9. For the following data, the Pearson correlation is:

x 1 3 4 5 7 8 10
y 2 6 8 10 14 16 20
(a) Perfect correlation (b) Perfect positive correlation
(c) Perfect negative correlation (d) cannot be determined
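
The data in question 9 can be checked directly. The sketch below computes Pearson's r from its definition using only the standard library; the function name `pearson_r` is chosen here for illustration. Because every y value is exactly twice the corresponding x value, the result should be +1 up to floating-point rounding.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 3, 4, 5, 7, 8, 10]
y = [2, 6, 8, 10, 14, 16, 20]
r = pearson_r(x, y)
print(f"r = {r:.6f}")
```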

10. For the equation y = 3x-2, if the mean of y is 10, what is the mean of x
(a) 8 (b) 28 (c) 4 (d) 12

11. When the correlation co-efficient r = ±1, then the two regression lines
(a) are perpendicular to each other (b) coincide
(c) are parallel to each other (d) do not exist

12. If the regression co-efficients are b1 and b2, then the correlation co-efficient r is
(a) b1/b2 (b) b2/b1 (c) b1.b2 (d) ±√(b1.b2)
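
The relation behind question 12 follows from the standard definitions of the two regression co-efficients. In the sketch below, σ_x and σ_y denote the standard deviations of x and y; these symbols are assumed notation, not taken from the question.

```latex
b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y}
\;\Longrightarrow\;
b_{yx}\, b_{xy} = r^{2}
\;\Longrightarrow\;
r = \pm\sqrt{b_{yx}\, b_{xy}}
```

The sign of r is the common sign of the two regression co-efficients, which is also why they can never have opposite signs.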

13. In one-way classification the data are classified according to only --- criterion.
a) two b) one c) five d)six

14. In two-way classification the data are classified according to --- different factors.
a) two b) one c) five d) six

Answers:

1 d
2 b
3 b
4 c
5 a
6 b
7 c
8 a
9 b
10 c
11 b
12 d
13 b
14 a

UNIT -V
STATISTICAL QUALITY CONTROL
MULTIPLE CHOICE QUESTIONS

1. In statistical quality control, by quality we mean an attribute of the product that determines
its --- for use.
a) Cost b) price c) manpower d) Fitness

2. Quality control is a powerful --- technique for effective diagnosis of lack of quality in any
of the materials
a) productivity b) quantitative c) non-productivity d) cost

3. By quality of materials, we mean that good-quality material will result in smooth processing, thereby
reducing the waste and increasing the ---
a) input b) output c) cost d) production cost

4. By quality of manpower, we mean that trained and qualified personnel will give increased
efficiency due to better quality production through the application of skill and reduce the ---
and waste.
a) Production cost b) quantity c) material d) business

5. By quality of machines , we mean a better quality --- which will result in efficient work.
a) Cost b) equipment c) manpower d) production cost

6. Quality control based on process production are classified into --- factors.
a) one b) two c) three d) four

7. SQC is a productivity enhancing and regulatory technique with three factors - management,
methods and ---
a) mathematics b) chemistry c) physics d) biology

8. A production process is said to be in a state of statistical control if it is governed by chance
causes alone, in the absence of --- causes of variation.
a) physical b) assignable c) chemical d) biological

9. The technique of control charts was pioneered by ---


a) Gosset b) Robert c) W.A. Shewhart d) R.A. Fisher

10. The main objective in production process is to --- and maintain the quality of the
manufactured products.
a) control b) uncontrol c) assign d) produce

11. Control charts provide criteria for detecting lack of ----- control
a) physical b) statistical c) chemical d) biological

12. 𝑋̅ and R charts are employed to control the mean and --- respectively of the characteristic.
a) median b) mode c) S.D d) skewness

13. Shewhart's control chart for the number of defects per unit is used when the characteristic
representing the quality of a product is --- variable.
a) continuous b) uniform c) discrete d) exponential

14. For a process to be working under statistical control, points both in the 𝑋̅ and R charts
should lie --- the control limits.
a) inside b) outside c) larger than d) between

15. If 'd' is the number of defectives in a sample of size n then the sample proportion defective
is ---
a) p  d b) p  d c) p  d d) p  d
n s n

16. A typical control chart consists of --- horizontal lines.


a) one b) two c) three d) four

17. In the control chart ,CL denotes the


a) last line b) central line c)double line d) first line

18. In the control chart, UCL denotes


a) least control line b) upper control limit c) lower control limit d) control line

19. In the control chart, LCL denotes the


a) first control line b) upper control limit c) lower control limit d) control line

20. In the control chart, the central line CL is plotted as a ------ line
a) dotted b) scattered c) empty d) bold

21. np - chart and p -chart are used when p ≥ 0.05 and n p ≥ ----
a) 1 b) 2 c) 3 d)4

22. c-chart is used when c ≥ ----
a) 1 b) 2 c) 3 d)4

UNIT -V
ANSWERS
1. d
2. a
3. b
4. a
5. b
6. d
7. a
8. b
9. c
10. a
11. b
12. c
13. c
14. d
15. a
16. c
17. b
18. b
19. c
20. d
21. d
22. d

Reg. No.

B.Tech. DEGREE EXAMINATION, MAY 2018
First to Sixth Semester
15MA305 - STATISTICS FOR INFORMATION TECHNOLOGY
(For the candidates admitted during the academic year 2015 - 2016 onwards)
(Statistical tables and quality control charts are to be provided. Graph sheet for time series quality analysis and statistical quality control are also to be provided)

Note:
(i) Part - A should be answered in OMR sheet within first 45 minutes and OMR sheet should be handed over to hall invigilator at the end of 45th minute.
(ii) Part - B and Part - C should be answered in answer booklet.

Time: Three Hours Max. Marks: 100

PART - A (20 x 1 = 20 Marks)
Answer ALL Questions

1. The harmonic mean, arithmetic mean and geometric mean are all considered as
(A) Mathematical averages (B) Population averages
(C) Sample averages (D) Extended measures

2. For the individual observations, the reciprocal of arithmetic mean is called
(A) Geometric mean (B) Harmonic mean
(C) Deviation square mean (D) Paired mean

3. In a symmetric distribution
(A) Mean [illegible] Mode (B) Mean = Mode
(C) Mean = Median = Mode (D) Mean = Median ≠ Mode

4. The variability which is defined as the difference between third and first quartile is considered as
(A) Quartile range (B) Deciles range
(C) Percentile range (D) Interquartile range

5. If b1 and b2 are regression coefficients, then the correlation coefficient is
(A) b1/b2 (B) b1 + b2
(C) b1 b2 (D) √(b1 b2)

6. When the correlation coefficient r = ±1, then regression lines
(A) Are perpendicular to each other (B) Coincide
(C) Are parallel to each other (D) Do not exist

7. Correlation coefficient
(A) Can take any value between -1 and +1 (B) Is always less than 1
(C) Is always greater than 1 (D) Cannot be zero

8. Correlation coefficient is independent of
(A) Change of scale only (B) Change of origin only
(C) Change in origin and scale (D) Neither change in origin nor change of scale

9. A time series consists of
(A) Two components (B) Three components
(C) Four components (D) Five components

10. In a straight line equation Y = a + bX, a is the
(A) X-intercept (B) Slope
(C) Y-intercept (D) Normal

11. Depression in business is
(A) Secular trend (B) Cyclical
(C) Seasonal (D) Irregular

12. In fitting of a straight line, the value of slope remains unchanged by change of
(A) Scale (B) Origin
(C) Both origin and scale (D) Terminal

13. The mean of t-distribution is
(A) 0 (B) 1
(C) -1 (D) 2

14. The assumptions in analysis of variance are the same as
(A) Chi-square test (B) t-test
(C) F-test (D) Median

15. In RBD, the degrees of freedom for residual (error) is
(A) c-1 (B) r-1
(C) (c-1)(r-1) (D) c-2

16. The range of F-distribution is
(A) 0 to ∞ (B) -∞ to ∞
(C) -1 to ∞ (D) 1 to ∞

17. Quality control based on process production are classified into --- factors.
(A) One (B) Two
(C) Three (D) Four

18. Control charts provide criteria for detecting lack of --- control.
(A) Physical (B) Statistical
(C) Chemical (D) Biological

19. A typical control chart consists of --- horizontal lines.
(A) One (B) Two
(C) Three (D) Four

20. In the control chart UCL denotes the
(A) Least control line (B) Upper control limit
(C) Lower control limit (D) Control line

PART - B (5 x 4 = 20 Marks)
Answer ANY FIVE Questions

21. Compute the geometric mean for the following details: 2, 4, 8, 12, 16, 24.

22. The weekly salaries of a group of employees are given in the following table. Find the standard deviation of the salaries.
Salary (in ₹)    75  80  85  90  95  100
No. of persons   3   7   18  12  6   4

23. In a partially destroyed record, the following data are available: Variance of X = 25, Regression equation of X on Y: 5X - Y = 22 and regression equation of Y on X: 64X - 45Y = 24. Find (i) Mean values of X and Y (ii) coefficient of correlation between X and Y.

24. Two salesmen are asked to rank 7 different types of products. The ranks assigned by them are given below. Calculate the Spearman's rank correlation coefficient.
Products     A  B  C  D  E  F  G
Salesman A   1  2  4  3  5  7  6
Salesman B   1  3  2  4  5  6  7

25. Calculate the trend values by the method of moving averages assuming a four yearly cycle from the following data:
Year                             1981  1982  1983  1984  1985  1986  1987  1988  1989  1990  1991  1992
Sugar Production (lakhs tonnes)  37.4  31.1  38.7  39.5  47.9  42.6  48.4  64.6  58.4  38.6  51.4  84.4

26. Ten cartons are taken at random from an automatic filling machine. The mean net weight of cartons is 11.8 kg and the standard deviation is 0.15 kg. Does the sample mean differ significantly from the intended weight of 12 kg? (Given ν = 9, t0.05 = 2.26)

27. 15 tape recorders were examined for quality control test. The number of defects in each tape recorder is given below. Draw the appropriate chart and comment on the state of control.
Unit no.        1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
No. of defects  2  4  1  1  1  2  5  3  6  7   3   1   4   2   1

PART - C (5 x 12 = 60 Marks)
Answer ALL Questions

28. a. Calculate the Pearson's coefficient of skewness for the following data:
Class Interval  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89
Frequency       5      9      14     20     25     15     8      4
(OR)
b. The scores of two players A and B in 12 rounds are given below:
A  74  75  78  72  78  77  79  81  79  76  72  71
B  87  84  80  88  89  85  86  82  82  79  86  80
Identify the better player and the more consistent player.

29. a. Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and their sons (y).
x  65  66  67  67  68  69  70  72
y  67  68  65  68  72  72  69  71
(OR)
b. Obtain the lines of regression for the following:
x  25  28  35  36  36  29  38  34  32
y  43  46  49  41  36  32  31  30  33  39

30. a. Fit a second degree parabola to the following data and also estimate the value for the year 1990.
Year                      1955  1960  1965  1970  1975  1980  1985
Production ('000 units)   6     8     9     10    12    11    8
(OR)
b. Calculate the seasonal index using simple average method for the following data:
Year  I quarter  II quarter  III quarter  IV quarter
1990  72         68          80           70
1991  76         70          82           74
1992  74         66          84           80
1993  76         74          84           78
1994  78         74          86           82

31. a. Set up a two-way ANOVA table for the data given below:
Pieces of fields   Treatment
                   A   B   C   D
P                  45  40  38  31
Q                  43  41  45  38
R                  39  39  41  41
Also, test the significant difference between the pieces of fields and the treatments.
(OR)
b. A teacher wishes to test three different teaching methods I, II and III. To do this, the teacher chooses at random three groups of five students each and teaches each group by a different method. The same examination is then given to all the students and the marks obtained are given below. Determine whether there is a significant difference between the teaching methods at α = 0.05.
Method I    78  62  71  58  73
Method II   76  85  77  90  87
Method III  74  79  60  75  80

32. a. Given below are the values of sample mean X̄ and sample range R for 10 samples each of size 5. Draw the appropriate mean and range charts and comment on the state of control of the process.
Sample No.  1   2   3   4   5   6   7   8   9   10
Mean        43  49  37  44  45  37  51  46  43  47
Range       5   6   5   7   7   4   8   6   4   6
(OR)
b. The specifications for a certain quality characteristic are (60 ± 24) in coded values. The table given below gives the measurements obtained in 10 samples. Find the tolerance limits for the process and test if the process meets the specifications.
Sample No.        1   2   3   4   5             6   7   8   9   10
Measurements (X)  75  48  57  61  55            49  74  67  66  62
                  66  79  55  71  68            98  63  70  65  68
                  50  53  58  66  58            65  62  68  58  66
                  62  61  61  69  62            64  57  56  52  68
                  52  49  72  77  75            66  62  61  58  73
                  70  56  63  53  [illegible]   64  66  50  68  [illegible]
**