The Shapiro-Wilk test (Shapiro & Wilk, 1965; Royston, 1995) and the Shapiro-Francia
test (Shapiro & Francia, 1972; Royston, 1993a) calculate the W and W' statistics, respectively,
which test whether a random sample comes from a Normal distribution. Small values of W or
W' are evidence of departure from normality. The Shapiro-Wilk W statistic can only be
computed when the sample size is between 3 and 5000 (inclusive) (Royston, 1995); the Shapiro-Francia
W' statistic can be computed when the sample size ranges from 5 to 5000 (Royston,
1993a & 1993b).
The D'Agostino-Pearson test (Sheskin, 2011) computes a single P-value for the combination
of the coefficients of Skewness and Kurtosis.
The Kolmogorov-Smirnov test (Neter et al., 1988) with Lilliefors significance correction
(Dallal & Wilkinson, 1986) is based on the greatest discrepancy between the sample
cumulative distribution and the Normal cumulative distribution.
The Chi-squared goodness-of-fit test is applied to binned data (the data are put into
classes) (Snedecor & Cochran, 1989) and requires a larger sample size than the other
tests.
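Sort the data in ascending order, so that $x_1 \le x_2 \le \dots \le x_n$, and calculate SS as follows:
$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
Let $m = n/2$ if n is even and $m = (n-1)/2$ if n is odd, and calculate b as follows, taking the $a_i$ weights from Table 1 (for the given value of n) in the Shapiro-Wilk Tables. Note that if n is odd, the median data value is not used in the calculation of b.
$b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$
Calculate the test statistic $W = b^2/SS$.
Find the value in Table 2 of the Shapiro-Wilk Tables (for the given value of n) that is closest to W, interpolating if necessary; this gives the p-value of the test.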
We begin by sorting the data in column A using Data > Sort & Filter|Sort or
the QSORT supplemental function, putting the results in column B. We next look up the
coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Tables, putting
these values in column E.
Corresponding to each of these 6 coefficients $a_1, \dots, a_6$, we calculate the differences $x_{12} - x_1, \dots, x_7 - x_6$,
where $x_i$ is the ith data element in sorted order. E.g. since $x_1 = 35$ and $x_{12} = 86$, we place the
difference 86 - 35 = 51 in cell H5 (the same row as the cell containing $a_1$). Column I contains
the product of each coefficient and the corresponding difference; e.g. cell I5 contains the formula
=E5*H5. The sum of these products is b = 44.1641, which is found in cell I11 (and again in cell
E14).
We next calculate SS as DEVSQ(B4:B15) = 2008.667.
Thus $W = b^2/SS$ = 44.1641^2/2008.667 = .971026. We now look for .971026 when n = 12 in Table 2 of
the Shapiro-Wilk Tables and find that the p-value lies between .50 and .90: the W value
for .50 is .943 and the W value for .90 is .973.
Interpolating .971026 between these values (using linear interpolation), we arrive at p-value
= .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are
normally distributed.
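The table-based steps above can be sketched in Python as follows. This is a minimal illustration, not the Real Statistics implementation: the function names are hypothetical, and the caller must supply the $a_i$ coefficients copied from Table 1 of the Shapiro-Wilk Tables, since the code does not contain the tables. The last lines reproduce W and the interpolated p-value of Example 1 from b = 44.1641 and SS = 2008.667.

```python
from statistics import mean


def shapiro_wilk_w(data, coeffs):
    """Return (b, SS, W) using caller-supplied Table 1 coefficients a_1..a_m.

    Hypothetical helper: coeffs must hold m = n // 2 values for sample size n.
    """
    x = sorted(data)
    n = len(x)
    m = n // 2  # when n is odd, the median x[m] is never paired, as in the text
    if len(coeffs) != m:
        raise ValueError(f"expected {m} coefficients for n = {n}, got {len(coeffs)}")
    # b = sum of a_i * (x_{n+1-i} - x_i): largest minus smallest, and so on inward
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    xbar = mean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)  # same quantity as Excel's DEVSQ
    return b, ss, b * b / ss


def interpolate_p(w, w_lo, p_lo, w_hi, p_hi):
    """Linearly interpolate a p-value between two Table 2 entries."""
    return p_lo + (w - w_lo) * (p_hi - p_lo) / (w_hi - w_lo)


# Example 1 values from the text: b = 44.1641, SS = 2008.667
w = 44.1641 ** 2 / 2008.667
print(round(w, 6))                                           # 0.971026
print(round(interpolate_p(w, 0.943, 0.50, 0.973, 0.90), 4))  # 0.8737
```

If SciPy is installed, `scipy.stats.shapiro(data)` returns W together with a p-value computed by an approximation algorithm rather than by table interpolation, so its p-value can differ slightly from the interpolated one.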
Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests
for Normality and Symmetry are normally distributed.
As we can see from the analysis in Figure 2, p-value = .0419 < .05 = α, and so we reject the
null hypothesis and conclude with 95% confidence that the data are not normally
distributed, which is quite different from the results using the KS test that we found in
Example 2 of Kolmogorov-Smirnov Test.
Real Statistics Function: The Real Statistics Resource Pack contains the following
supplemental functions where R1 consists only of numeric data without headings:
SHAPIRO(R1, FALSE) = the Shapiro-Wilk test statistic W for the data in the range R1
SWTEST(R1, FALSE, h) = p-value of the Shapiro-Wilk test on the data in R1
SWCoeff(n, j, FALSE) = the jth coefficient for samples of size n
SWCoeff(R1, C1, FALSE) = the coefficient corresponding to cell C1 within sorted range R1
SWPROB(n, W, FALSE, h) = p-value of the Shapiro-Wilk test for a sample of size n with test
statistic W
The functions SHAPIRO and SWTEST ignore all empty and non-numeric cells. The range
R1 in SWCoeff(R1, C1, FALSE) should not contain any empty or non-numeric cells.
When performing the table lookup, the default is to use harmonic interpolation (h = TRUE).
To use linear interpolation, set h to FALSE. See Interpolation for details.
For example, for Example 1 of Chi-square Test for Normality, we have SHAPIRO(A4:A15,
FALSE) = .874 and SWTEST(A4:A15, FALSE, FALSE) = SWPROB(15,.874,FALSE,FALSE) =
.0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality).
It is important to note that SHAPIRO(R1, TRUE), SWTEST(R1, TRUE), SWCoeff(n, j,
TRUE), SWCoeff(R1, C1, TRUE) and SWPROB(n, W, TRUE) refer to the results using the
Royston algorithm, as described in Shapiro-Wilk Expanded Test.
For compatibility with the Royston version of SWCoeff, when j ≤ n/2 then SWCoeff(n, j,
FALSE) = the negative of the value of the jth coefficient for samples of size n found in
the Shapiro-Wilk Tables. When j = (n+1)/2, SWCoeff(n, j, FALSE) = 0, and when j >
(n+1)/2, SWCoeff(n, j, FALSE) = -SWCoeff(n, n-j+1, FALSE).
CONTINGENCY COEFFICIENT
If we look at the contingency table of two uncorrelated nominal variables, we can calculate
the expected frequency $h_{ik}$ of a particular combination of features from the row total $h_i$, the column total $h_k$ and the total number of observations N:
$h_{ik} = \frac{h_i h_k}{N}$
If the two variables are correlated, the actual frequencies $H_{ik}$ will deviate from the ideal
uncorrelated frequencies $h_{ik}$. The difference $D_{ik}$ between the actual and ideal (uncorrelated) frequencies
is thus
$D_{ik} = H_{ik} - h_{ik} = H_{ik} - \frac{h_i h_k}{N}$
For uncorrelated variables the difference of frequencies will be around zero for each cell of the table.
Thus the correlation of the two variables can be measured by squaring the differences and
summing these squares, each taken in relation to the ideal frequency:
$\chi^2 = \sum_i \sum_k \frac{D_{ik}^2}{h_{ik}}$
The resulting $\chi^2$ coefficient, however, has the disadvantage that its value depends both on the
dimension of the contingency table and on the size of the sample. After eliminating the dependence
on the sample size, we get Pearson's contingency coefficient C:
$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$
As this coefficient C still depends on the dimension of the contingency table, it is normalized
so that its range extends from 0.0 to 1.0:
$C_{corr} = \frac{C}{C_{max}}, \quad C_{max} = \sqrt{\frac{k - 1}{k}}$
where k is the smaller of the number of rows and the number of columns.
In contrast to the correlation coefficient, the corrected contingency coefficient $C_{corr}$ does
not indicate the direction of the correlation but only its strength.
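A short Python sketch of these formulas, assuming the observed frequencies $H_{ik}$ are given as a list of rows; the function name and the 2 x 2 table are hypothetical illustrations. It computes the expected frequencies, $\chi^2$, C and $C_{corr}$ in the same order as the text.

```python
from math import sqrt


def contingency_coefficients(table):
    """Return (chi2, C, C_corr) for a table of observed frequencies H_ik."""
    row_tot = [sum(row) for row in table]        # h_i
    col_tot = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_tot)                             # N
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_tot[i] * col_tot[k] / n         # h_ik = h_i * h_k / N
            chi2 += (observed - expected) ** 2 / expected  # D_ik^2 / h_ik
    c = sqrt(chi2 / (chi2 + n))
    k = min(len(table), len(table[0]))           # smaller table dimension
    c_max = sqrt((k - 1) / k)
    return chi2, c, c / c_max


chi2, c, c_corr = contingency_coefficients([[10, 20], [20, 10]])
print(round(chi2, 4), round(c, 4), round(c_corr, 4))  # 6.6667 0.3162 0.4472
```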
Z Score Formula
The Z score is the statistic obtained when a Z test is performed; it measures how many standard deviations a value lies above or below the mean. The Z test assumes a normal distribution under the null hypothesis, and Z scores are calculated for large amounts of data. To find the Z score of a value x, we need the mean and the standard deviation: we subtract the mean from x and then divide the result by the standard deviation. The formula for calculating the Z score is given below:
$Z = \frac{x - \bar{x}}{\sigma}$
Where,
x = Value to be standardized (the raw score)
$\bar{x}$ = Mean
$\sigma$ = Population standard deviation.
Following is the formula for population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
Where,
$\sigma$ = Population standard deviation
$x_i$ = Items given
$\bar{x}$ = Mean
n = Total number of items.
Z Score Problems
Solved Examples
Question 1: The scores of an exam are recorded. The mean and standard deviation of the marks are 263 and 45 respectively. If the
scores are normally distributed, what is the Z score for a score of 350?
Solution:
$Z = \frac{x - \bar{x}}{\sigma} = \frac{350 - 263}{45} = \frac{87}{45}$
= 1.933
Question 2: A student took 2 quizzes. In the first quiz he scored 80 and in the second he scored 75. The mean and standard
deviation of the first quiz are 70 and 15 respectively, while the mean and standard deviation of the second quiz are 54 and 12
respectively. The results follow a normal distribution. What can you conclude about the student's results from their Z
scores?
Solution:
Calculation of the student's Z score for the first quiz:
Score x = 80
Mean $\bar{x}$ = 70
Population standard deviation $\sigma$ = 15
$Z = \frac{x - \bar{x}}{\sigma} = \frac{80 - 70}{15}$
= 0.667
Calculation of the student's Z score for the second quiz:
Score x = 75
Mean $\bar{x}$ = 54
Population standard deviation $\sigma$ = 12
$Z = \frac{x - \bar{x}}{\sigma} = \frac{75 - 54}{12}$
= 1.75
Since the Z score for the second quiz (1.75) is higher than that for the first quiz (0.667), we conclude that the student did better, relative to the rest of the class, in the second quiz.
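The Z score formula is a one-line computation; the Python sketch below (with a hypothetical `z_score` helper) reproduces the worked examples. For the population standard deviation with divisor n, the standard library's `statistics.pstdev` can be used; the eight-value list is only an illustration.

```python
from statistics import pstdev


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(round(z_score(350, 263, 45), 3))   # 1.933
print(round(z_score(80, 70, 15), 3))     # 0.667
print(round(z_score(75, 54, 12), 3))     # 1.75
# Population standard deviation (divisor n) of some raw data:
print(pstdev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```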
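The sample size for an infinite population, and its correction for a finite population, are given by:
$SS = \frac{Z^2 \, p(1-p)}{C^2}$
$\text{New SS} = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
Here,
SS = Sample size
Z = z value for the chosen confidence level (e.g. 1.96 for 95%)
p = Percentage of the population picking a choice, expressed as a decimal
C = Confidence interval (margin of error), expressed as a decimal
Pop = Population size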
Solved Examples
Question 1: Find the sample size for an infinite and a finite population when the population is 4300, the percentage is 0.5, the confidence
level is 95% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.96 (from the z table for a 95% confidence level), so $Z^2$ = 3.8416
Applying the given data in the formula:
$SS = \frac{Z^2 \, p(1-p)}{C^2}$
$SS = \frac{(1.96)^2 \times 0.5 \times (1 - 0.5)}{0.04^2}$ = 600.25
SS = 600 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population:
$\text{New SS} = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
$\text{New SS} = \frac{600}{1 + \frac{600 - 1}{4300}}$ = 526.64
New SS = 527 (after rounding to the nearest whole number)
Question 2: Find the sample size for an infinite and a finite population when the population is 7800, the percentage is 0.5, the confidence level is 90% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.645 (from the z table for a 90% confidence level), so $Z^2$ = 2.7060
Applying the given data in the formula:
$SS = \frac{Z^2 \, p(1-p)}{C^2}$
$SS = \frac{(1.645)^2 \times 0.5 \times (1 - 0.5)}{0.04^2}$ = 422.8
SS = 423 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population:
$\text{New SS} = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
$\text{New SS} = \frac{423}{1 + \frac{423 - 1}{7800}}$ = 401.29
New SS = 401 (after rounding to the nearest whole number)
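The two sample-size formulas can be sketched in Python as below; `sample_size` and `finite_correction` are hypothetical helper names, and, as in the examples, the infinite-population result is rounded before the finite-population correction is applied.

```python
def sample_size(z, p, c):
    """Infinite-population sample size: Z^2 * p * (1 - p) / C^2."""
    return z ** 2 * p * (1 - p) / c ** 2


def finite_correction(ss, pop):
    """Adjust a sample size for a finite population of size pop."""
    return ss / (1 + (ss - 1) / pop)


ss1 = round(sample_size(1.96, 0.5, 0.04))      # 600 (from 600.25)
print(ss1, round(finite_correction(ss1, 4300)))  # 600 527
ss2 = round(sample_size(1.645, 0.5, 0.04))     # 423
print(ss2, round(finite_correction(ss2, 7800)))  # 423 401
```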
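The confidence interval for the mean is given by:
$\bar{x} \pm t_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$, with $\alpha = 1 - \frac{\text{confidence level}}{100}$
Where,
n = Number of terms
$\bar{x}$ = Sample mean
$\sigma$ = Standard deviation
$t_{\alpha/2}$ = Critical value of the t distribution with n - 1 degrees of freedom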
Solved Examples
Question 1: The average score obtained by 10 students in a test is 25. The standard deviation of the sample is 8. Find the
confidence interval at the 95% confidence level.
Solution:
n = 10
Mean $\bar{x}$ = 25
Standard deviation $\sigma$ = 8
Confidence level = 95% = 0.95
$\alpha = 1 - \frac{95}{100} = 0.05$
Formula for confidence interval:
Confidence interval = $\bar{x} \pm t_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Confidence interval = $25 \pm t_{0.025}\left(\frac{8}{\sqrt{10}}\right)$
= $25 \pm 2.262\left(\frac{8}{\sqrt{10}}\right)$ (t value for 9 degrees of freedom)
= 19.28 to 30.72
Question 2: A survey was performed on 50 families, asking how many members there are in the family. The mean and
standard deviation of the survey were 6 and 5 respectively. If the confidence level is 95%, what is the confidence interval
of the survey?
Solution:
n = 50
Mean $\bar{x}$ = 6
Standard deviation $\sigma$ = 5
Confidence level = 95% = 0.95
$\alpha = 1 - \frac{95}{100} = 0.05$
Formula for confidence interval:
Confidence interval = $\bar{x} \pm t_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
= $6 \pm 2.0096\left(\frac{5}{\sqrt{50}}\right)$ (t value for 49 degrees of freedom)
= 4.579 to 7.421
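A minimal Python sketch of the interval $\bar{x} \pm t_{\alpha/2}(\sigma/\sqrt{n})$. To stay within the standard library, the critical t value is passed in (here read from a t table: 2.262 for 9 and 2.0096 for 49 degrees of freedom); if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` gives the same values. The helper name is hypothetical.

```python
from math import sqrt


def confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


lo, hi = confidence_interval(25, 8, 10, 2.262)
print(round(lo, 2), round(hi, 2))  # 19.28 30.72
lo2, hi2 = confidence_interval(6, 5, 50, 2.0096)
print(round(lo2, 3), round(hi2, 3))  # 4.579 7.421
```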
The chi-square value is calculated as:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Here,
O = Observed frequency
E = Expected frequency
$\sum$ = Summation
$\chi^2$ = Chi-square value
Solved Examples
Question 1: Calculate the chi-square value if the observed frequency is 8 and the expected frequency is 15.
Solution:
Given:
Observed frequency = 8
Expected frequency = 15
$\chi^2 = \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(8 - 15)^2}{15} = \frac{(-7)^2}{15} = \frac{49}{15}$
$\chi^2$ = 3.267
Chi-square value = 3.267
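Question 2: Calculate the chi-square value for the following data:
Color                 Red   Green   Yellow
Observed frequency    12    16      20
Expected frequency    16    8       25
Solution:
Let's find the chi-square value for the given data using the formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
First, let's calculate $(O - E)^2$ for each color:
Red: $(O - E)^2 = (12 - 16)^2 = 16$
Green: $(O - E)^2 = (16 - 8)^2 = 64$
Yellow: $(O - E)^2 = (20 - 25)^2 = 25$
Dividing each value by its expected frequency and adding:
$\chi^2 = \frac{16}{16} + \frac{64}{8} + \frac{25}{25} = 1 + 8 + 1 = 10$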
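The chi-square sum can be sketched in a few lines of Python; the helper name is hypothetical, and the observed and expected frequencies are passed as two lists in the same category order (Red, Green, Yellow in Question 2).

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(chi_square([8], [15]), 3))        # 3.267
print(chi_square([12, 16, 20], [16, 8, 25]))  # 10.0
```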
Z Test Formula
Z test is a statistical test that compares means, for example the means of two populations. It assumes a normal distribution under the null
hypothesis. A Z test is performed on a large amount of data or on population data; for a small amount of data or
sample data, a T test is performed instead. The score determined by a Z test is called the "Z score". The Z score can be calculated when the
population standard deviation of a large data set is given. It expresses a value x, usually taken from within the range of the
given data, in standardized form, that is, as a number of standard deviations from the mean.
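The formula for calculating the Z score is given below:
$Z = \frac{x - \bar{x}}{\sigma}$
Where,
x = Value to be standardized
$\bar{x}$ = Mean
$\sigma$ = Population standard deviation, given by
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
Where,
$x_i$ = Numbers given in the data
n = Total number of values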
Z Test Problems
Solved Examples
Question 1: In a government organization, the mean basic salary of the employees is INR 5000. What is the Z
score of employees whose basic salary is INR 3000, if the standard deviation of the population is 850?
Solution:
Value x = 3000
Mean $\bar{x}$ = 5000
Population standard deviation $\sigma$ = 850
$Z = \frac{x - \bar{x}}{\sigma} = \frac{3000 - 5000}{850}$
= -2.353
Question 2: The marks obtained in a mathematics exam by the students of a class vary from 33 to 100. The average mark is 62
and the standard deviation is 20. Find the Z score for students who scored 80.
Solution:
$Z = \frac{x - \bar{x}}{\sigma} = \frac{80 - 62}{20}$
= 0.9
F Test Formula
F test is a method to compare the variances of two different sets of values. Under the null hypothesis, the F statistic follows an
F distribution. To calculate the F value, we first find the mean of each of the two sets of values and then calculate their variances.
The F value is the ratio of the two variances. Comparing the variances of two sets of
data can lead to many predictions. The formula for the F test is given below:
$F = \frac{\sigma_1^2}{\sigma_2^2}$, with $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Where,
$\sigma^2$ = Variance
x = Values given in a set of data
$\bar{x}$ = Mean of the set
n = Number of values in the set
F Test Problems
Solved Examples
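Question 1: Find the F value for the following two sets of values:
1, 3, 5, 7, 9 and 5, 9, 3, 8, 3
Solution:
Formula for mean is given by:
$\bar{x} = \frac{\sum x}{n}$
Formula for variance is given by:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for the first set of values:
$\bar{x}_1$ = 5
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
1        -4                   16
3        -2                   4
5        0                    0
7        2                    4
9        4                    16
$\sum (x_1 - \bar{x}_1)^2$ = 40
$\sigma_1^2 = \frac{40}{4}$ = 10
Calculation for the second set of values:
$\bar{x}_2$ = 5.6
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
5        -0.6                 0.36
9        3.4                  11.56
3        -2.6                 6.76
8        2.4                  5.76
3        -2.6                 6.76
$\sum (x_2 - \bar{x}_2)^2$ = 31.2
$\sigma_2^2 = \frac{31.2}{4}$ = 7.8
$F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{10}{7.8}$ = 1.282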
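Formula for mean is given by:
$\bar{x} = \frac{\sum x}{n}$
Formula for variance is given by:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for the first set of values:
$\bar{x}_1$ = 3
$\sum (x_1 - \bar{x}_1)^2$ = 10
$\sigma_1^2 = \frac{10}{4}$ = 2.5
Calculation for the second set of values:
$\bar{x}_2$ = 4.2
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
8        3.8                  14.44
3        -1.2                 1.44
9        4.8                  23.04
0        -4.2                 17.64
1        -3.2                 10.24
$\sum (x_2 - \bar{x}_2)^2$ = 66.8
$\sigma_2^2 = \frac{66.8}{4}$ = 16.7
$F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{2.5}{16.7}$
= 0.1497 = 0.15 (approx)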
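The F value above is simply a ratio of two sample variances, so a Python sketch can lean on `statistics.variance`, which uses the same n - 1 divisor as the text. The helper name `f_value` is hypothetical.

```python
from statistics import variance


def f_value(first, second):
    """Ratio of the two sample variances (divisor n - 1), first over second."""
    return variance(first) / variance(second)


print(round(f_value([1, 3, 5, 7, 9], [5, 9, 3, 8, 3]), 3))  # 1.282
print(variance([8, 3, 9, 0, 1]))                            # 16.7
```

Which variance goes in the numerator is a convention; in many textbook F tests the larger variance is placed on top so that F ≥ 1, whereas the examples here simply divide the first set by the second.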
T Test Formula
T test is often called Student's T test, after "Student", the pen name of its developer, William Sealy Gosset. The T test is used to compare two different sets of
values and is generally performed on small data sets whose values are approximately normally distributed.
This test compares the means of two samples, using the means and standard deviations of the two samples. The formula for the T test is given below:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
Where,
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ = Standard deviation of each sample
x = Values given
$\bar{x}$ = Mean
n = Total number of values.
T Test Problems
Solved Examples
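Question 1: Find the t-test value for the following two sets of values:
7, 2, 9, 8 and 1, 2, 3, 4
Solution:
Formula for mean:
$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for the first set:
Number of terms in the first set: $n_1$ = 4
Mean of the first set of data: $\bar{x}_1$ = 6.5
Construct the following table for the standard deviation:
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
7        0.5                  0.25
2        -4.5                 20.25
9        2.5                  6.25
8        1.5                  2.25
$\sum (x_1 - \bar{x}_1)^2$ = 29, so $S_1^2 = \frac{29}{3}$ = 9.667
Calculation for the second set:
Number of terms in the second set: $n_2$ = 4
Mean of the second set of data: $\bar{x}_2$ = 2.5
Construct the following table for the standard deviation:
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
1        -1.5                 2.25
2        -0.5                 0.25
3        0.5                  0.25
4        1.5                  2.25
$\sum (x_2 - \bar{x}_2)^2$ = 5, so $S_2^2 = \frac{5}{3}$ = 1.667
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
$t = \frac{6.5 - 2.5}{\sqrt{\frac{9.667}{4} + \frac{1.667}{4}}}$
t = 2.3764 = 2.38 (approx)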
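Question 2: Find the t-test value for the following two sets of data:
$x_1$: 9, 10, 11, 12
$x_2$: 2, 4, 6, 8
Solution:
Formula for mean:
$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for the first set:
Number of terms in the first set: $n_1$ = 4
Mean of the first set of data: $\bar{x}_1$ = 10.5
Construct the following table for the standard deviation:
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
9        -1.5                 2.25
10       -0.5                 0.25
11       0.5                  0.25
12       1.5                  2.25
$\sum (x_1 - \bar{x}_1)^2$ = 5, so $S_1^2 = \frac{5}{3}$ = 1.667
Calculation for the second set:
Number of terms in the second set: $n_2$ = 4
Mean of the second set of data: $\bar{x}_2$ = 5
Construct the following table for the standard deviation:
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
2        -3                   9
4        -1                   1
6        1                    1
8        3                    9
$\sum (x_2 - \bar{x}_2)^2$ = 20, so $S_2^2 = \frac{20}{3}$ = 6.667
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
$t = \frac{10.5 - 5}{\sqrt{\frac{1.667}{4} + \frac{6.667}{4}}}$
t = 3.8105 = 3.81 (approx)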
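A Python sketch of the t formula used above, with sample variances (divisor n - 1) from the standard library; the helper name is hypothetical. The second call uses $x_2$ = 2, 4, 6, 8, the values consistent with $\bar{x}_2$ = 5 and $\sum (x_2 - \bar{x}_2)^2$ = 20. This is the unpooled form of the statistic; when $n_1 = n_2$, as in both examples, it gives the same value as the pooled-variance Student t statistic.

```python
from math import sqrt
from statistics import mean, variance


def t_value(first, second):
    """t = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2), with sample variances S^2."""
    se = sqrt(variance(first) / len(first) + variance(second) / len(second))
    return (mean(first) - mean(second)) / se


print(round(t_value([7, 2, 9, 8], [1, 2, 3, 4]), 2))     # 2.38
print(round(t_value([9, 10, 11, 12], [2, 4, 6, 8]), 2))  # 3.81
```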
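The linear regression line is $y = a + bx$, where
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $a = \frac{\sum y - b\sum x}{n}$
where,
x and y are the two variables on the regression line, and n is the number of data pairs.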
Solved Examples
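Question 1: Find the linear regression equation for the following two sets of data:
x: 2, 4, 6, 8
y: 3, 7, 5, 10
Solution:
Construct the following table:
x    y    $x^2$    xy
2    3    4        6
4    7    16       28
6    5    36       30
8    10   64       80
$\sum x$ = 20
$\sum y$ = 25
$\sum x^2$ = 120
$\sum xy$ = 144
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
$b = \frac{4(144) - (20)(25)}{4(120) - 400}$
b = 0.95
$a = \frac{\sum y - b\sum x}{n}$
$a = \frac{25 - 0.95(20)}{4}$
a = 1.5
Linear regression is given by:
y = a + bx
y = 1.5 + 0.95x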
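Solution:
Construct the table of $x^2$ and $xy$ values for the data (n = 4) and add up each column:
$\sum x$ = 10
$\sum y$ = 20
$\sum x^2$ = 30
$\sum xy$ = 43
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
$b = \frac{4(43) - (10)(20)}{4(30) - 100}$
b = -1.4
$a = \frac{\sum y - b\sum x}{n}$
$a = \frac{20 + 1.4(10)}{4}$
a = 8.5
Linear regression is given by:
y = a + bx
y = 8.5 - 1.4x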
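The slope and intercept formulas translate directly into Python. In this sketch (`linear_regression` here is a hypothetical helper, not to be confused with `statistics.linear_regression`, which exists in Python 3.10 and later), the Question 1 data are taken to be x = 2, 4, 6, 8 and y = 3, 7, 5, 10, which reproduce the sums Σx = 20, Σy = 25, Σx² = 120 and Σxy = 144.

```python
def linear_regression(xs, ys):
    """Least-squares intercept a and slope b of y = a + bx, from the column sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


a, b = linear_regression([2, 4, 6, 8], [3, 7, 5, 10])
print(round(a, 2), round(b, 2))  # 1.5 0.95
```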
Relative frequency = $\frac{f}{n}$
Where,
f = Frequency of an individual item
n = Total frequency (total number of items).
Solved Examples
Question 1: Construct the relative frequency table for the following data:
2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0
Solution:
Relative frequency = $\frac{f}{n}$
Value    Frequency (f)    Relative frequency
0        2                2/20 = 0.1
1        2                2/20 = 0.1
2        4                4/20 = 0.2
3        2                2/20 = 0.1
4        1                1/20 = 0.05
5        3                3/20 = 0.15
6        1                1/20 = 0.05
7        3                3/20 = 0.15
9        2                2/20 = 0.1
n = 20
Question 2: Construct the relative frequency table for the following data:
Class interval    Frequency
0-10              5
10-20             8
20-30             10
30-40             15
40-50             7
Solution:
Class interval    Frequency (f)    Relative frequency = $\frac{f}{n}$
0-10              5                5/45 = 0.111
10-20             8                8/45 = 0.178
20-30             10               10/45 = 0.222
30-40             15               15/45 = 0.333
40-50             7                7/45 = 0.156
n = 45
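Both relative frequency tables can be produced with a short Python sketch using `collections.Counter`; the helper names are hypothetical, and the grouped version assumes the class frequencies are already known (as in the second table).

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to f / n, in ascending order of value."""
    n = len(values)
    return {v: f / n for v, f in sorted(Counter(values).items())}


def grouped_relative_frequencies(freqs):
    """Same idea for grouped data given as {class interval: frequency}."""
    n = sum(freqs.values())
    return {cls: f / n for cls, f in freqs.items()}


data = [2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0]
print(relative_frequencies(data))
classes = {"0-10": 5, "10-20": 8, "20-30": 10, "30-40": 15, "40-50": 7}
print({k: round(v, 3) for k, v in grouped_relative_frequencies(classes).items()})
```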
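The margin of error E is given by:
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Here,
$z_{\alpha/2}$ = Critical value of the standard normal distribution for the confidence level c
$\sigma$ = Standard deviation
n = Sample size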
Solved Examples
Question 1: A random sample of 30 students has average yearly earnings of 2450 and a standard deviation of 587. Find
the margin of error if c = 0.95.
Solution:
Given n = 30
Standard deviation $\sigma$ = 587
$z_{\alpha/2}$ = 1.96
$\therefore$ using the formula $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
= $1.96\left(\frac{587}{\sqrt{30}}\right)$
= 210.06
= 210 approximately
The margin of error = 210
Question 2: The average monthly earning of 56 employees in a company is 4685 and the standard deviation is 354. Find the
margin of error if c = 0.95.
Solution:
Given n = 56
Standard deviation $\sigma$ = 354
$z_{\alpha/2}$ = 1.96
$\therefore$ using the formula $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
= $1.96\left(\frac{354}{\sqrt{56}}\right)$
= 92.72
= 93 approximately
The margin of error = 93
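A minimal Python sketch of $E = z_{\alpha/2}(\sigma/\sqrt{n})$. If z is not supplied, the hypothetical helper derives it from the confidence level with the standard library's `statistics.NormalDist().inv_cdf`, which gives about 1.95996 for 95% rather than the rounded table value 1.96 used in the examples.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd, n, confidence=0.95, z=None):
    """E = z_{alpha/2} * sd / sqrt(n); z is derived from the confidence level if not given."""
    if z is None:
        alpha = 1 - confidence
        z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n)


print(round(margin_of_error(587, 30, z=1.96), 2))  # 210.06
print(round(margin_of_error(354, 56, z=1.96), 2))  # 92.72
```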
ANOVA
ANOVA (analysis of variance) is a statistical test that analyzes variance. It is helpful for comparing two or more means, which enables a
researcher to draw various conclusions and predictions about two or more sets of data. ANOVA includes one-way ANOVA, two-way ANOVA and multiple ANOVA, depending on the type and arrangement of the data. One-way ANOVA has the following test
statistic:
$F = \frac{MST}{MSE}$
Where,
F = ANOVA test statistic (F ratio)
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error.
Formula for MST is given below:
$MST = \frac{SST}{p - 1}, \quad SST = \sum n(\bar{x} - \bar{\bar{x}})^2$
Where,
SST = Sum of squares due to treatment
p = Total number of populations (groups)
n = Number of observations in each population
$\bar{x}$ = Mean of each population, and $\bar{\bar{x}}$ = Overall mean
Formula for MSE is given below:
$MSE = \frac{SSE}{N - p}, \quad SSE = \sum (n - 1)S^2$
Where,
SSE = Sum of squares due to error
S = Standard deviation of each sample
N = Total number of observations.
Solved Examples
Question 1: The following data are given about the cricket teams of three countries:
Country         Number of players    Average runs    Standard deviation
India           11                   60              15
New Zealand     11                   50              10
South Africa    11                   70              12
Solution:
Cricket team    n     $\bar{x}$    S     $S^2$
India           11    60           15    225
New Zealand     11    50           10    100
South Africa    11    70           12    144
n = 11
p = 3
N = 33
$\bar{\bar{x}} = \frac{60 + 50 + 70}{3}$ = 60
$SST = \sum n(\bar{x} - \bar{\bar{x}})^2$
$SST = 11(60 - 60)^2 + 11(50 - 60)^2 + 11(70 - 60)^2$
= 2200
$MST = \frac{SST}{p - 1} = \frac{2200}{3 - 1}$
= 1100
$SSE = \sum (n - 1)S^2$
SSE = 10*225 + 10*100 + 10*144
= 4690
$MSE = \frac{SSE}{N - p} = \frac{4690}{33 - 3}$
MSE = 156.33
$F = \frac{MST}{MSE} = \frac{1100}{156.33}$
= 7.036
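Question 2: Calculate the one-way ANOVA F value for the following data about three kinds of plants:
Plant name    Number of plants    Average flowers    Standard deviation
Rose          5                   12                 2
Marigold      5                   16                 1
Lily          5                   20                 4
Solution:
Plant name    n    $\bar{x}$    S    $S^2$
Rose          5    12           2    4
Marigold      5    16           1    1
Lily          5    20           4    16
p = 3
n = 5
N = 15
$\bar{\bar{x}}$ = 16
$SST = \sum n(\bar{x} - \bar{\bar{x}})^2$
$SST = 5(12 - 16)^2 + 5(16 - 16)^2 + 5(20 - 16)^2$
= 160
$MST = \frac{SST}{p - 1} = \frac{160}{3 - 1}$
= 80
$SSE = \sum (n - 1)S^2$
SSE = 4*4 + 4*1 + 4*16
= 84
$MSE = \frac{SSE}{N - p} = \frac{84}{15 - 3}$
MSE = 7
$F = \frac{MST}{MSE} = \frac{80}{7}$
= 11.429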
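Because both examples give only the group size, mean and standard deviation, a Python sketch of one-way ANOVA can work from those summaries directly. The helper name is hypothetical; the grand mean is weighted by group size, which equals the simple average of the means used above when all groups have the same n.

```python
def one_way_anova_f(groups):
    """F = MST / MSE from (n, mean, sd) summaries, one tuple per population."""
    p = len(groups)
    big_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / big_n
    sst = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)  # treatment
    sse = sum((n - 1) * s ** 2 for n, _, s in groups)           # error
    mst = sst / (p - 1)
    mse = sse / (big_n - p)
    return mst / mse


print(round(one_way_anova_f([(11, 60, 15), (11, 50, 10), (11, 70, 12)]), 3))  # 7.036
print(round(one_way_anova_f([(5, 12, 2), (5, 16, 1), (5, 20, 4)]), 3))        # 11.429
```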
The Pearson correlation coefficient for sample data is denoted by "r". The formula for the Pearson correlation
coefficient r is given by:
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Where,
r = Pearson correlation coefficient
x = Values in the first set of data
y = Values in the second set of data
n = Total number of values.
A few problems based on the Pearson correlation coefficient are given below:
Solved Examples
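Question 1: Marks obtained by 5 students in algebra and trigonometry are given below:
Algebra         15    16    12    10    8
Trigonometry    18    11    10    20    17
Solution:
x     y     $x^2$    $y^2$    xy
15    18    225      324      270
16    11    256      121      176
12    10    144      100      120
10    20    100      400      200
8     17    64       289      136
$\sum x$ = 61
$\sum y$ = 76
$\sum x^2$ = 789
$\sum y^2$ = 1234
$\sum xy$ = 902
$r = \frac{5(902) - (61)(76)}{\sqrt{[5(789) - 61^2][5(1234) - 76^2]}} = \frac{-126}{\sqrt{224 \times 394}}$
r = -0.424 (approx)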
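Question 2: Find the Pearson correlation coefficient for the following data:
x    y    $x^2$    $y^2$    xy
5    7    25       49       35
6    3    36       9        18
4    9    16       81       36
2    8    4        64       16
$\sum x$ = 17
$\sum y$ = 27
$\sum x^2$ = 81
$\sum y^2$ = 203
$\sum xy$ = 105
$r = \frac{4(105) - (17)(27)}{\sqrt{[4(81) - 17^2][4(203) - 27^2]}} = \frac{-39}{\sqrt{35 \times 83}}$
r = -0.724 (approx)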
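The coefficient of determination is the square of the correlation coefficient, $r^2$, where
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Where,
r = Correlation coefficient
x = Values in the first set of data
y = Values in the second set of data
n = Total number of values.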
Solved Examples
Question 1: Marks obtained by a few students in physics and chemistry tests are given in the following table:
Physics      18    16    15    10
Chemistry    15    12    9     17
Solution:
x     y     $x^2$    $y^2$    xy
18    15    324      225      270
16    12    256      144      192
15    9     225      81       135
10    17    100      289      170
$\sum x$ = 59
$\sum y$ = 53
$\sum x^2$ = 905
$\sum y^2$ = 739
$\sum xy$ = 767
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$r = \frac{4(767) - (59)(53)}{\sqrt{[4(905) - 59^2][4(739) - 53^2]}}$
r = -0.41275
Coefficient of determination = $r^2$
= 0.17036 = 0.17 (approx).
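Question 2: During an observation at a garden, the number of roses (x) and the number of marigold flowers (y) are noted every
week. The readings of 4 successive weeks, together with the values needed for the formula, are as follows:
Roses (x)    Marigold (y)    $x^2$    $y^2$    xy
1            3               1        9        3
2            2               4        4        4
3            5               9        25       15
4            6               16       36       24
$\sum x$ = 10
$\sum y$ = 16
$\sum x^2$ = 30
$\sum y^2$ = 74
$\sum xy$ = 46
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$r = \frac{4(46) - (10)(16)}{\sqrt{[4(30) - 10^2][4(74) - 16^2]}}$
r = 0.8485
Coefficient of determination = $r^2$
= 0.7199 = 0.72 (approx).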
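The computational formula for r, and hence the coefficient of determination $r^2$, can be sketched in Python from the column sums. The helper name is hypothetical; Python 3.10 and later also provide `statistics.correlation` for Pearson's r. The physics and chemistry marks below are those of Question 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


physics = [18, 16, 15, 10]
chemistry = [15, 12, 9, 17]
r = pearson_r(physics, chemistry)
print(round(r, 5), round(r ** 2, 2))  # -0.41275 0.17
```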
Variance Formula
Variance is one of the most important concepts of statistics. It is a measure of the dispersion of a set of data: it indicates how
far the values of a set of data are spread out from their mean. Variance is the average of the squared differences of each value from the mean of the data.
Variance and standard deviation are closely related: the standard deviation is the square root of the
variance. Variance is denoted by the square of the Greek letter sigma ($\sigma ^{2}$). The formulas for variance are given below.
Variance formula for population data is as follows:
$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n}$
For sample data, the sum of squared differences is divided by n - 1 instead:
$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1}$
Where,
$\sigma ^{2}$ = Variance
x = Item given in the data
$\bar{x}$ = Mean of the data
n = Total number of items.
Variance Problems
Solved Examples
Question 1: Marks obtained by a few students are: 75, 83, 54, 90, 61. Find the variance of the sample.
Solution:
Formula for mean:
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = \frac{363}{5}$ = 72.6
Construct the following table:
x     $x-\bar{x}$    $(x-\bar{x})^{2}$
75    2.4            5.76
83    10.4           108.16
54    -18.6          345.96
90    17.4           302.76
61    -11.6          134.56
$\sum x$ = 363
$\sum (x-\bar{x})^{2}$ = 897.2
$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1} = \frac{897.2}{4}$
= 224.3
Question 2: During a survey, people were asked the number of members in their family. Some of the answers are 5, 6, 2, 3, 1, 7, 4,
8. Calculate the variance.
Solution:
$\bar{x} = \frac{36}{8}$ = 4.5
x    $x-\bar{x}$    $(x-\bar{x})^{2}$
5    0.5            0.25
6    1.5            2.25
2    -2.5           6.25
3    -1.5           2.25
1    -3.5           12.25
7    2.5            6.25
4    -0.5           0.25
8    3.5            12.25
$\sum x$ = 36
$\sum (x-\bar{x})^{2}$ = 42
$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1}$
$\sigma ^{2}$ = $\frac{42}{7}$
= 6
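A Python sketch of the two variance formulas; `sum_squared_deviations` is a hypothetical helper, while `statistics.variance` (divisor n - 1) and `statistics.pvariance` (divisor n) are the standard-library equivalents.

```python
from statistics import pvariance, variance


def sum_squared_deviations(values):
    """Sum of (x - mean)^2 over the data."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values)


marks = [75, 83, 54, 90, 61]
print(round(sum_squared_deviations(marks) / (len(marks) - 1), 2))  # 224.3
print(variance(marks), pvariance(marks))  # sample (n - 1) and population (n) variance
```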
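If x and y are the two variables, then the formula for the correlation coefficient is given as:
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Here,
n = Number of data pairs
$\sum x$ = Sum of the first data list
$\sum y$ = Sum of the second data list
$\sum xy$ = Sum of the products of the paired values
$\sum x^2$ = Sum of the squares of the first values
$\sum y^2$ = Sum of the squares of the second values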
Solved Examples
Question 1: Calculate the correlation coefficient for the table below:
x values    60     61     62     63
y values    3.1    3.6    3.8    4
Solution:
Given n = 4
First, let's calculate xy, $x^2$ and $y^2$:
x     y      xy       $x^2$    $y^2$
60    3.1    186      3600     9.61
61    3.6    219.6    3721     12.96
62    3.8    235.6    3844     14.44
63    4      252      3969     16
$\sum x$ = 246, $\sum y$ = 14.5, $\sum xy$ = 893.2, $\sum x^2$ = 15134, $\sum y^2$ = 53.01
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$= \frac{4(893.2) - (246)(14.5)}{\sqrt{[4(15134) - 246^2][4(53.01) - 14.5^2]}}$
Correlation coefficient (r) = 0.96936
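Question 2: Calculate the correlation coefficient for the table below:
x value    100    106    112    98    87    77    67    66    49
y value    28     33     26     27    24    24    21    26    22
Solution:
Given n = 9
x      y     xy      $x^2$    $y^2$
100    28    2800    10000    784
106    33    3498    11236    1089
112    26    2912    12544    676
98     27    2646    9604     729
87     24    2088    7569     576
77     24    1848    5929     576
67     21    1407    4489     441
66     26    1716    4356     676
49     22    1078    2401     484
$\sum x$ = 762, $\sum y$ = 231, $\sum xy$ = 19993, $\sum x^2$ = 68128, $\sum y^2$ = 6031
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$= \frac{9(19993) - (762)(231)}{\sqrt{[9(68128) - 762^2][9(6031) - 231^2]}}$
Correlation coefficient (r) = 0.716664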
Quartile Formula
Just as the median divides a set of observations into two equal parts when they are arranged in numerical order, the
quartiles divide the set of observations into 4 equal parts.
The middle value between the first term and the median is known as the first or lower quartile and is denoted Q1.
The middle value between the median and the last term is known as the third or upper quartile and is denoted
Q3. The median itself is known as the second quartile and is denoted Q2.
When the set of n observations is arranged in ascending order, the lower quartile is given as:
$Q_1 = \left(\frac{n + 1}{4}\right)^{th}$ term
The second quartile, which is the median of the set of observations, is given as:
$Q_2 = \left(\frac{n + 1}{2}\right)^{th}$ term
The upper quartile is given as:
$Q_3 = \left(\frac{3(n + 1)}{4}\right)^{th}$ term
If a position is not a whole number, it is rounded to the nearest whole number; when it falls exactly halfway between two terms (for example the 2.5th term), the average of those two terms is taken.
The lower and upper quartiles help us find a measure of dispersion of the set of observations called the
inter-quartile range (IQR), which is the difference between the upper and lower quartiles: IQR = Q3 - Q1.
Quartile Problems
Solved Examples
Question 1: Find the median, lower quartile, upper quartile and inter-quartile range of the following data set of scores: 19,
22, 24, 20, 24, 27, 25, 24, 30.
Solution:
First, let's arrange the values in ascending order:
19, 20, 22, 24, 24, 24, 25, 27, 30
Now let's calculate the median:
Median = $\left(\frac{n + 1}{2} \right)^{th}$ term
= $\left(\frac{9 + 1}{2} \right)^{th}$ term
= $5^{th}$ term
= 24
Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{9 + 1}{4} \right)^{th}$ term
= $\left(\frac{10}{4} \right)^{th}$ term
= $2.5^{th}$ term
Find the average of the 2nd and 3rd terms:
= $\frac{20 + 22}{2}$
= $\frac{42}{2}$
= 21
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(9 + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(10)}{4} \right)^{th}$ term
= $\left(\frac{30}{4} \right)^{th}$ term
= $7.5^{th}$ term
Find the average of the 7th and 8th terms:
= $\frac{25 + 27}{2}$
= $\frac{52}{2}$
= 26
Inter-quartile range = Upper quartile - Lower quartile
= 26 - 21
= 5
Question 2: Find the first, second and third quartiles of the following sequence: 4, 77,
16, 59, 93, 88.
Solution:
First, let's arrange the values in ascending order:
4, 16, 59, 77, 88, 93
Given n = 6
$\therefore$ Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{6 + 1}{4} \right)^{th}$ term
= $\left(\frac{7}{4} \right)^{th}$ term
= $1.75^{th}$ term
Rounding 1.75 to the nearest whole number, we take the 2nd term of the set of observations.
$\Rightarrow$ 2nd term = 16
Lower quartile = 16
Second quartile (median) = $\left(\frac{6 + 1}{2} \right)^{th}$ term = $3.5^{th}$ term, so we take the average of the 3rd and 4th terms:
= $\frac{59 + 77}{2}$
= 68
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(6 + 1)}{4} \right)^{th}$ term
= $\left(\frac{21}{4} \right)^{th}$ term
= $5.25^{th}$ term
Rounding 5.25 to the nearest whole number, we take the 5th term of the set of observations.
$\Rightarrow$ 5th term = 88
Upper quartile = 88
Inter-quartile range = Upper quartile - Lower quartile
= 88 - 16
= 72
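The position rule used in these examples can be sketched in Python as follows; the helper names are hypothetical. Note that library routines such as `statistics.quantiles` interpolate linearly between neighbouring terms instead of rounding, so they may return different quartiles for the same data (for Question 2, for instance).

```python
import math


def term_at(sorted_vals, pos):
    """Value at 1-based position pos: exact, halfway average, or nearest term."""
    whole = math.floor(pos)
    frac = pos - whole
    if frac == 0:
        return sorted_vals[whole - 1]
    if frac == 0.5:  # exactly halfway: average the two neighbouring terms
        return (sorted_vals[whole - 1] + sorted_vals[whole]) / 2
    return sorted_vals[math.floor(pos + 0.5) - 1]  # otherwise round to the nearest term


def quartiles(values):
    """Return (Q1, Q2, Q3) using the (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 positions."""
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    return term_at(x, (n + 1) / 4), term_at(x, (n + 1) / 2), term_at(x, 3 * (n + 1) / 4)


q1, q2, q3 = quartiles([19, 22, 24, 20, 24, 27, 25, 24, 30])
print(q1, q2, q3, q3 - q1)  # 21.0 24 26.0 5.0
```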
Sometimes only a sample of the whole population is given. In this case, instead of calculating the population standard deviation, we
calculate the sample standard deviation. The formula for the sample standard deviation is given below:
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Where,
$x_i$ = Terms given in the data
$\bar{x}$ = Mean
n = Total number of terms.
Solved Examples
Question 1: During a survey, 6 students were asked how many hours per day they study on average. Their answers
were as follows: 2, 6, 5, 3, 4, 1. Evaluate the standard deviation.
Solution:
Formula for mean is given by:
$\bar{x} = \frac{\sum x_i}{n}$
$\bar{x} = \frac{2 + 6 + 5 + 3 + 4 + 1}{6}$
= 3.5
Construct the following table for the standard deviation:
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
2        -1.5               2.25
6        2.5                6.25
5        1.5                2.25
3        -0.5               0.25
4        0.5                0.25
1        -2.5               6.25
$\sum (x_i - \bar{x})^2$ = 17.5
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$S = \sqrt{\frac{17.5}{5}} = \sqrt{3.5}$
S = 1.87
Question 2: Marks obtained by 4 students in a class are 25, 15, 20, 18. Find the standard deviation of the sample.
Solution:
Formula for mean is given by:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$\bar{x} = \frac{25 + 15 + 20 + 18}{4}$
= 19.5
Construct the following table for the standard deviation:
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
25       5.5                30.25
15       -4.5               20.25
20       0.5                0.25
18       -1.5               2.25
$\sum (x_i - \bar{x})^2$ = 53
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$S = \sqrt{\frac{53}{3}}$
S = 4.2
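A Python sketch of the sample standard deviation; `sample_sd` is a hypothetical helper that mirrors the formula, and the standard library's `statistics.stdev` gives the same result. `statistics.pstdev` divides by n instead and therefore gives the smaller population value.

```python
from math import sqrt
from statistics import pstdev, stdev


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


hours = [2, 6, 5, 3, 4, 1]
print(round(sample_sd(hours), 2))  # 1.87
print(round(stdev(hours), 2))      # 1.87
print(round(pstdev(hours), 2))     # 1.71 (divisor n instead of n - 1)
```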