Statistical Tools and Formula

Tests available in MedCalc
MedCalc offers the following tests for Normal distribution:
The Shapiro-Wilk test (Shapiro & Wilk, 1965; Royston, 1995) and the Shapiro-Francia
test (Shapiro & Francia, 1972; Royston, 1993a) calculate the statistics W and W', respectively,
which test whether a random sample comes from a Normal distribution. Small values of W or
W' are evidence of departure from normality. The Shapiro-Wilk W statistic can only be
computed when the sample size is between 3 and 5000 (inclusive) (Royston, 1995); the
Shapiro-Francia W' statistic can be computed when the sample size ranges from 5 to 5000
(Royston, 1993a & 1993b).
The D'Agostino-Pearson test (Sheskin, 2011) computes a single P-value for the combination
of the coefficients of Skewness and Kurtosis.
The Kolmogorov-Smirnov test (Neter et al., 1988) with Lilliefors significance correction
(Dallal & Wilkinson, 1986) is based on the greatest discrepancy between the sample
cumulative distribution and the Normal cumulative distribution.
The Chi-squared goodness-of-fit test is applied to binned data (the data are put into
classes) (Snedecor & Cochran, 1989) and requires a larger sample size than the other two
tests.
SHAPIRO-WILK ORIGINAL TEST

We present the original approach to performing the Shapiro-Wilk test. This approach is
limited to samples of between 3 and 50 elements. A revised approach using the algorithm of
J. P. Royston, which can handle samples with up to 5,000 elements (or even more), is
described in Shapiro-Wilk Expanded Test.
The basic approach used in the Shapiro-Wilk (SW) test for normality is as follows:
Rearrange the data in ascending order so that $x_1 \le x_2 \le \dots \le x_n$.
Calculate SS as follows:
$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
If n is even, let m = n/2, while if n is odd let m = (n - 1)/2.
Calculate b as follows, taking the $a_i$ weights from Table 1 (based on the value of n)
in the Shapiro-Wilk Tables:
$b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$
Note that if n is odd, the median data value is not used in the calculation of b.
Calculate the test statistic $W = b^2 / SS$.
Find the value in Table 2 of the Shapiro-Wilk Tables (for a given value of n) that
is closest to W, interpolating if necessary. This is the p-value for the test.

For example, suppose W = .975 and n = 10. Based on Table 2 of the Shapiro-Wilk Tables the
p-value for the test is somewhere between .90 (W = .972) and .95 (W = .978).
Example 1: A random sample of 12 people is taken from a large population. The ages of the
people in the sample are given in column A of the worksheet in Figure 1. Is this data
normally distributed?
Figure 1 Shapiro-Wilk test for Example 1
We begin by sorting the data in column A using Data > Sort & Filter|Sort or
the QSORT supplemental function, putting the results in column B. We next look up the
coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Tables, putting
these values in column E.
Corresponding to each of these 6 coefficients $a_1, \dots, a_6$, we calculate the values
$x_{12} - x_1, \dots, x_7 - x_6$, where $x_i$ is the ith data element in sorted order. E.g. since
$x_1 = 35$ and $x_{12} = 86$, we place the difference 86 - 35 = 51 in cell H5 (the same row as
the cell containing $a_1$). Column I contains the product of the coefficients and difference
values. E.g. cell I5 contains the formula =E5*H5. The sum of these values is b = 44.1641,
which is found in cell I11 (and again in cell E14).
We next calculate SS as DEVSQ(B4:B15) = 2008.667. Thus $W = b^2 / SS$ =
44.1641^2/2008.667 = .971026. We now look for .971026 when n = 12 in Table 2 of
the Shapiro-Wilk Tables and find that the p-value lies between .50 and .90. The W value
for .5 is .943 and the W value for .9 is .973.
Interpolating .971026 between these values (using linear interpolation), we arrive at p-value
= .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are
normally distributed.
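The arithmetic of Example 1 can be checked with a short Python sketch. The function names are illustrative only (they are not Real Statistics functions), and the table entries (.943 for p = .50, .973 for p = .90) are the ones quoted in the text.

```python
def sw_statistic(b, ss):
    """Shapiro-Wilk statistic W = b^2 / SS."""
    return b ** 2 / ss


def interpolate_p(w, w_lo, p_lo, w_hi, p_hi):
    """Linearly interpolate the p-value between two table entries."""
    return p_lo + (w - w_lo) / (w_hi - w_lo) * (p_hi - p_lo)


# Values taken from Example 1: b = 44.1641, SS = 2008.667, n = 12
w = sw_statistic(44.1641, 2008.667)
p = interpolate_p(w, 0.943, 0.50, 0.973, 0.90)
print(round(w, 6), round(p, 4))
```

Only linear interpolation is sketched here; the harmonic interpolation mentioned for SWTEST is not reproduced.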
Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests
for Normality and Symmetry are normally distributed.
Figure 2 Shapiro-Wilk test for Example 2
As we can see from the analysis in Figure 2, p-value = .0419 < .05 = α, and so we reject the
null hypothesis and conclude with 95% confidence that the data are not normally
distributed, which is quite different from the results using the KS test that we found in
Example 2 of Kolmogorov-Smirnov Test.
Real Statistics Function: The Real Statistics Resource Pack contains the following
supplemental functions where R1 consists only of numeric data without headings:
SHAPIRO(R1, FALSE) = the Shapiro-Wilk test statistic W for the data in the range R1
SWTEST(R1, FALSE, h) = p-value of the Shapiro-Wilk test on the data in R1
SWCoeff(n, j, FALSE) = the jth coefficient for samples of size n
SWCoeff(R1, C1, FALSE) = the coefficient corresponding to cell C1 within sorted range R1
SWPROB(n, W, FALSE, h) = p-value of the Shapiro-Wilk test for a sample of size n for test
statistic W
The functions SHAPIRO and SWTEST ignore all empty and non-numeric cells. The range
R1 in SWCoeff(R1, C1, FALSE) should not contain any empty or non-numeric cells.
When performing the table lookup, the default is to use harmonic interpolation (h = TRUE).
To use linear interpolation, set h to FALSE. See Interpolation for details.
For example, for Example 1 of Chi-square Test for Normality, we have SHAPIRO(A4:A15,
FALSE) = .874 and SWTEST(A4:A15, FALSE, FALSE) = SWPROB(15,.874,FALSE,FALSE) =
.0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality).
It is important to note that SHAPIRO(R1, TRUE), SWTEST(R1, TRUE), SWCoeff(n, j,
TRUE), SWCoeff(R1, C1, TRUE) and SWPROB(n, W, TRUE) refer to the results using the
Royston algorithm, as described in Shapiro-Wilk Expanded Test.
For compatibility with the Royston version of SWCoeff, when j ≤ n/2 then SWCoeff(n, j,
FALSE) = the negative of the value of the jth coefficient for samples of size n found in
the Shapiro-Wilk Tables. When j = (n+1)/2, SWCoeff(n, j, FALSE) = 0 and when j >
(n+1)/2, SWCoeff(n, j, FALSE) = -SWCoeff(n, n-j+1, FALSE).
VASSARSTAT
STATA
MEDCALC
CONTINGENCY COEFFICIENT
If we look at the contingency table of two uncorrelated nominal variables, we can calculate
the expected frequency $h_{ik}$ of a particular combination of features as
$h_{ik} = h_i h_k / N$
where $h_i$ and $h_k$ are the row and column totals and N is the sample size.
If the two variables are correlated, the actual frequencies $H_{ik}$ will deviate from the ideal
uncorrelated frequencies $h_{ik}$. The difference $D_{ik}$ between actual and ideal (uncorrelated)
frequencies is thus
$D_{ik} = H_{ik} - h_{ik} = H_{ik} - h_i h_k / N$.
For uncorrelated variables the difference of frequencies will be around zero for each cell of the table.
Thus the correlation of the two variables can be measured by squaring the differences and
summing these squares in relation to the ideal frequencies:
$\chi^2 = \sum_i \sum_k \frac{D_{ik}^2}{h_{ik}}$
The resulting $\chi^2$ coefficient, however, has the disadvantage that its value depends both on the
dimension of the contingency table and on the size of the sample. After eliminating the dependence
on the sample size, we get Pearson's contingency coefficient C:
$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$
As this coefficient C still depends on the dimension of the contingency table, it is normalized
so that its range extends from 0.0 to 1.0:
$C_{corr} = C \sqrt{\frac{m_{min}}{m_{min} - 1}}$
with $m_{min} = \min(q, p)$, where q and p are the numbers of rows and columns of the table.

Hint:
In contrast to the correlation coefficient the corrected contingency coefficient Ccorr does
not indicate the direction of the correlation but only its strength.
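As a rough illustration of the steps above, the following Python sketch computes the chi-squared sum, Pearson's C and the corrected coefficient from a table of observed counts. The function name and the example tables are assumptions made for this sketch, and it assumes that no row or column total is zero.

```python
from math import sqrt


def contingency_coefficients(table):
    """Return (chi2, C, C_corr) for a table of observed frequencies."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            # Ideal (uncorrelated) frequency for this cell
            expected = row_totals[i] * col_totals[k] / n
            chi2 += (observed - expected) ** 2 / expected
    c = sqrt(chi2 / (chi2 + n))
    m_min = min(len(table), len(table[0]))
    c_corr = c * sqrt(m_min / (m_min - 1))
    return chi2, c, c_corr


# Hypothetical 2x2 tables: perfectly associated and perfectly independent
print(contingency_coefficients([[10, 0], [0, 10]]))
print(contingency_coefficients([[10, 20], [20, 40]]))
```

The first table reaches the upper end of the normalized range and the second the lower end, which is the behaviour the normalization is meant to give.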
Z Score Formula
A Z score is the outcome obtained when a Z test is performed. The Z test statistic follows a normal distribution under the
null hypothesis, and Z scores are usually calculated for large data sets. To compute a Z score we need the value to be
standardized, which is called the standardized random variable here and is denoted by x, together with the mean and the
standard deviation. We subtract the mean from x and then divide the result by the standard deviation. The formula
for calculating Z score is given below:
$z = \frac{x - \bar{x}}{\sigma}$
Where,
x = Standardized random variable (the value being converted)
$\bar{x}$ = Mean
$\sigma$ = Population standard deviation.
Following is the formula for population standard deviation:
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
Where,
$\sigma$ = Population standard deviation
$x_i$ = Items given
$\bar{x}$ = Mean
n = Total number of items.
Z Score Problems
Following are the few problems based on Z score:
Solved Examples
Question 1: The scores of an exam are recorded. Mean and standard deviation of the marks are 263 and 45 respectively. If
scores are distributed normally, what would be the Z score for obtaining 350?
Solution:
Standardized random variable x = 350
Mean $\bar{x}$ = 263
Population standard deviation $\sigma$ = 45
Formula for Z score is given below:
Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = $\frac{350 - 263}{45}$
= 1.933
Question 2: A student wrote 2 quizzes. In the first quiz he scored 80 and in the other he scored 75. The mean and standard
deviation of the first quiz are 70 and 15 respectively, while the mean and standard deviation of the second quiz are 54 and 12
respectively. The results follow a normal distribution. What can you conclude about the student's results from their Z
scores?
Solution:
Calculation of student's Z score for first quiz:
Standardized random variable x = 80
Mean $\bar{x}$ = 70
Population standard deviation $\sigma$ = 15
Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = $\frac{80 - 70}{15}$
= 0.667
Calculation of student's Z score for second quiz:
Standardized random variable x = 75
Mean $\bar{x}$ = 54
Population standard deviation $\sigma$ = 12
Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = $\frac{75 - 54}{12}$
= 1.75
Since the Z score for the second quiz is higher than that for the first quiz, the student did better, relative to the other students, in the second quiz.
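The Z score calculation in the two questions above can be sketched in Python as follows. The function name `z_score` is chosen only for this illustration.

```python
def z_score(x, mean, sd):
    """Z score: distance of x from the mean in units of standard deviation."""
    return (x - mean) / sd


# Question 1: exam score of 350
print(round(z_score(350, 263, 45), 3))
# Question 2: first and second quiz
print(round(z_score(80, 70, 15), 3), round(z_score(75, 54, 12), 3))
```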
Sample Size Formula

The number of observations in a given sample is known as the sample size. The sample size plays an important part
in any study, because it determines how reliably conclusions about the population can be drawn from the sample. A sample
can be small or large, but a larger sample generally gives more accurate results. The sample size is usually denoted by
lower-case 'n' and the population size by upper-case 'N'.
The sample size formula for the infinite population is given as:
$SS = \frac{Z^2 p (1 - p)}{C^2}$
The sample size formula for the finite population is given as:
$New\ SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
Here,
SS = Sample size.
Z = Given z value
p = Percentage of population (expressed as a proportion)
C = Confidence interval (expressed as a decimal)
Pop = Population
Sample Size Problems

Below are few problems based on Sample size:
Solved Examples
Question 1: Find the sample size for an infinite and a finite population when the population is 4300, the percentage of the
population is 0.5, the confidence level is 95% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.96 (the z table value for a 95% confidence level), so $Z^2$ = 3.8416
Applying the given data in the formula:
$SS = \frac{Z^2 p (1 - p)}{C^2}$
$SS = \frac{(1.96)^2 \times 0.5 \times (1 - 0.5)}{0.04^2}$ = 600.25
SS = 600 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population.
$New\ SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
$New\ SS = \frac{600}{1 + \frac{600 - 1}{4300}}$ = 526.64
New SS = 527 (after rounding to the nearest whole number)
Question 2: Find the sample size for an infinite and a finite population when the population is 7800, the percentage of the
population is 0.5, the confidence level is 90% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.645 (the z table value for a 90% confidence level), so $Z^2$ = 2.7060
Applying the given data in the formula:
$SS = \frac{Z^2 p (1 - p)}{C^2}$
$SS = \frac{(1.645)^2 \times 0.5 \times (1 - 0.5)}{0.04^2}$ = 422.81
SS = 423 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population.
$New\ SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
$New\ SS = \frac{423}{1 + \frac{423 - 1}{7800}}$ = 401.29
New SS = 401 (after rounding to the nearest whole number)
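A minimal Python sketch of the two sample size formulas, using the numbers from the solved examples. The function names are assumptions made for this illustration; p is taken as the proportion 0.5 used in the calculations.

```python
def sample_size_infinite(z, p, c):
    """SS = Z^2 * p * (1 - p) / C^2 for an infinite population."""
    return z ** 2 * p * (1 - p) / c ** 2


def sample_size_finite(ss, pop):
    """Finite population correction: SS / (1 + (SS - 1) / Pop)."""
    return ss / (1 + (ss - 1) / pop)


ss = sample_size_infinite(1.96, 0.5, 0.04)           # Question 1, infinite population
print(round(ss, 2), round(sample_size_finite(600, 4300)))
print(round(sample_size_finite(423, 7800)))          # Question 2, finite population
```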
Confidence Interval Formula

When interpreting results from a set of data, a researcher needs to know how certain those results are.
A confidence interval is a range within which the most plausible values would occur. To calculate a confidence interval, one
needs to set a confidence level such as 90%, 95% or 99%. The most commonly used confidence level is 95%. The confidence
interval is then the interval that is expected to contain the true value with 95% (or whatever confidence level was chosen)
confidence. The formula for confidence interval is given below:
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
and
$\bar{x} \pm t_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Where,
n = Number of terms
$\bar{x}$ = Sample Mean
$\sigma$ = Standard Deviation
$z_{\alpha/2}$ = Value corresponding to $\alpha/2$ in z table
$t_{\alpha/2}$ = Value corresponding to $\alpha/2$ in t table
$\alpha = 1 - \frac{\text{confidence level}}{100}$.
Confidence Interval Problems
Few problems based on confidence interval are given below:
Solved Examples
Question 1: Average score obtained by 10 students in a test is 25. Standard deviation of the sample is 8. Find the
confidence interval at 95% confidence level?
Solution:
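n = 10
Mean $\bar{x}$ = 25
Standard deviation $\sigma$ = 8
Confidence level = 95% = 0.95
$\alpha = 1 - \frac{\text{confidence level}}{100}$
= $1 - \frac{95}{100}$ = 0.05
Formula for confidence interval:
Confidence interval = $\bar{x} \pm t_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Confidence interval = $25 \pm t_{0.05/2} \frac{8}{\sqrt{10}}$
= $25 \pm 2.262 \times \frac{8}{\sqrt{10}}$ (t table value for n - 1 = 9 degrees of freedom)
= 19.28 to 30.72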
Question 2: A survey was performed on 50 families asking how many members are there in the family. The mean and
standard deviation of the survey was 6 and 5 respectively. If confidence level is 95%, what would be the confidence interval
of the survey?
Solution:
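n = 50
Mean $\bar{x}$ = 6
Standard deviation $\sigma$ = 5
Confidence level = 95% = 0.95
$\alpha = 1 - \frac{\text{confidence level}}{100}$
= $1 - \frac{95}{100}$ = 0.05
Formula for confidence interval:
Confidence interval = $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Confidence interval = $6 \pm z_{0.05/2} \frac{5}{\sqrt{50}}$
= $6 \pm 1.96 \times \frac{5}{\sqrt{50}}$
= 4.614 to 7.386
Using the t table value for n - 1 = 49 degrees of freedom (2.0096) in place of 1.96 gives 4.579 to 7.421.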
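The interval calculation can be sketched in Python as below. The critical value is passed in by the caller, because looking it up in a z or t table is outside this sketch; 2.262 is the t table value for 9 degrees of freedom at the 95% level, which matches the 10-student example (19.28 to 30.72). The function name is an assumption for illustration.

```python
from math import sqrt


def confidence_interval(mean, sd, n, critical_value):
    """Return (lower, upper) = mean -/+ critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Question 1: n = 10, mean 25, standard deviation 8, t value 2.262 (9 d.f.)
low, high = confidence_interval(25, 8, 10, 2.262)
print(round(low, 2), round(high, 2))
```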
Chi Square Formula

The Chi Square test is one of the most widely used statistical tests. Its purpose is to
assess the difference between an observed frequency and an expected frequency. The test is sometimes also used to test the
differences between two or more sets of observed data. Its value is calculated from the given observed frequency and
expected frequency.
The Chi Square value is denoted by $\chi^2$ and the formula is given as:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Here,
O = Observed frequency
E = Expected frequency
$\sum$ = Summation
$\chi^2$ = Chi Square value
Chi Square Test Problems

Below are the problems based on the Chi Square test:
Solved Examples
Question 1: Calculate the chi-square value, if observed frequency is 8 and expected frequency is 15?
Solution:
Given:
Observed frequency = 8
Expected frequency = 15
$\chi^2 = \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(8 - 15)^2}{15}$
$\chi^2 = \frac{(-7)^2}{15}$
$\chi^2 = \frac{49}{15}$
$\chi^2$ = 3.267
Chi Square value = 3.267
Question 2: Calculate the chi-square value for the following data.
Color                 Red    Green    Yellow
Observed frequency    12     16       20
Expected frequency    16     8        25
Solution:
Let's find the Chi Square value for the given data using the formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
First, let's calculate $(O - E)^2$ for each color.
Red color = $(O - E)^2$ = $(12 - 16)^2$ = 16
Green color = $(O - E)^2$ = $(16 - 8)^2$ = 64
Yellow color = $(O - E)^2$ = $(20 - 25)^2$ = 25

Chi-Square value for Red color = $\frac{(12 - 16)^2}{16}$ = 1
Chi-Square value for Green color = $\frac{(16 - 8)^2}{8}$ = 8
Chi-Square value for Yellow color = $\frac{(20 - 25)^2}{25}$ = 1
So, the chi-square value for the given data is = 1 + 8 + 1 = 10
Chi Square value = 10
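The chi-square sum can be sketched in a few lines of Python. The function name is illustrative, and for the color example the expected frequencies are taken as 16, 8 and 25, an assumption that is consistent with the stated total of 10.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(chi_square([8], [15]), 3))             # Question 1
print(chi_square([12, 16, 20], [16, 8, 25]))       # Question 2
```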
Z Test Formula
The Z test is a statistical test that compares the means of two populations. It assumes a normal distribution under the null
hypothesis. The Z test is performed on large data sets or on population data; for small data sets or
sample data, the T test is performed instead. The score determined by the Z test is called the "Z score". The Z score can be
calculated when the population standard deviation of a large data set is given. The Z test uses a value, generally within the
limits of the given data, to calculate the Z score. This value is known as the "standardized random variable".
The formula for calculating Z score is given below:
$z = \frac{x - \bar{x}}{\sigma}$
Where,
x = Standardized random variable
$\bar{x}$ = Mean of the data
$\sigma$ = Population standard deviation.
The formula for population standard deviation is given below:
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
Where,
$\sigma$ = Population standard deviation
$x_i$ = Numbers given in the data
n = Total number of values.

Z Test Problems
Few problems based on Z test are given below:
Solved Examples
Question 1: In a government organization, the mean basic salary of the employees is INR 5000. How much will be the z
score of employees whose basic salary is INR 3000, if standard deviation of the population is 850?
Solution:
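Standardized random variable x = 3000
Mean $\bar{x}$ = 5000
Population standard deviation $\sigma$ = 850
Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = $\frac{3000 - 5000}{850}$
= -2.353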
Question 2: The marks obtained in mathematics exam by students of a class vary from 33 to 100. If average marks are 62
and standard deviation is 20. Find the Z score for students who scored 80?
Solution:

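Standardized random variable x = 80
Mean of the data $\bar{x}$ = 62
Standard deviation $\sigma$ = 20
Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = $\frac{80 - 62}{20}$
= 0.9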
F Test Formula
The F test is a method for comparing the variances of two different sets of values. Under the null hypothesis, the test
statistic follows an F distribution. To calculate the F value, we first find the mean of each of the two sets of observations
and then calculate their variances. The F value is expressed as the ratio of the variances of the two sets of observations.
The formula for F test is given below:
$F = \frac{\sigma_1^2}{\sigma_2^2}$
Variance is given by the following formula:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Where,
$\sigma^2$ = Variance
x = Values given in a set of data
$\bar{x}$ = Mean of the data
n = Total number of values.
F Test Problems
Few problems based on F test are given below:
Solved Examples
Question 1: Find the F value for the following two observations:
1, 3, 5, 7, 9 and 5 ,9, 3, 8, 3?
Solution:
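Formula for mean is given by:
$\bar{x} = \frac{\sum x}{n}$
Formula for variance is given by:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for first set of values:
$\bar{x}_1$ = 5
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
1      -4       16
3      -2       4
5      0        0
7      2        4
9      4        16
$\sum (x_1 - \bar{x}_1)^2$ = 40
$\sigma_1^2$ = 10
Calculation for second set of values:
$\bar{x}_2$ = 5.6
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
5      -0.6     0.36
9      3.4      11.56
3      -2.6     6.76
8      2.4      5.76
3      -2.6     6.76
$\sum (x_2 - \bar{x}_2)^2$ = 31.2
$\sigma_2^2$ = 7.8
F value = $\frac{\sigma_1^2}{\sigma_2^2}$
F value = $\frac{10}{7.8}$ = 1.282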
Question 2: The following two sets of data are given:

4, 2, 5, 1, 3 and 8, 3, 9, 0, 1.
Calculate the F value.
Solution:
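$\bar{x} = \frac{\sum x}{n}$
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for first set of values:
$\bar{x}_1$ = 3
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
4      1        1
2      -1       1
5      2        4
1      -2       4
3      0        0
$\sum (x_1 - \bar{x}_1)^2$ = 10
$\sigma_1^2$ = 2.5
Calculation for second set of values:
$\bar{x}_2$ = 4.2
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
8      3.8      14.44
3      -1.2     1.44
9      4.8      23.04
0      -4.2     17.64
1      -3.2     10.24
$\sum (x_2 - \bar{x}_2)^2$ = 66.8
$\sigma_2^2$ = 16.7
F value = $\frac{\sigma_1^2}{\sigma_2^2}$
F value = $\frac{2.5}{16.7}$
= 0.1497 = 0.15 (approx)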
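The F values above can be reproduced with Python's standard library. This is a minimal sketch: `statistics.variance` computes the sample variance with an n - 1 denominator, as in the formula used here, and the function name `f_value` is chosen only for illustration.

```python
from statistics import variance


def f_value(first, second):
    """Ratio of the sample variances of two data sets."""
    return variance(first) / variance(second)


print(round(f_value([1, 3, 5, 7, 9], [5, 9, 3, 8, 3]), 3))   # Question 1
print(round(f_value([4, 2, 5, 1, 3], [8, 3, 9, 0, 1]), 4))   # Question 2
```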
T Test Formula
The T test is often called Student's T test, after the name "Student" under which it was first published. The T test is used
to compare two different sets of values. It is generally performed on small sets of data that are assumed to be normally
distributed. This test compares the means of two samples, using the means and standard deviations of the two samples to
make the comparison. The formula for T test is given below:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
Where,
$\bar{x}_1$ = Mean of first set of values
$\bar{x}_2$ = Mean of second set of values
$S_1$ = Standard deviation of first set of values
$S_2$ = Standard deviation of second set of values
$n_1$ = Total number of values in first set
$n_2$ = Total number of values in second set.
The formula for standard deviation is given by:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Where,
x = Values given
$\bar{x}$ = Mean
n = Total number of values.
T Test Problems
Few problems based on T test are given below:
Solved Examples
Question 1: Find the t-test value for the following two sets of values:
7, 2, 9, 8 and 1, 2, 3, 4?
Solution:
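Formula for mean:
$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for first set:
Number of terms in first set:
$n_1$ = 4
Mean for first set of data:
$\bar{x}_1$ = 6.5
Construct the following table for standard deviation:
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
7      0.5      0.25
2      -4.5     20.25
9      2.5      6.25
8      1.5      2.25
$\sum (x_1 - \bar{x}_1)^2$ = 29
Standard deviation for first set of data:
$S_1$ = 3.11
Calculation for second set:
Number of terms in second set:
$n_2$ = 4
Mean for second set of data:
$\bar{x}_2$ = 2.5
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
1      -1.5     2.25
2      -0.5     0.25
3      0.5      0.25
4      1.5      2.25
$\sum (x_2 - \bar{x}_2)^2$ = 5
Standard deviation for second set of data:
$S_2$ = 1.29
Formula for t-test value:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
$t = \frac{6.5 - 2.5}{\sqrt{\frac{9.667}{4} + \frac{1.667}{4}}}$
t = 2.3764 = 2.38 (approx)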
Question 2: Find the t-test value for the following two sets of data:
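$x_1$    9    10    11    12
$x_2$    2    4     6     8
Solution:
Formula for mean:
$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for first set:
Number of terms in first set:
$n_1$ = 4
Mean for first set of data:
$\bar{x}_1$ = 10.5
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
9      -1.5     2.25
10     -0.5     0.25
11     0.5      0.25
12     1.5      2.25
$\sum (x_1 - \bar{x}_1)^2$ = 5
Standard deviation for first set of data:
$S_1$ = 1.291
Calculation for second set:
Number of terms in second set:
$n_2$ = 4
Mean for second set of data:
$\bar{x}_2$ = 5
$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
2      -3       9
4      -1       1
6      1        1
8      3        9
$\sum (x_2 - \bar{x}_2)^2$ = 20
Standard deviation for second set of data:
$S_2$ = 2.582
Formula for t-test value:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
$t = \frac{10.5 - 5}{\sqrt{\frac{1.667}{4} + \frac{6.667}{4}}}$
t = 3.8105 = 3.81 (approx)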
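A short Python sketch of the t value as used in these examples (difference of means divided by the square root of the sum of the per-sample variance over size). The function name is an assumption for illustration, and the data are those of Question 1.

```python
from math import sqrt
from statistics import mean, variance


def t_value(first, second):
    """t = (mean1 - mean2) / sqrt(S1^2 / n1 + S2^2 / n2)."""
    standard_error = sqrt(variance(first) / len(first)
                          + variance(second) / len(second))
    return (mean(first) - mean(second)) / standard_error


print(round(t_value([7, 2, 9, 8], [1, 2, 3, 4]), 2))
```

`statistics.variance` uses the n - 1 denominator, so it corresponds to the square of the standard deviation S defined above.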
Linear Regression Formula

Regression is a statistical concept used to find the relationship between two variables or two sets of data.
Linear regression is the process of finding the linear equation which best fits the two
sets of data points. One variable is the dependent variable, while the other is the independent variable. The independent
variable is also called the explanatory variable. The formula for the linear regression equation is given by:
$Y = a + bX$
a and b are given by the following formulas:
$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$
$a = \frac{\sum y - b \sum x}{n}$
where,
X and Y are two variables on regression line.
b = Slope of the line.
a = y-intercept of the line.
x = Values of first data set.
y = Values of second data set.
n = Number of data pairs.
Linear Regression Problems

Few problems based on linear regression are given below:
Solved Examples
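Question 1: Find linear regression equation for the following two sets of data:
x    2    4    6    8
y    3    7    5    10
Solution:
Construct the following table:
x     y     $x^2$    xy
2     3     4      6
4     7     16     28
6     5     36     30
8     10    64     80
$\sum x$ = 20
$\sum y$ = 25
$\sum x^2$ = 120
$\sum xy$ = 144
$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$
$b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - 400}$
b = 0.95
$a = \frac{\sum y - b \sum x}{n}$
$a = \frac{25 - 0.95 \times 20}{4}$
a = 1.5
Linear regression is given by:
y = a + bx
y = 1.5 + 0.95x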
Question 2: Find the linear regression of the following sets of data:
Solution:
x2
xy
16
16
12
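$\sum x$ = 10
$\sum y$ = 20
$\sum x^2$ = 30
$\sum xy$ = 43
$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$
$b = \frac{4 \times 43 - 10 \times 20}{4 \times 30 - 100}$
b = -1.4
$a = \frac{\sum y - b \sum x}{n}$
$a = \frac{20 + 1.4 \times 10}{4}$
a = 8.5
Linear regression is given by:
y = a + bx
y = 8.5 - 1.4x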
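The slope and intercept can be computed directly from the column totals, as in this Python sketch. The function name and argument order are assumptions made for illustration; the totals are those stated in the two solved examples.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the line y = a + b*x from the column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


print(regression_from_sums(4, 20, 25, 120, 144))   # Question 1
print(regression_from_sums(4, 10, 20, 30, 43))     # Question 2
```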
Relative Frequency Formula

In statistics, some values in the given data may occur more than once. The number of times a particular value occurs is
called its frequency. The relative frequency of a value, on the other hand, is its frequency compared with the total of the
frequencies of all the values. Relative frequency is evaluated by dividing the individual frequency of an item by the total
number of frequencies. The formula for the relative frequency is given below:
Relative frequency = $\frac{f}{n}$
Where,
f = Frequency of an individual item
n = Total frequencies.
Relative Frequency Problems

Few problems based on relative frequency are as follows:
Solved Examples
Question 1: Construct the relative frequency table for the following data:
2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0
Solution:
Formula for relative frequency:
Relative frequency = $\frac{f}{n}$
The table of relative frequency is given below:
Item    Frequency    Relative frequency
0       2            2/20 = 0.1
1       2            2/20 = 0.1
2       4            4/20 = 0.2
3       2            2/20 = 0.1
4       1            1/20 = 0.05
5       3            3/20 = 0.15
6       1            1/20 = 0.05
7       3            3/20 = 0.15
9       2            2/20 = 0.1
n = 20
Question 2: Find the relative frequency of the following data:
Class Interval    Frequency
0-10              5
10-20             8
20-30             10
30-40             15
40-50             7
Solution:
Formula for relative frequency:
Relative frequency = $\frac{f}{n}$
The table of relative frequency is given below:
Class Interval    Frequency    Relative frequency
0-10              5            5/45 = 0.111
10-20             8            8/45 = 0.178
20-30             10           10/45 = 0.222
30-40             15           15/45 = 0.333
40-50             7            7/45 = 0.156
n = 45
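A relative frequency table for raw data can be built with a short Python sketch like the one below. The function name is illustrative, and the data are the 20 values of Question 1.

```python
from collections import Counter


def relative_frequencies(data):
    """Map each distinct item to its frequency divided by the total count."""
    counts = Counter(data)
    n = len(data)
    return {item: counts[item] / n for item in sorted(counts)}


data = [2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0]
for item, rel_freq in relative_frequencies(data).items():
    print(item, rel_freq)
```

The relative frequencies always sum to 1, which is a quick check on a hand-built table.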
Margin of Error Formula

The margin of error is a statistical concept which expresses the amount of uncertainty between the true value of a
parameter and its estimate.
The size of the margin of error indicates how accurate the estimate is. If the margin of
error is small, the results are trustworthy; if the margin of error is large, the results may be far
from accurate.
In simple words, the margin of error is the product of the critical value and the standard deviation of the estimate. The
critical value is expressed as a Z score.
The margin of error is denoted by E and the formula is given as,
$E = z_{\frac{\alpha}{2}} \left(\frac{\sigma }{\sqrt{n}}\right)$
Here,
$z_{\frac{\alpha}{2}}$ = represents the critical value.
$\frac{\sigma }{\sqrt{n}}$ = represents the standard deviation of the sample mean.
Margin of Error Problems
Below are few problems based on Margin of Error:
Solved Examples
Question 1: A random sample of 30 students has an average yearly earnings of 2450 and a standard deviation of 587. Find
the margin of error if c = 0.95?
Solution:
Given n = 30
Standard Deviation = 587
$z_{\frac{\alpha}{2}}$ = 1.96
$\therefore$ by using the formula $E = z_{\frac{\alpha}{2}} \left(\frac{\sigma }{\sqrt{n}}\right)$
= 1.96 $\left(\frac{587}{\sqrt{30}}\right)$
= 210.06
= 210 approximately
The margin of error = 210
Question 2: An average monthly earning of 56 employees in a company is 4685 and a standard deviation is 354. Find the
margin of error if c= 0.95?
Solution:
Given n = 56
Standard Deviation = 354
$z_{\frac{\alpha}{2}}$ = 1.96
$\therefore$ by using the formula $E = z_{\frac{\alpha}{2}} \left(\frac{\sigma }{\sqrt{n}}\right)$
= 1.96 $\left(\frac{354}{\sqrt{56}}\right)$
= 92.72
= 93 approximately
The margin of error = 93
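The two margin of error calculations can be checked with this Python sketch. The function name is an assumption for illustration, and the critical value 1.96 is the one used in both examples.

```python
from math import sqrt


def margin_of_error(critical_value, sd, n):
    """E = critical value * standard deviation / sqrt(n)."""
    return critical_value * sd / sqrt(n)


print(round(margin_of_error(1.96, 587, 30), 2))   # Question 1
print(round(margin_of_error(1.96, 354, 56), 2))   # Question 2
```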
ANOVA
Anova is a statistical test which analyzes variance. It is used to compare two or more means, which enables a
researcher to draw conclusions about two or more sets of data. Anova tests include one-way anova,
two-way anova and multiple anova, depending upon the type and arrangement of the data. One-way anova has the following test
statistic:
$F = \frac{MST}{MSE}$
Where,
F = Anova Coefficient
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error.
Formula for MST is given below:
$MST = \frac{SST}{p - 1}$
$SST = \sum n (\bar{x}_i - \bar{x})^2$
Where,
SST = Sum of squares due to treatment
p = Total number of populations
n = Total number of samples in a population
$\bar{x}_i$ = Mean of a population, $\bar{x}$ = Mean of all the population means.
Formula for MSE is given below:
$MSE = \frac{SSE}{N - p}$
$SSE = \sum (n - 1) S^2$
Where,
SSE = Sum of squares due to error
S = Standard deviation of the samples
N = Total number of observations.
Few problems based on Anova formula are given below:
Solved Examples
Question 1: Following data is given about cricket teams of three countries:
Countries       Number of Players    Average Runs    Standard Deviations
India           11                   60              15
New Zealand     11                   50              10
South Africa    11                   70              12
Find Anova coefficient?

Solution:
Cricket Teams    n     $\bar{x}$    S     $S^2$
India            11    60     15    225
New Zealand      11    50     10    100
South Africa     11    70     12    144
n = 11
p = 3
N = 33
$\bar{x} = \frac{60 + 50 + 70}{3}$ = 60
$SST = \sum n (\bar{x}_i - \bar{x})^2$
$SST = 11(60 - 60)^2 + 11(50 - 60)^2 + 11(70 - 60)^2$
= 2200
$MST = \frac{SST}{p - 1}$
$MST = \frac{2200}{3 - 1}$
= 1100
$SSE = \sum (n - 1) S^2$
SSE = 10*225 + 10*100 + 10*144
= 4690
$MSE = \frac{SSE}{N - p}$
$MSE = \frac{4690}{33 - 3}$
MSE = 156.33
$F = \frac{MST}{MSE}$
$F = \frac{1100}{156.33}$
= 7.036
Question 2: The following data is given:
Plant Name    Number of plants    Average Flowers    Standard Deviation
Rose          5                   12                 2
Marigold      5                   16                 1
Lily          5                   20                 4
Calculate the Anova coefficient.

Solution:
Plant name    n    $\bar{x}$    S    $S^2$
Rose          5    12     2    4
Marigold      5    16     1    1
Lily          5    20     4    16
p = 3
n = 5
N = 15
$\bar{x}$ = 16
$SST = \sum n (\bar{x}_i - \bar{x})^2$
$SST = 5(12 - 16)^2 + 5(16 - 16)^2 + 5(20 - 16)^2$
= 160
$MST = \frac{SST}{p - 1}$
$MST = \frac{160}{3 - 1}$
= 80
$SSE = \sum (n - 1) S^2$
SSE = 4*4 + 4*1 + 4*16
= 84
$MSE = \frac{SSE}{N - p}$
$MSE = \frac{84}{15 - 3}$
MSE = 7
$F = \frac{MST}{MSE}$
$F = \frac{80}{7}$
= 11.429
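The one-way Anova calculation from summary statistics can be sketched in Python as follows. The function name and the (n, mean, standard deviation) tuple layout are assumptions made for this illustration. As in the examples, the overall mean is taken as the plain average of the group means, which is only appropriate when all groups have the same size.

```python
def anova_f(groups):
    """groups: list of (n, mean, sd) tuples, one per population."""
    p = len(groups)
    total_n = sum(n for n, _, _ in groups)
    # Plain average of the group means (assumes equal group sizes)
    grand_mean = sum(mean for _, mean, _ in groups) / p
    sst = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    sse = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    mst = sst / (p - 1)
    mse = sse / (total_n - p)
    return mst / mse


# Question 1: three cricket teams of 11 players each
print(round(anova_f([(11, 60, 15), (11, 50, 10), (11, 70, 12)]), 3))
```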
Pearson Correlation Formula

Correlation is the relationship between two variables. The correlation coefficient is the measurement of
correlation. It indicates how well the two sets of data are interconnected. The Pearson correlation coefficient
measures the linear dependence of two variables upon each other. It is also referred to as the Pearson
product-moment correlation coefficient. The value of the Pearson correlation coefficient lies between -1 and +1. If the
coefficient of correlation is zero, then there is no linear correlation between the two given variables. On the other
hand, a perfectly positive correlation has a value of +1, while a perfectly negative correlation has a
value of -1.
The graphical representation of positive, negative and no correlation is shown below:
Pearson correlation coefficient for sample data is denoted by "r". The formula for Pearson correlation
coefficient r is given by:
$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
Where,
r = Pearson correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Number of pairs of values
Pearson Correlation Problems

The few problems based on Pearson correlation coefficient are listed below:
Solved Examples
Question 1: Marks obtained by 5 students in algebra and trigonometry as given below:
Algebra         15    16    12    10    8
Trigonometry    18    11    10    20    17
Calculate the Pearson correlation coefficient.

Solution:
x     y     $x^2$    $y^2$    xy
15    18    225    324    270
16    11    256    121    176
12    10    144    100    120
10    20    100    400    200
8     17    64     289    136
$\sum x$ = 61
$\sum y$ = 76
$\sum x^2$ = 789
$\sum y^2$ = 1234
$\sum xy$ = 902
Formula for Pearson correlation coefficient is given by:

$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
$r = \frac{5 \times 902 - 61 \times 76}{\sqrt{[5 \times 789 - (61)^2][5 \times 1234 - (76)^2]}}$
r = -0.424
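Question 2: Following are the values for x and y:
x    5    6    4    2
y    7    3    9    8
Evaluate Pearson correlation coefficient.

Solution:
x    y    $x^2$    $y^2$    xy
5    7    25     49     35
6    3    36     9      18
4    9    16     81     36
2    8    4      64     16
$\sum x$ = 17
$\sum y$ = 27
$\sum x^2$ = 81
$\sum y^2$ = 203
$\sum xy$ = 105
Formula for Pearson correlation coefficient is given by:

$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
$r = \frac{4 \times 105 - 17 \times 27}{\sqrt{[4 \times 81 - (17)^2][4 \times 203 - (27)^2]}}$
r = -0.724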
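The Pearson formula translates directly into Python. This is a minimal sketch with an illustrative function name, using the algebra and trigonometry marks from Question 1 as input.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


algebra = [15, 16, 12, 10, 8]
trigonometry = [18, 11, 10, 20, 17]
print(round(pearson_r(algebra, trigonometry), 3))
```

Squaring the returned value gives the coefficient of determination.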
Coefficient of Determination Formula

The coefficient of determination is one of the most important tools in statistics and is widely used in data analysis in
economics, physics, chemistry and many other fields. It indicates how much of the variability in the data is accounted for,
and therefore how reliable predictions made from the data are. The coefficient of determination is denoted by $r^2$ or
sometimes by $R^2$. It is simply the square of r, the correlation coefficient. The value of the coefficient of determination
lies between 0 and 1. The higher the value of $r^2$, the better the prediction becomes. The formula for coefficient of
determination is given below:
Coefficient of determination = $r^2$
The formula of correlation coefficient is given below:
$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
Where,
r = Correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Number of pairs of values
Coefficient of Determination Problems
The few problems based on coefficient of determination are as follows:
Solved Examples
Question 1: Marks obtained by few students in physics and chemistry tests are given by the following table:
Physics      18    16    15    10
Chemistry    15    12    9     17
Compute the coefficient of determination.

Solution:
Construct the following table for the determination of correlation coefficient:
x     y     $x^2$    $y^2$    xy
18    15    324    225    270
16    12    256    144    192
15    9     225    81     135
10    17    100    289    170
$\sum x$ = 59
$\sum y$ = 53
$\sum x^2$ = 905
$\sum y^2$ = 739
$\sum xy$ = 767
Formula for correlation coefficient:
$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
$r = \frac{4 \times 767 - 59 \times 53}{\sqrt{[4 \times 905 - (59)^2][4 \times 739 - (53)^2]}}$
r = -0.41275
Coefficient of determination = $r^2$
= 0.17036 = 0.17 (approx).
Question 2: During an observation at a garden, the number of roses and the number of marigold flowers are noted every
week. The readings of 4 successive weeks are as follows:
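Roses       1    2    3    4
Marigold    3    2    5    6
Evaluate the coefficient of determination.

Solution:
Construct the following table for the determination of correlation coefficient:
x    y    $x^2$    $y^2$    xy
1    3    1      9      3
2    2    4      4      4
3    5    9      25     15
4    6    16     36     24
$\sum x$ = 10
$\sum y$ = 16
$\sum x^2$ = 30
$\sum y^2$ = 74
$\sum xy$ = 46
Formula for correlation coefficient:
$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
$r = \frac{4 \times 46 - 10 \times 16}{\sqrt{[4 \times 30 - (10)^2][4 \times 74 - (16)^2]}}$
r = 0.8485
Coefficient of determination = $r^2$
= 0.7199 = 0.72 (approx).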
Variance Formula
Variance is one of the most important concepts of statistics. It is a measure of the dispersion of a set of data: it indicates
how far the values of a data set are spread out from their mean.
Variance is the average of the squared differences of each value from the mean of the data.
Standard deviation is defined as the square root of the
variance. Variance is denoted by the square of the Greek letter sigma ($\sigma ^{2}$). The formulas for variance are given below.
Variance formula for population data is as follows:
$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n}$
Variance formula for sample data is as follows:
$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n-1}$
Where,
$\sigma ^{2}$ = Variance
x = Item given in the data
$\bar{x}$ = Mean of the data
n = Total number of items
Variance Problems
Few problems based on variance formula are given below:
Solved Examples
Question 1: Marks obtained by few students are: 75, 83, 54, 90, 61. Find the variance of the sample?
Solution:
Formula for mean:
$\bar{x}$ = $\frac{\sum x}{n}$
$\bar{x}$ = $\frac{363}{5}$ = 72.6
x     $x-\bar{x}$    $(x-\bar{x})^{2}$
75    2.4      5.76
83    10.4     108.16
54    -18.6    345.96
90    17.4     302.76
61    -11.6    134.56
$\sum x$ = 363
$\sum (x-\bar{x})^{2}$ = 897.2
Formula for sample variance is given by:
$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n-1}$
$\sigma ^{2}$ = $\frac{897.2}{4}$
= 224.3
Question 2: During a survey, the number of members in the family were asked. Few answers of them are 5, 6, 2, 3, 1, 7, 4,
8. Calculate the variance.
Solution:
Formula for mean:
$\bar{x}=\frac{\sum x}{n}$
$\bar{x}$ = $\frac{36}{8}$ = 4.5
x    $x-\bar{x}$    $(x-\bar{x})^{2}$
5    0.5      0.25
6    1.5      2.25
2    -2.5     6.25
3    -1.5     2.25
1    -3.5     12.25
7    2.5      6.25
4    -0.5     0.25
8    3.5      12.25
$\sum x$ = 36
$\sum (x-\bar{x})^{2}$ = 42
Formula for sample variance is given by:
$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n-1}$
$\sigma ^{2}$ = $\frac{42}{7}$
= 6
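Both variance formulas are available in Python's standard library, which gives a quick check of the two examples. In this sketch `statistics.variance` is the sample variance (n - 1 denominator) and `statistics.pvariance` is the population variance (n denominator); the variable names are illustrative.

```python
from statistics import pvariance, variance

marks = [75, 83, 54, 90, 61]            # Question 1
family_sizes = [5, 6, 2, 3, 1, 7, 4, 8]  # Question 2

print(round(variance(marks), 1))         # sample variance
print(variance(family_sizes))            # sample variance
print(pvariance(family_sizes))           # population variance, for comparison
```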
Correlation Coefficient Formula

The correlation coefficient is a measure of the strength and direction of the linear relationship between two variables. Its value ranges from -1 to +1. A positive value means that the two variables tend to increase together, a negative value means that one tends to decrease as the other increases, and a value near 0 indicates little or no linear relationship.
In other words, it is a way to find out how well a predicted relationship matches real-life data.
The correlation coefficient is denoted by r.
If x and y are the two variables, then the formula for the correlation coefficient is given as:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$
Here,
n = Number of data pairs.
$\sum x$ = Sum of the first data list.
$\sum y$ = Sum of the second data list.
$\sum xy$ = Sum of the products of the 1st and 2nd values.
$\sum x^{2}$ = Sum of the squares of the 1st values.
$\sum y^{2}$ = Sum of the squares of the 2nd values.
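The sums listed above can be accumulated directly from two equal-length lists of paired values. The Python sketch below is one way to do that; the function name `correlation_coefficient` is illustrative, and the data are the x and y values of the first solved example below.

```python
import math


def correlation_coefficient(xs, ys):
    """Correlation coefficient r computed from the five sums in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


x_values = [60, 61, 62, 63]
y_values = [3.1, 3.6, 3.8, 4]
r = correlation_coefficient(x_values, y_values)
print(round(r, 5))  # 0.96936
```

If either list is constant, the denominator is zero and the division raises `ZeroDivisionError`; the correlation coefficient is undefined in that case.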
Correlation Coefficient Problems

Below are a few example problems based on the correlation coefficient:
Solved Examples
Question 1: Calculate the correlation coefficient for the table below:

| x values | y values |
|---|---|
| 60 | 3.1 |
| 61 | 3.6 |
| 62 | 3.8 |
| 63 | 4 |

Solution:
Given n = 4
First, let's calculate xy, $x^{2}$ and $y^{2}$:

| x values | y values | xy | $x^{2}$ | $y^{2}$ |
|---|---|---|---|---|
| 60 | 3.1 | 186 | 3600 | 9.61 |
| 61 | 3.6 | 219.6 | 3721 | 12.96 |
| 62 | 3.8 | 235.6 | 3844 | 14.44 |
| 63 | 4 | 252 | 3969 | 16 |

Now, let's find the sum of each column:
$\sum x$ = 60 + 61 + 62 + 63 = 246
$\sum y$ = 3.1 + 3.6 + 3.8 + 4 = 14.5
$\sum xy$ = 186 + 219.6 + 235.6 + 252 = 893.2
$\sum x^{2}$ = 3600 + 3721 + 3844 + 3969 = 15134
$\sum y^{2}$ = 9.61 + 12.96 + 14.44 + 16 = 53.01
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$
$= \frac{4(893.2) - (246)(14.5)}{\sqrt{[4(15134) - (246)^{2}][4(53.01) - (14.5)^{2}]}}$
Correlation Coefficient (r) = 0.96936
Question 2: Suppose two sets of test scores are given:

| x value | y value |
|---|---|
| 100 | 28 |
| 106 | 33 |
| 112 | 26 |
| 98 | 27 |
| 87 | 24 |
| 77 | 24 |
| 67 | 21 |
| 66 | 26 |
| 49 | 22 |

Find the correlation coefficient.

Solution:
Given n = 9
First, let's calculate xy, $x^{2}$ and $y^{2}$:

| x value | y value | xy | $x^{2}$ | $y^{2}$ |
|---|---|---|---|---|
| 100 | 28 | 2800 | 10000 | 784 |
| 106 | 33 | 3498 | 11236 | 1089 |
| 112 | 26 | 2912 | 12544 | 676 |
| 98 | 27 | 2646 | 9604 | 729 |
| 87 | 24 | 2088 | 7569 | 576 |
| 77 | 24 | 1848 | 5929 | 576 |
| 67 | 21 | 1407 | 4489 | 441 |
| 66 | 26 | 1716 | 4356 | 676 |
| 49 | 22 | 1078 | 2401 | 484 |

Now, let's find the sum of each column:
$\sum x$ = 100 + 106 + 112 + 98 + 87 + 77 + 67 + 66 + 49 = 762
$\sum y$ = 28 + 33 + 26 + 27 + 24 + 24 + 21 + 26 + 22 = 231
$\sum xy$ = 2800 + 3498 + 2912 + 2646 + 2088 + 1848 + 1407 + 1716 + 1078 = 19993
$\sum x^{2}$ = 10000 + 11236 + 12544 + 9604 + 7569 + 5929 + 4489 + 4356 + 2401 = 68128
$\sum y^{2}$ = 784 + 1089 + 676 + 729 + 576 + 576 + 441 + 676 + 484 = 6031
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$
$= \frac{9(19993) - (762)(231)}{\sqrt{[9(68128) - (762)^{2}][9(6031) - (231)^{2}]}}$
Correlation Coefficient (r) = 0.716664
Quartile Formula
Just as the median divides a set of observations into two equal parts when they are arranged in numerical order, the quartiles divide the set of observations into 4 equal parts.
The value of the middle term between the first term and the median is known as the first or Lower Quartile and is denoted as Q1. The value of the middle term between the median and the last term is known as the third or Upper Quartile and is denoted as Q3. The median itself is known as the Second Quartile and is denoted as Q2.
When the set of observations is arranged in ascending order, the Lower quartile is given as:
Q1 = $\left(\frac{n + 1}{4} \right)^{th}$ term
If the position is a decimal number, Q1 is found by rounding the position to the nearest whole integer. When the position falls exactly halfway between two terms (for example, 2.5), the solved examples below take the average of the two neighbouring terms instead.
The Second quartile, which is the median of the set of observations, is given as:
Q2 = $\left(\frac{n + 1}{2} \right)^{th}$ term
The Upper quartile is given as:
Q3 = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
If the position is a decimal number, Q3 is found in the same way as Q1.
The lower and upper quartiles give a measure of the dispersion in the set of observations, called the 'inter-quartile range'. It is denoted as IQR and is the difference between the upper and lower quartiles:
IQR = Q3 - Q1
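The positional rule used in this section can be sketched in Python as follows. This is only an illustration of the convention in the solved examples below (average the two neighbours when the position ends in .5, otherwise round to the nearest term); other textbooks and software packages use different quartile conventions and may give different values. The function names `term_at` and `quartiles` are hypothetical.

```python
import math


def term_at(ordered, position):
    """Value at a 1-based position that may be fractional."""
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    if fraction == 0.5:
        # Exactly halfway: average the two neighbouring terms.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round the position to the nearest whole term.
    nearest = whole if fraction < 0.5 else whole + 1
    return ordered[nearest - 1]


def quartiles(values):
    """Return (Q1, Q2, Q3, IQR) using the (n + 1)-based positions."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    n = len(ordered)
    q1 = term_at(ordered, (n + 1) / 4)
    q2 = term_at(ordered, (n + 1) / 2)
    q3 = term_at(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3, q3 - q1


scores = [19, 22, 24, 20, 24, 27, 25, 24, 30]
print(quartiles(scores))  # (21.0, 24, 26.0, 5.0)
```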
Quartile Problems
Below are the problems based on Quartile:
Solved Examples
Question 1: Find the median, lower quartile, upper quartile and inter-quartile range of the following data set of scores: 19, 22, 24, 20, 24, 27, 25, 24, 30.
Solution:
First, let's arrange the values in ascending order:
19, 20, 22, 24, 24, 24, 25, 27, 30
Here n = 9. Now let's calculate the median:
Median = $\left(\frac{n + 1}{2} \right)^{th}$ term
= $\left(\frac{9 + 1}{2} \right)^{th}$ term
= $5^{th}$ term
= 24
Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{10}{4} \right)^{th}$ term
= $2.5^{th}$ term
Find the average of the 2nd and 3rd terms:
= $\frac{20 + 22}{2}$
= $\frac{42}{2}$
= 21
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(9 + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(10)}{4} \right)^{th}$ term
= $\left(\frac{30}{4} \right)^{th}$ term
= $7.5^{th}$ term
Find the average of the 7th and 8th terms:
= $\frac{25 + 27}{2}$
= $\frac{52}{2}$
= 26
Inter-quartile range = Upper quartile - Lower quartile
= 26 - 21
= 5
Question 2: Find the first quartile, second quartile and third quartile of the following sequence: 4, 77, 16, 59, 93, 88.
Solution:
First, let's arrange the values in ascending order:
4, 16, 59, 77, 88, 93
Given n = 6
$\therefore$ Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{6 + 1}{4} \right)^{th}$ term
= $1.75^{th}$ term
Here we take the 2nd term (rounding 1.75 to the nearest whole integer) from the set of observations.
$\Rightarrow$ 2nd term = 16
Lower quartile = 16
Second quartile = $\left(\frac{n + 1}{2} \right)^{th}$ term
= $\left(\frac{6 + 1}{2} \right)^{th}$ term
= $3.5^{th}$ term
Find the average of the 3rd and 4th terms:
= $\frac{59 + 77}{2}$
= 68
Second quartile = 68
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(6 + 1)}{4} \right)^{th}$ term
= $5.25^{th}$ term
Here we take the 5th term (rounding 5.25 to the nearest whole integer) from the set of observations.
$\Rightarrow$ 5th term = 88
Upper quartile = 88
Inter-quartile range = Upper quartile - Lower quartile
= 88 - 16
= 72
Standard Deviation Formula

Standard deviation measures how far the values of a set of data deviate from their average, or mean. It shows how the values of a particular data set are dispersed. A lower standard deviation means that the values lie close to their average. A higher standard deviation means that the values are scattered far from the average value. The standard deviation of a data set is the square root of its variance, so its value can never be negative. There are two types of standard deviation: population standard deviation and sample standard deviation. The formula for population standard deviation is given below:
$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n}}$
Sometimes only a sample of the whole population is given. In this case, instead of calculating the population standard deviation, we calculate the sample standard deviation. The formula for sample standard deviation is given below:
$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$
Where,
$x_{i}$ = Terms given in the data
$\bar{x}$ = Mean
n = Total number of terms.
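As an informal check, the standard deviation can be computed by hand (square root of the sum of squared deviations over n, or over n - 1 for a sample) and compared with Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1. The helper name `std_dev` is illustrative, and the data come from the two solved examples below.

```python
import math
import statistics


def std_dev(values, sample=False):
    """Square root of the summed squared deviations over n (or n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)


hours = [2, 6, 5, 3, 4, 1]  # divided by n in the first solved example
marks = [25, 15, 20, 18]    # divided by n - 1 in the second solved example

print(round(std_dev(hours), 2))               # 1.71
print(round(std_dev(marks, sample=True), 1))  # 4.2

# The standard library gives the same values.
print(math.isclose(std_dev(hours), statistics.pstdev(hours)))
print(math.isclose(std_dev(marks, sample=True), statistics.stdev(marks)))
```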
Standard Deviation Problems

A few problems based on standard deviation are given below:
Solved Examples
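Question 1: During a survey, 6 students were asked how many hours per day they study on average. Their answers were as follows: 2, 6, 5, 3, 4, 1. Evaluate the standard deviation.
Solution:
$\bar{x}$ = $\frac{\sum x_{i}}{n}$
$\bar{x}$ = $\frac{2+6+5+3+4+1}{6}$
= 3.5

| $x_{i}$ | $x_{i}-\bar{x}$ | $(x_{i}-\bar{x})^{2}$ |
|---|---|---|
| 2 | -1.5 | 2.25 |
| 6 | 2.5 | 6.25 |
| 5 | 1.5 | 2.25 |
| 3 | -0.5 | 0.25 |
| 4 | 0.5 | 0.25 |
| 1 | -2.5 | 6.25 |

$\sum (x_{i}-\bar{x})^{2}$ = 17.5
Formula for standard deviation is given by:
$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n}}$
$S = \sqrt{\frac{17.5}{6}}$
$S = \sqrt{2.92}$ = 1.71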
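Question 2: The marks obtained by 4 students in a class are 25, 15, 20, 18. Find the standard deviation of the sample.
Solution:
$\bar{x}$ = $\frac{\sum_{i=1}^{n} x_{i}}{n}$
$\bar{x}$ = $\frac{25+15+20+18}{4}$
= 19.5

| $x_{i}$ | $x_{i}-\bar{x}$ | $(x_{i}-\bar{x})^{2}$ |
|---|---|---|
| 25 | 5.5 | 30.25 |
| 15 | -4.5 | 20.25 |
| 20 | 0.5 | 0.25 |
| 18 | -1.5 | 2.25 |

$\sum (x_{i}-\bar{x})^{2}$ = 53
Formula for standard deviation is given by:
$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$
$S = \sqrt{\frac{53}{3}}$
S = 4.2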
