
Tests available in MedCalc

MedCalc offers the following tests for Normal distribution:

The Shapiro-Wilk test (Shapiro & Wilk, 1965; Royston, 1995) and the Shapiro-Francia
test (Shapiro & Francia, 1972; Royston, 1993a) calculate a W and W' statistic, respectively,
that tests whether a random sample comes from a Normal distribution. Small values of W or
W' are evidence of departure from normality. The Shapiro-Wilk W statistic can only be
computed when the sample size is between 3 and 5000 (inclusive) (Royston, 1995); the
Shapiro-Francia W' statistic can be computed when the sample size ranges from 5 to 5000
(Royston, 1993a & 1993b).

The D'Agostino-Pearson test (Sheskin, 2011) computes a single P-value for the combination
of the coefficients of Skewness and Kurtosis.

The Kolmogorov-Smirnov test (Neter et al., 1988) with Lilliefors significance correction
(Dallal & Wilkinson, 1986) is based on the greatest discrepancy between the sample
cumulative distribution and the Normal cumulative distribution.

The Chi-squared goodness-of-fit test is applied to binned data (the data are put into
classes) (Snedecor & Cochran, 1989) and requires a larger sample size than the other two
tests.

SHAPIRO-WILK ORIGINAL TEST


We present the original approach to performing the Shapiro-Wilk test. This approach is
limited to samples of between 3 and 50 elements. A revised approach using the algorithm
of J. P. Royston can handle samples with up to 5,000 elements (or even more).
The basic approach used in the Shapiro-Wilk (SW) test for normality is as follows:

Rearrange the data in ascending order so that $x_1 \le x_2 \le \dots \le x_n$.

Calculate SS as follows:

$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$

If n is even, let m = n/2, while if n is odd let m = (n-1)/2.

Calculate b as follows, taking the $a_i$ weights from Table 1 (based on the value of n)
in the Shapiro-Wilk Tables:

$b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$

Note that if n is odd, the median data value is not used in the calculation of b.

Calculate the test statistic $W = b^2 / SS$.

Find the value in Table 2 of the Shapiro-Wilk Tables (for a given value of n) that
is closest to W, interpolating if necessary. This is the p-value for the test.


For example, suppose W = .975 and n = 10. Based on Table 2 of the Shapiro-Wilk Tables the
p-value for the test is somewhere between .90 (W = .972) and .95 (W = .978).
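
The steps for computing W can be sketched in Python as follows. This is a minimal illustration, not the Real Statistics implementation: the function name `shapiro_wilk_w` is hypothetical, and the weights must be supplied by the caller from Table 1. The only weight used here is the single coefficient for n = 3, taken to be 0.7071.

```python
def shapiro_wilk_w(data, weights):
    """Original Shapiro-Wilk W statistic from data and tabulated a_i weights."""
    x = sorted(data)                      # ascending order: x_1 <= ... <= x_n
    n = len(x)
    m = n // 2                            # n/2 if n is even, (n-1)/2 if n is odd
    if len(weights) != m:
        raise ValueError("expected exactly m weights for this sample size")
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)  # sum of squared deviations (DEVSQ)
    # Pair the i-th smallest with the i-th largest value; the median is skipped if n is odd.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    return b * b / ss


# Illustrative data only: three equally spaced values with the n = 3 weight.
w = shapiro_wilk_w([1.0, 2.0, 3.0], [0.7071])
print(round(w, 4))
```

For larger samples the caller would pass the m coefficients for that n in the same order as they appear in Table 1.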
Example 1: A random sample of 12 people is taken from a large population. The ages of the
people in the sample are given in column A of the worksheet in Figure 1. Is this data
normally distributed?

Figure 1 Shapiro-Wilk test for Example 1

We begin by sorting the data in column A using Data > Sort & Filter|Sort or
the QSORT supplemental function, putting the results in column B. We next look up the
coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Tables, putting
these values in column E.
Corresponding to each of these 6 coefficients $a_1, \dots, a_6$, we calculate the values
$x_{12} - x_1, \dots, x_7 - x_6$, where $x_i$ is the ith data element in sorted order. E.g. since
$x_1$ = 35 and $x_{12}$ = 86, we place the difference 86 - 35 = 51 in cell H5 (the same row as
the cell containing $a_1$). Column I contains the product of the coefficients and difference
values. E.g. cell I5 contains the formula =E5*H5. The sum of these values is b = 44.1641,
which is found in cell I11 (and again in cell E14).
We next calculate SS as DEVSQ(B4:B15) = 2008.667. Thus $W = b^2 / SS$ =
44.1641^2/2008.667 = .971026. We now look for .971026 when n = 12 in Table 2 of
the Shapiro-Wilk Tables and find that the p-value lies between .50 and .90. The W value
for .5 is .943 and the W value for .9 is .973.
Interpolating .971026 between these values (using linear interpolation), we arrive at p-value
= .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are
normally distributed.
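
The arithmetic of Example 1 can be checked with a short Python sketch. The helper name `interpolate_p` is hypothetical; the numbers are the ones quoted in the text (b, SS and the two bracketing table entries).

```python
def interpolate_p(w, w_lo, p_lo, w_hi, p_hi):
    """Linear interpolation of the p-value between two bracketing table entries."""
    return p_lo + (w - w_lo) / (w_hi - w_lo) * (p_hi - p_lo)


b, ss = 44.1641, 2008.667          # values from cells I11 and DEVSQ(B4:B15)
w = b ** 2 / ss                    # test statistic W = b^2 / SS
p = interpolate_p(w, 0.943, 0.50, 0.973, 0.90)
print(round(w, 6), round(p, 4))
```

Harmonic interpolation, the default in the supplemental functions described below, would give a slightly different p-value.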
Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests
for Normality and Symmetry are normally distributed.

Figure 2 Shapiro-Wilk test for Example 2

As we can see from the analysis in Figure 2, p-value = .0419 < .05 = α, and so we reject the
null hypothesis and conclude at the 5% significance level that the data are not normally
distributed, which is quite different from the result using the KS test that we found in
Example 2 of Kolmogorov-Smirnov Test.
Real Statistics Function: The Real Statistics Resource Pack contains the following
supplemental functions where R1 consists only of numeric data without headings:
SHAPIRO(R1, FALSE) = the Shapiro-Wilk test statistic W for the data in the range R1
SWTEST(R1, FALSE, h) = p-value of the Shapiro-Wilk test on the data in R1
SWCoeff(n, j, FALSE) = the jth coefficient for samples of size n
SWCoeff(R1, C1, FALSE) = the coefficient corresponding to cell C1 within sorted range R1
SWPROB(n, W, FALSE, h) = p-value of the Shapiro-Wilk test for a sample of size n for test
statistic W
The functions SHAPIRO and SWTEST ignore all empty and non-numeric cells. The range
R1 in SWCoeff(R1, C1, FALSE) should not contain any empty or non-numeric cells.

When performing the table lookup, the default is to use harmonic interpolation (h = TRUE).
To use linear interpolation, set h to FALSE. See Interpolation for details.
For example, for Example 1 of Chi-square Test for Normality, we have SHAPIRO(A4:A15,
FALSE) = .874 and SWTEST(A4:A15, FALSE, FALSE) = SWPROB(15,.874,FALSE,FALSE) =
.0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality).
It is important to note that SHAPIRO(R1, TRUE), SWTEST(R1, TRUE), SWCoeff(n, j,
TRUE), SWCoeff(R1, C1, TRUE) and SWPROB(n, W, TRUE) refer to the results using the
Royston algorithm, as described in Shapiro-Wilk Expanded Test.
For compatibility with the Royston version of SWCoeff, when j ≤ n/2 then SWCoeff(n, j,
FALSE) = the negative of the value of the jth coefficient for samples of size n found in
the Shapiro-Wilk Tables. When j = (n+1)/2, SWCoeff(n, j, FALSE) = 0 and when j >
(n+1)/2, SWCoeff(n, j, FALSE) = -SWCoeff(n, n-j+1, FALSE).

VASSARSTAT
STATA
MEDCALC

CONTINGENCY COEFFICIENT
If we look at the contingency table of two uncorrelated nominal variables, we can calculate
the expected frequency $h_{ik}$ of a particular combination of features as
$h_{ik} = h_i h_k / N$,
where $h_i$ and $h_k$ are the marginal totals of row i and column k and N is the sample size.
If the two variables are correlated, the actual frequencies $H_{ik}$ will deviate from the ideal
uncorrelated frequencies $h_{ik}$. The difference $D_{ik}$ between the actual and the ideal (uncorrelated) frequencies
is thus
$D_{ik} = H_{ik} - h_{ik} = H_{ik} - h_i h_k / N$.
For uncorrelated variables the difference of frequencies will be around zero for each cell of the table.
Thus the correlation of the two variables can be measured by squaring the differences and
summing these squares in relation to the ideal frequencies:

$\chi^2 = \sum_i \sum_k \frac{D_{ik}^2}{h_{ik}}$

The resulting $\chi^2$ coefficient, however, has the disadvantage that its value depends both on the
dimension of the contingency table and on the size of the sample. After eliminating the dependence
on the sample size, we get Pearson's contingency coefficient C:

$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$

As this coefficient C still depends on the dimension of the contingency table, it is normalized
so that its range extends from 0.0 to 1.0:

$C_{corr} = C \sqrt{\frac{m_{min}}{m_{min} - 1}}$

with $m_{min} = \min(q, p)$, where q and p are the numbers of rows and columns of the table.


Hint:

In contrast to the correlation coefficient the corrected contingency coefficient Ccorr does
not indicate the direction of the correlation but only its strength.
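
A minimal Python sketch of the calculation described above, assuming the table is given as a list of rows of observed counts with no empty row or column. The function name `contingency_coefficients` and the 2 x 2 example table are illustrative, not taken from the source.

```python
import math


def contingency_coefficients(table):
    """Return (chi-squared, C, corrected C) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_totals[i] * col_totals[k] / n   # ideal uncorrelated frequency
            chi2 += (observed - expected) ** 2 / expected
    c = math.sqrt(chi2 / (chi2 + n))                       # Pearson's contingency coefficient
    m_min = min(len(table), len(table[0]))
    c_corr = c * math.sqrt(m_min / (m_min - 1))            # normalized to the range 0..1
    return chi2, c, c_corr


# Hypothetical, perfectly associated 2 x 2 table.
chi2, c, c_corr = contingency_coefficients([[10, 0], [0, 10]])
print(chi2, round(c, 4), round(c_corr, 4))
```

A perfectly associated table reaches the upper end of the corrected range, while a table whose cells equal the expected frequencies gives zero.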

Z Score Formula
The Z score is the outcome obtained when a Z test is performed. The Z test statistic follows the normal distribution under the null hypothesis. The Z score
expresses how many standard deviations a value lies from the mean, and it is typically used with a large number of data values. The value to be
standardized is denoted by x. We can find the Z score when the mean and standard deviation are known: we subtract the
mean from x and then divide the result by the standard deviation. The formula
for calculating the Z score is given below:

$z = \frac{x - \bar{x}}{\sigma}$

Where,
x = Value to be standardized
$\bar{x}$ = Mean
$\sigma$ = Population standard deviation.
Following is the formula for population standard deviation:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

Where,
$\sigma$ = Population standard deviation
$x_i$ = Items given
$\bar{x}$ = Mean
n = Total number of items.

Z Score Problems

Following are the few problems based on Z score:

Solved Examples
Question 1: The scores of an exam are recorded. Mean and standard deviation of the marks are 263 and 45 respectively. If
scores are distributed normally, what would be the Z score for obtaining 350?
Solution:

Value x = 350
Mean $\bar{x}$ = 263
Population standard deviation $\sigma$ = 45
Formula for Z score is given below:

Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = (350 - 263) / 45
= 1.933

Question 2: A student wrote 2 quizzes. In the first quiz he scored 80 and in the second he scored 75. The mean and standard
deviation of the first quiz are 70 and 15 respectively, while the mean and standard deviation of the second quiz are 54 and 12
respectively. The results follow a normal distribution. What can you conclude about the student's results from their Z
scores?
Solution:
Calculation of student's Z score for first quiz:
Value x = 80
Mean $\bar{x}$ = 70
Population standard deviation $\sigma$ = 15
Formula for Z score is given below:

Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = (80 - 70) / 15
= 0.667
Calculation of student's Z score for second quiz:
Value x = 75
Mean $\bar{x}$ = 54
Population standard deviation $\sigma$ = 12
Formula for Z score is given below:

Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = (75 - 54) / 12
= 1.75
Since the Z score of the second quiz is higher than that of the first quiz, he did better in the second quiz relative to the other students.
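
The Z score calculations in the two questions above can be reproduced with a small Python sketch. The function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (or below) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


exam = z_score(350, 263, 45)    # Question 1
quiz1 = z_score(80, 70, 15)     # Question 2, first quiz
quiz2 = z_score(75, 54, 12)     # Question 2, second quiz
print(round(exam, 3), round(quiz1, 3), round(quiz2, 2))
```

Comparing `quiz1` and `quiz2` puts both scores on the same scale even though the quizzes have different means and spreads.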

Sample Size Formula


The number of observations in a given sample is known as the sample size. The sample size plays an important part
in any study that draws conclusions about a population from a sample. A sample can be small
or large, but a larger sample generally gives more accurate results. The sample size is denoted by lower-case 'n' and the
population size by upper-case 'N'.
The sample size formula for the infinite population is given as:

$SS = \frac{Z^2 p(1 - p)}{C^2}$

The sample size formula for the finite population is given as:

$\text{New } SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$

Here,
SS = Sample size.
Z = Z value for the chosen confidence level
p = Percentage of the population picking a choice, expressed as a decimal
C = Confidence interval (margin of error), expressed as a decimal
Pop = Population

Sample Size Problems



Below are few problems based on Sample size:

Solved Examples
Question 1: Find the sample size for the infinite and the finite population when the population is 4300, the percentage p is 0.5, the confidence
level is 95% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.96 (the z-table value for a 95% confidence level), so Z² = 3.8416
Applying the given data in the formula:

$SS = \frac{Z^2 p(1 - p)}{C^2}$
SS = (1.96)² × 0.5 × (1 - 0.5) / (0.04)² = 600.25
SS = 600 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population.

$\text{New } SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
New SS = 600 / (1 + (600 - 1)/4300)
New SS = 527

Question 2: Find the sample size for the infinite and the finite population when the population is 7800, the percentage p is 0.5, the
confidence level is 90% and the confidence interval is 0.04.
Solution:
From the given data:
Z = 1.645 (the z-table value for a 90% confidence level), so Z² = 2.7060
Applying the given data in the formula:

$SS = \frac{Z^2 p(1 - p)}{C^2}$
SS = (1.645)² × 0.5 × (1 - 0.5) / (0.04)² ≈ 422.8
SS = 423 (after rounding to the nearest whole number)
Now let's calculate the sample size for the finite population.

$\text{New } SS = \frac{SS}{1 + \frac{SS - 1}{Pop}}$
New SS = 423 / (1 + (423 - 1)/7800) = 401.29
New SS = 401 (after rounding to the nearest whole number)
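
Both sample size formulas can be sketched in Python as below. The function names are illustrative, and p and the confidence interval are assumed to be given as decimals (0.5 and 0.04), as in the worked solutions.

```python
def sample_size_infinite(z, p, c):
    """Sample size for an effectively infinite population."""
    return z ** 2 * p * (1 - p) / c ** 2


def sample_size_finite(ss, pop):
    """Finite-population correction applied to a sample size ss."""
    return ss / (1 + (ss - 1) / pop)


# Question 1: 95% confidence (z = 1.96), population 4300.
ss_1 = round(sample_size_infinite(1.96, 0.5, 0.04))
new_ss_1 = round(sample_size_finite(ss_1, 4300))

# Question 2: 90% confidence (z = 1.645), population 7800.
ss_2 = round(sample_size_infinite(1.645, 0.5, 0.04))
new_ss_2 = round(sample_size_finite(ss_2, 7800))
print(ss_1, new_ss_1, ss_2, new_ss_2)
```

The correction matters most when the uncorrected sample size is a sizeable fraction of the population.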

Confidence Interval Formula


When interpreting results from a set of data, a researcher needs to know how much confidence can be placed in them.
A confidence interval is a range within which the most plausible values of a parameter lie. To calculate a confidence interval, one needs to
set a confidence level such as 90%, 95% or 99%. The most commonly used confidence level is 95%. At a 95% confidence
level (or whatever confidence level is chosen), intervals constructed in this way contain the true value in 95% of
repeated samples. The formulas for the confidence interval are given below:

$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

and

$\bar{x} \pm t_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

Where,
n = Number of terms
$\bar{x}$ = Sample mean
$\sigma$ = Standard deviation
$z_{\alpha/2}$ = Value corresponding to $\alpha/2$ in the z table
$t_{\alpha/2}$ = Value corresponding to $\alpha/2$ in the t table
$\alpha$ = 1 - (confidence level)/100.

Confidence Interval Problems


Few problems based on confidence interval are given below:

Solved Examples
Question 1: Average score obtained by 10 students in a test is 25. Standard deviation of the sample is 8. Find the
confidence interval at 95% confidence level?
Solution:
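n = 10
Mean $\bar{x}$ = 25
Standard deviation $\sigma$ = 8
Confidence level = 95% = 0.95

$\alpha$ = 1 - (confidence level)/100
$\alpha$ = 1 - 95/100 = 0.05
Formula for confidence interval:

Confidence interval = $\bar{x} \pm t_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Confidence interval = 25 ± $t_{0.05/2}$ × (8/√10)
= 25 ± 2.262 × (8/√10), where 2.262 is the t-table value for n - 1 = 9 degrees of freedom
= 19.28 to 30.72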

Question 2: A survey was performed on 50 families asking how many members are there in the family. The mean and
standard deviation of the survey was 6 and 5 respectively. If confidence level is 95%, what would be the confidence interval
of the survey?
Solution:
n = 50
Mean $\bar{x}$ = 6
Standard deviation $\sigma$ = 5
Confidence level = 95% = 0.95

$\alpha$ = 1 - (confidence level)/100
$\alpha$ = 1 - 95/100 = 0.05
Formula for confidence interval:

Confidence interval = $\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Confidence interval = 6 ± $z_{0.05/2}$ × (5/√50)
= 6 ± 1.96 × (5/√50)
= 4.614 to 7.386
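
A hedged Python sketch of the confidence interval formula follows. The function names are illustrative. The z critical value is computed with the standard library's `statistics.NormalDist`; the t critical value is not available in the standard library, so it is passed in from a t table (2.262 for 9 degrees of freedom at the 95% level is assumed here).

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sd, n, critical):
    """Interval mean +/- critical * sd / sqrt(n) for a given critical value."""
    half_width = critical * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def z_critical(confidence_level):
    """Two-sided z critical value, e.g. 0.95 gives roughly 1.96."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


# Question 1: n = 10, mean 25, standard deviation 8, t critical value from a table.
low, high = confidence_interval(25, 8, 10, 2.262)
z95 = z_critical(0.95)
print(round(low, 2), round(high, 2), round(z95, 2))
```

Passing `z_critical(0.95)` as the `critical` argument gives the z-based interval used for large samples.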

Chi Square Formula


The Chi Square test is one of the most widely used statistical tests. Its purpose is to
measure the difference between an observed frequency and an expected frequency. The test is sometimes also used to test the
differences between two or more sets of observed data. Its value can be calculated from the given observed frequency and
expected frequency.
The Chi Square is denoted by $\chi^2$ and the formula is given as:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Here,

O = Observed frequency
E = Expected frequency
$\sum$ = Summation
$\chi^2$ = Chi Square value

Chi Square Test Problems



Below are the problems based on the Chi Square test:

Solved Examples
Question 1: Calculate the chi-square value, if observed frequency is 8 and expected frequency is 15?
Solution:
Given:
Observed frequency = 8
Expected frequency = 15

$\chi^2 = \frac{(O - E)^2}{E}$
$\chi^2$ = (8 - 15)²/15
$\chi^2$ = (-7)²/15
$\chi^2$ = 49/15
$\chi^2$ = 3.267
Chi Square value = 3.267

Question 2: Calculate the chi-square value for the following data.

Color                 Red    Green    Yellow
Observed frequency    12     16       20
Expected frequency    16     8        25

Solution:
Let's find the Chi Square value for the given data using the formula:

$\chi^2 = \sum \frac{(O - E)^2}{E}$
First, let's calculate (O - E)² for each color.
Red color = (O - E)² = (12 - 16)² = 16
Green color = (O - E)² = (16 - 8)² = 64
Yellow color = (O - E)² = (20 - 25)² = 25

Chi-Square value for Red color = (12 - 16)²/16 = 1
Chi-Square value for Green color = (16 - 8)²/8 = 8
Chi-Square value for Yellow color = (20 - 25)²/25 = 1
So, the chi-square value for the given data is = 1 + 8 + 1 = 10
Chi Square value = 10
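
The chi-square sum can be sketched in a few lines of Python. The function name `chi_square` is illustrative; the second call uses observed frequencies 12, 16, 20 with expected frequencies 16, 8, 25, the values that reproduce the per-color results 1, 8 and 1.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


single = chi_square([8], [15])                       # Question 1
colors = chi_square([12, 16, 20], [16, 8, 25])       # Question 2
print(round(single, 3), colors)
```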

Z Test Formula
A Z test is a statistical test used to compare means. The test statistic is assumed to follow the normal distribution under the null
hypothesis. A Z test is performed on a large data set or on population data; for a small data set or
sample data, a T test is performed instead. The score determined by a Z test is called the "Z score". The Z score can be computed when the
population standard deviation is given. It measures how far a given value x, generally within the limits of
the data, lies from the mean in units of the standard deviation.
The formula for calculating the Z score is given below:

$z = \frac{x - \bar{x}}{\sigma}$

Where,
x = Value to be standardized
$\bar{x}$ = Mean of the data
$\sigma$ = Population standard deviation.
The formula for population standard deviation is given below:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

Where,
$\sigma$ = Population standard deviation
$x_i$ = Numbers given in the data
$\bar{x}$ = Mean of the data
n = Total number of items.

Z Test Problems

Few problems based on Z test are given below:

Solved Examples
Question 1: In a government organization, the mean basic salary of the employees is INR 5000. How much will be the z
score of employees whose basic salary is INR 3000, if standard deviation of the population is 850?
Solution:
Value x = 3000
Mean $\bar{x}$ = 5000
Population standard deviation $\sigma$ = 850
Formula for Z score is given below:

Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = (3000 - 5000) / 850
= -2.353

Question 2: The marks obtained in mathematics exam by students of a class vary from 33 to 100. If average marks are 62
and standard deviation is 20. Find the Z score for students who scored 80?
Solution:

Value x = 80
Mean of the data $\bar{x}$ = 62
Standard deviation $\sigma$ = 20

Z score = $\frac{x - \bar{x}}{\sigma}$
Z score = (80 - 62) / 20
= 0.9

F Test Formula
The F test is a method for comparing the variances of two different sets of values. The F statistic follows the F distribution under the null
hypothesis. To calculate the F value, we first find the mean of each of the two sets of observations and then calculate their variances.
The F value is expressed as the ratio of the variances of the two sets of observations. The formula for the F test is given below:

$F = \frac{\sigma_1^2}{\sigma_2^2}$

Variance is given by the following formula:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Where,
$\sigma^2$ = Variance
x = Values given in a set of data
$\bar{x}$ = Mean of the data
n = Total number of values.

F Test Problems

Few problems based on F test are given below:

Solved Examples
Question 1: Find the F value for the following two observations:
1, 3, 5, 7, 9 and 5 ,9, 3, 8, 3?
Solution:
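Formula for mean is given by:

$\bar{x} = \frac{\sum x}{n}$
Formula for variance is given by:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for first set of values:

$\bar{x}_1$ = 5

$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
1        -4                   16
3        -2                   4
5        0                    0
7        2                    4
9        4                    16
$\sum (x_1 - \bar{x}_1)^2$ = 40

$\sigma_1^2$ = 40/4 = 10
Calculation for second set of values:

$\bar{x}_2$ = 5.6

$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
5        -0.6                 0.36
9        3.4                  11.56
3        -2.6                 6.76
8        2.4                  5.76
3        -2.6                 6.76
$\sum (x_2 - \bar{x}_2)^2$ = 31.2

$\sigma_2^2$ = 31.2/4 = 7.8

F value = $\frac{\sigma_1^2}{\sigma_2^2}$
F value = 10/7.8 = 1.282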

Question 2: The following two sets of data are given:


4, 2, 5, 1, 3 and 8, 3, 9, 0, 1.
Calculate the F value.
Solution:
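Formula for mean is given by:

$\bar{x} = \frac{\sum x}{n}$
Formula for variance is given by:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Calculation for first set of values:

$\bar{x}_1$ = 3

$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
4        1                    1
2        -1                   1
5        2                    4
1        -2                   4
3        0                    0
$\sum (x_1 - \bar{x}_1)^2$ = 10
$\sigma_1^2$ = 10/4 = 2.5
Calculation for second set of values:

$\bar{x}_2$ = 4.2

$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
8        3.8                  14.44
3        -1.2                 1.44
9        4.8                  23.04
0        -4.2                 17.64
1        -3.2                 10.24
$\sum (x_2 - \bar{x}_2)^2$ = 66.8

$\sigma_2^2$ = 66.8/4 = 16.7

F value = $\frac{\sigma_1^2}{\sigma_2^2}$
F value = 2.5/16.7
= 0.1497 = 0.15 (approx)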
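
The two F values can be checked with Python's standard `statistics` module, whose `variance` function uses the n - 1 denominator shown in the variance formula. The helper name `f_value` is illustrative.

```python
from statistics import variance


def f_value(first, second):
    """Ratio of the sample variances (n - 1 denominator) of two data sets."""
    return variance(first) / variance(second)


f1 = f_value([1, 3, 5, 7, 9], [5, 9, 3, 8, 3])   # Question 1
f2 = f_value([4, 2, 5, 1, 3], [8, 3, 9, 0, 1])   # Question 2
print(round(f1, 3), round(f2, 4))
```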

T Test Formula
The T test is often called Student's t-test, after "Student", the pen name of its originator William Sealy Gosset. The T test is used to compare two different sets of
values. It is generally performed on a small set of data that is assumed to be normally distributed.
The test compares the means of two samples, using the means and standard deviations of the two samples to
make the comparison. The formula for the T test is given below:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$

Where,
$\bar{x}_1$ = Mean of first set of values
$\bar{x}_2$ = Mean of second set of values
$S_1$ = Standard deviation of first set of values
$S_2$ = Standard deviation of second set of values
$n_1$ = Total number of values in first set
$n_2$ = Total number of values in second set.
The formula for standard deviation is given by:

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Where,
x = Values given
$\bar{x}$ = Mean
n = Total number of values.

T Test Problems

Few problems based on T test are given below:

Solved Examples
Question 1: Find the t-test value for the following two sets of values:
7, 2, 9, 8 and 1, 2, 3, 4?

Solution:
Formula for mean:

$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for first set:
Number of terms in first set:
$n_1$ = 4
Mean for first set of data:

$\bar{x}_1$ = 6.5
Construct the following table for standard deviation:

$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
7        0.5                  0.25
2        -4.5                 20.25
9        2.5                  6.25
8        1.5                  2.25
$\sum (x_1 - \bar{x}_1)^2$ = 29

Standard deviation for first set of data:

$S_1$ = √(29/3) = 3.11
Calculation for second set:
Number of terms in second set:
$n_2$ = 4
Mean for second set of data:

$\bar{x}_2$ = 2.5
Construct the following table for standard deviation:

$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
1        -1.5                 2.25
2        -0.5                 0.25
3        0.5                  0.25
4        1.5                  2.25
$\sum (x_2 - \bar{x}_2)^2$ = 5

Standard deviation for second set of data:

$S_2$ = √(5/3) = 1.29

Formula for t-test value:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
t = (6.5 - 2.5) / √(9.667/4 + 1.667/4)
t = 2.3764 = 2.38 (approx)

Question 2: Find the t-test value for the following two sets of data:

$x_1$    9    10    11    12
$x_2$    2    4     6     8

Solution:
Formula for mean:

$\bar{x} = \frac{\sum x}{n}$
Formula for standard deviation:

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation for first set:
Number of terms in first set:
$n_1$ = 4
Mean for first set of data:

$\bar{x}_1$ = 10.5
Construct the following table for standard deviation:

$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$
9        -1.5                 2.25
10       -0.5                 0.25
11       0.5                  0.25
12       1.5                  2.25
$\sum (x_1 - \bar{x}_1)^2$ = 5

Standard deviation for first set of data:

$S_1$ = √(5/3) = 1.291
Calculation for second set:
Number of terms in second set:
$n_2$ = 4
Mean for second set of data:

$\bar{x}_2$ = 5
Construct the following table for standard deviation:

$x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
2        -3                   9
4        -1                   1
6        1                    1
8        3                    9
$\sum (x_2 - \bar{x}_2)^2$ = 20

Standard deviation for second set of data:

$S_2$ = √(20/3) = 2.582
Formula for t-test value:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
t = (10.5 - 5) / √(1.667/4 + 6.667/4)
t = 3.8105 = 3.81 (approx)
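
The t-test formula, with the two sample variances kept separate, can be sketched in Python as follows. The helper name `t_value` is illustrative, and the data are those of Question 1.

```python
import math
from statistics import mean, variance


def t_value(first, second):
    """t statistic from two samples using their separate sample variances."""
    standard_error = math.sqrt(variance(first) / len(first)
                               + variance(second) / len(second))
    return (mean(first) - mean(second)) / standard_error


t1 = t_value([7, 2, 9, 8], [1, 2, 3, 4])
print(round(t1, 4))
```

This form does not pool the two variances; a pooled-variance t statistic would use a different denominator.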

Linear Regression Formula


Regression is a statistical technique used to find the relationship between two variables or two sets of data.
Linear regression is the process of finding the linear equation which best fits the two
sets of data points. One variable is the dependent variable, while the other is the independent variable. The independent variable is also
called the explanatory variable. The formula for the linear regression equation is given by:

y = a + bx

a and b are given by the following formulas:

$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

$a = \frac{\sum y - b\sum x}{n}$

where,
x and y are the two variables on the regression line.
b = Slope of the line.
a = y-intercept of the line.
x = Values of first data set.
y = Values of second data set.
n = Number of data pairs.

Linear Regression Problems



Few problems based on linear regression are given below:

Solved Examples
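Question 1: Find linear regression equation for the following two sets of data:

x    4    6    2    8
y    7    5    3    10

Solution:
Construct the following table:

x     y     x²     xy
4     7     16     28
6     5     36     30
2     3     4      6
8     10    64     80
$\sum x$ = 20    $\sum y$ = 25    $\sum x^2$ = 120    $\sum xy$ = 144

$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
b = (4 × 144 - 20 × 25) / (4 × 120 - 400)
b = 0.95

$a = \frac{\sum y - b\sum x}{n}$
a = (25 - 0.95 × 20) / 4
a = 1.5
Linear regression is given by:
y = a + bx
y = 1.5 + 0.95x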

Question 2: Find the linear regression of the following sets of data:

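Solution:
Construct a table of x, y, x² and xy, which gives the following column totals:

$\sum x$ = 10    $\sum y$ = 20    $\sum x^2$ = 30    $\sum xy$ = 43

$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
b = (4 × 43 - 10 × 20) / (4 × 30 - 100)
b = -1.4

$a = \frac{\sum y - b\sum x}{n}$
a = (20 + 1.4 × 10) / 4
a = 8.5
Linear regression is given by:
y = a + bx
y = 8.5 - 1.4x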
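
A minimal Python sketch of the slope and intercept formulas. The function name is illustrative, and the data pairs are assumed values chosen to reproduce the Question 1 totals ($\sum x$ = 20, $\sum y$ = 25, $\sum x^2$ = 120, $\sum xy$ = 144).

```python
def linear_regression(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Assumed data consistent with the totals used in Question 1.
xs = [4, 6, 2, 8]
ys = [7, 5, 3, 10]
a, b = linear_regression(xs, ys)
print(round(a, 2), round(b, 2))
```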


Relative Frequency Formula


In statistics, some numbers in a given data set may occur several times. The number of times a particular number occurs is
called its frequency. The relative frequency of a number is its frequency as a proportion of the total of all the frequencies. Relative
frequency is evaluated by dividing the individual frequency of an item by the total of all frequencies. The formula for the
relative frequency is given below:

Relative frequency = $\frac{f}{n}$

Where,
f = Frequency of an individual item
n = Total of all frequencies.

Relative Frequency Problems



Few problems based on relative frequency are as follows:

Solved Examples
Question 1: Construct the relative frequency table for the following data:
2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0

Solution:

Formula for relative frequency:

Relative frequency = $\frac{f}{n}$
The table of relative frequency is given below:

Item    Frequency (f)    Relative frequency = f/n
0       2                2/20 = 0.1
1       2                2/20 = 0.1
2       4                4/20 = 0.2
3       2                2/20 = 0.1
4       1                1/20 = 0.05
5       3                3/20 = 0.15
6       1                1/20 = 0.05
7       3                3/20 = 0.15
9       2                2/20 = 0.1
Total   n = 20

Question 2: Find the relative frequency of the following data:

Class Interval    Frequency
0-10              5
10-20             8
20-30             10
30-40             15
40-50             7

Solution:

Formula for relative frequency:

Relative frequency = $\frac{f}{n}$
The table of relative frequency is given below:

Class Interval    Frequency    Relative frequency = f/n
0-10              5            5/45 = 0.111
10-20             8            8/45 = 0.178
20-30             10           10/45 = 0.222
30-40             15           15/45 = 0.333
40-50             7            7/45 = 0.156
Total             n = 45
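
The relative frequency table for the raw data of Question 1 can be built with `collections.Counter`, as in this small sketch. The function name is illustrative.

```python
from collections import Counter


def relative_frequencies(data):
    """Map each distinct item to its frequency divided by the number of items."""
    counts = Counter(data)
    n = len(data)
    return {item: count / n for item, count in sorted(counts.items())}


data = [2, 2, 7, 4, 5, 1, 5, 7, 9, 0, 1, 3, 2, 5, 9, 6, 2, 3, 7, 0]
rel = relative_frequencies(data)
for item, value in rel.items():
    print(item, value)
```

The relative frequencies of all items always add up to 1, which is a quick check on a hand-built table.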

Margin of Error Formula


The margin of error is a statistical concept that expresses the uncertainty between the true value of a parameter and its
estimate.
The accuracy of the estimate is judged by the size of the margin of error. If the margin of
error is small, the results are trustworthy; if the margin of error is large, the results may be far
from accurate.
In simple words, the margin of error is the product of the critical value and the standard error (the standard deviation of the sample mean). The critical value is expressed
as a Z score.
The margin of error is denoted by E and the formula is given as,

$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

Here,
$z_{\alpha/2}$ = represents the critical value.
$\frac{\sigma}{\sqrt{n}}$ = represents the standard error, where $\sigma$ is the standard deviation and n is the sample size.

Margin of Error Problems

Below are few problems based on Margin of Error:

Solved Examples
Question 1: A random sample of 30 students has an average yearly earnings of 2450 and a standard deviation of 587. Find
the margin of error if c = 0.95?
Solution:
Given n = 30
Standard Deviation = 587
$z_{\alpha/2}$ = 1.96
$\therefore$ by using the formula $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
= 1.96 $\left(\frac{587}{\sqrt{30}}\right)$
= 210.06
= 210 approximately
The margin of error = 210

Question 2: An average monthly earning of 56 employees in a company is 4685 and a standard deviation is 354. Find the
margin of error if c= 0.95?
Solution:
Given n = 56
Standard Deviation = 354
$z_{\alpha/2}$ = 1.96
$\therefore$ by using the formula $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
= 1.96 $\left(\frac{354}{\sqrt{56}}\right)$
= 92.72
= 93 approximately
The margin of error = 93
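
Both margin of error questions can be reproduced with a short Python sketch. The function name `margin_of_error` is illustrative, and the critical value 1.96 is the one used in the solutions for c = 0.95.

```python
import math


def margin_of_error(z, sd, n):
    """Critical value times the standard error sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


e1 = margin_of_error(1.96, 587, 30)   # Question 1
e2 = margin_of_error(1.96, 354, 56)   # Question 2
print(round(e1, 2), round(e2, 2))
```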

ANOVA
Anova (analysis of variance) is a statistical test which analyzes variance. It is used to compare two or more means, which enables a
researcher to draw conclusions about two or more sets of data. Anova tests include one-way anova, two-way anova and multiple anova, depending upon the type and arrangement of the data. One-way anova has the following test
statistic:

$F = \frac{MST}{MSE}$

Where,
F = Anova Coefficient
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error.
Formula for MST is given below:

$MST = \frac{SST}{p - 1}$, with $SST = \sum n(\bar{x} - \bar{\bar{x}})^2$

Where,
SST = Sum of squares due to treatment
p = Total number of populations
n = Number of observations in each sample
$\bar{x}$ = Mean of each sample, and $\bar{\bar{x}}$ = Mean of the sample means.
Formula for MSE is given below:

$MSE = \frac{SSE}{N - p}$, with $SSE = \sum (n - 1)S^2$

Where,
SSE = Sum of squares due to error
S = Standard deviation of the samples
N = Total number of observations.

Few problems based on Anova formula are given below:

Solved Examples
Question 1: The following data is given about the cricket teams of three countries:

Countries       Number of Players    Average Runs    Standard Deviations
India           11                   60              15
New Zealand     11                   50              10
South Africa    11                   70              12

Find the Anova coefficient.

Solution:
Construct the following table:

Cricket Teams   n     $\bar{x}$    S     S²
India           11    60           15    225
New Zealand     11    50           10    100
South Africa    11    70           12    144

n = 11
p = 3
N = 33

$\bar{\bar{x}}$ = (60 + 50 + 70)/3 = 60
$SST = \sum n(\bar{x} - \bar{\bar{x}})^2$
SST = 11(60 - 60)² + 11(50 - 60)² + 11(70 - 60)²
= 2200

$MST = \frac{SST}{p - 1}$
MST = 2200/(3 - 1)
= 1100

$SSE = \sum (n - 1)S^2$
SSE = 10*225 + 10*100 + 10*144
= 4690

$MSE = \frac{SSE}{N - p}$
MSE = 4690/(33 - 3)
MSE = 156.33

$F = \frac{MST}{MSE}$
F = 1100/156.33
= 7.036

Question 2: The following data is given:

Plant Name    Number of plants    Average Flowers    Standard Deviation
Rose          5                   12                 2
Marigold      5                   16                 1
Lily          5                   20                 4

Calculate the Anova coefficient.

Solution:
Construct the following table:

Plant name    n    $\bar{x}$    S    S²
Rose          5    12           2    4
Marigold      5    16           1    1
Lily          5    20           4    16

p = 3
n = 5
N = 15

$\bar{\bar{x}}$ = (12 + 16 + 20)/3 = 16
$SST = \sum n(\bar{x} - \bar{\bar{x}})^2$
SST = 5(12 - 16)² + 5(16 - 16)² + 5(20 - 16)²
= 160

$MST = \frac{SST}{p - 1}$
MST = 160/(3 - 1)
= 80

$SSE = \sum (n - 1)S^2$
SSE = 4*4 + 4*1 + 4*16
= 84

$MSE = \frac{SSE}{N - p}$
MSE = 84/(15 - 3)
MSE = 7

$F = \frac{MST}{MSE}$
F = 80/7
= 11.429
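
The one-way Anova calculation from summary statistics can be sketched in Python as below. The function name `anova_f` is illustrative. One assumption differs slightly from the worked solutions: the grand mean is weighted by group size, which equals the plain average of the group means whenever all groups have the same size, as they do in both questions.

```python
def anova_f(sizes, means, sds):
    """One-way Anova F from group sizes, group means and group standard deviations."""
    p = len(sizes)                       # number of groups
    total = sum(sizes)                   # N, total number of observations
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    mst = sst / (p - 1)
    mse = sse / (total - p)
    return mst / mse


f_cricket = anova_f([11, 11, 11], [60, 50, 70], [15, 10, 12])   # Question 1
f_plants = anova_f([5, 5, 5], [12, 16, 20], [2, 1, 4])          # Question 2
print(round(f_cricket, 3), round(f_plants, 3))
```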

Pearson Correlation Formula


Correlation is the relationship between two variables. The correlation coefficient is the measurement of
correlation. It indicates how strongly the two sets of data are related. The Pearson correlation coefficient
measures the linear dependence of two variables upon each other. It is also referred to as the Pearson product-moment
correlation coefficient. The value of the Pearson correlation coefficient lies between -1 and +1. If the
coefficient of correlation is zero, then there is no linear correlation between the two variables. On the other
hand, a perfectly positive correlation has a value of +1, while a perfectly negative correlation has a
value of -1.
The graphical representation of positive, negative and no correlation is shown below:

Pearson correlation coefficient for sample data is denoted by "r". The formula for Pearson correlation
coefficient r is given by:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

Where,
r = Pearson correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Total number of values.

Pearson Correlation Problems



The few problems based on Pearson correlation coefficient are listed below:

Solved Examples
Question 1: Marks obtained by 5 students in algebra and trigonometry are given below:

Algebra         15    16    12    10    8
Trigonometry    18    11    10    20    17

Calculate the Pearson correlation coefficient.

Solution:
Construct the following table:

x     y     x²     y²     xy
15    18    225    324    270
16    11    256    121    176
12    10    144    100    120
10    20    100    400    200
8     17    64     289    136
$\sum x$ = 61    $\sum y$ = 76    $\sum x^2$ = 789    $\sum y^2$ = 1234    $\sum xy$ = 902

Formula for Pearson correlation coefficient is given by:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
r = (5 × 902 - 61 × 76) / √([5 × 789 - (61)²][5 × 1234 - (76)²])
r = -0.424

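Question 2: The following are the values for x and y:

x    5    6    4    2
y    7    3    9    8

Evaluate Pearson correlation coefficient.

Solution:
Construct the following table:

x    y    x²    y²    xy
5    7    25    49    35
6    3    36    9     18
4    9    16    81    36
2    8    4     64    16
$\sum x$ = 17    $\sum y$ = 27    $\sum x^2$ = 81    $\sum y^2$ = 203    $\sum xy$ = 105

Formula for Pearson correlation coefficient is given by:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
r = (4 × 105 - 17 × 27) / √([4 × 81 - (17)²][4 × 203 - (27)²])
r = -0.724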
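
A Python sketch of the Pearson formula, applied to the algebra and trigonometry marks of Question 1. The function name `pearson_r` is illustrative, and the fifth algebra mark is taken to be 8, the value implied by the x² entry 64 in the solution table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


algebra = [15, 16, 12, 10, 8]          # fifth value inferred from x^2 = 64
trigonometry = [18, 11, 10, 20, 17]
r = pearson_r(algebra, trigonometry)
print(round(r, 3))
```

The sign of r shows the direction of the linear relationship, which is negative for these marks.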

Coefficient of Determination Formula


The coefficient of determination is one of the most important tools in statistics and is widely used in data analysis in
economics, physics, chemistry and many other fields. It indicates how much of the
variability in one set of data is explained by the other. The coefficient of determination is denoted by $r^2$ or sometimes by $R^2$. It is
the square of r, the correlation coefficient. The value of the coefficient of determination lies between 0
and 1. The higher the value of $r^2$, the better the prediction becomes. The formula for the coefficient of determination is given
below:

Coefficient of determination = $r^2$

The formula of correlation coefficient is given below:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

Where,
r = Correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Total number of values.

Coefficient of Determination Problems


The few problems based on coefficient of determination are as follows:

Solved Examples
Question 1: Marks obtained by a few students in physics and chemistry tests are given by the following table:

Physics      18    16    15    10
Chemistry    15    12    9     17

Compute the coefficient of determination.

Solution:
Construct the following table for the determination of correlation coefficient:

x     y     x²     y²     xy
18    15    324    225    270
16    12    256    144    192
15    9     225    81     135
10    17    100    289    170
$\sum x$ = 59    $\sum y$ = 53    $\sum x^2$ = 905    $\sum y^2$ = 739    $\sum xy$ = 767

Formula for correlation coefficient:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
r = (4 × 767 - 59 × 53) / √([4 × 905 - (59)²][4 × 739 - (53)²])
r = -0.41275
Coefficient of determination = $r^2$
= 0.17036 = 0.17 (approx).

Question 2: During an observation at a garden, the number of roses and the number of marigold flowers are noted every
week. The readings of 4 successive weeks are as follows:

Roses

Marigold

Evaluate the coefficient of determination.


Solution:

Construct a table of x, y, x², y² and xy for the determination of correlation coefficient, which gives the following column totals:

$\sum x$ = 10    $\sum y$ = 16    $\sum x^2$ = 30    $\sum y^2$ = 74    $\sum xy$ = 46

Formula for correlation coefficient:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
r = (4 × 46 - 10 × 16) / √([4 × 30 - (10)²][4 × 74 - (16)²])
r = 0.8485
Coefficient of determination = $r^2$
= 0.7199 = 0.72 (approx).
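
When only the column totals are available, the coefficient of determination can be computed directly from them. The sketch below assumes a hypothetical helper `r_squared_from_sums` that takes n and the five totals; the inputs are the totals quoted in the two solutions above.

```python
import math


def r_squared_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (r, r squared) computed from the column totals of the working table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    r = numerator / denominator
    return r, r * r


r_marks, r2_marks = r_squared_from_sums(4, 59, 53, 905, 739, 767)     # Question 1
r_garden, r2_garden = r_squared_from_sums(4, 10, 16, 30, 74, 46)      # Question 2
print(round(r2_marks, 5), round(r2_garden, 2))
```

Squaring removes the sign of r, so the coefficient of determination alone does not show whether the relationship is increasing or decreasing.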

Variance Formula
Variance is one of the most important concepts of statistics. It is a measure of the dispersion of a set of data: it indicates how
far the values of a data set are spread out from their mean.
Variance is the average of the squared differences of each value from the mean of the data.
The standard deviation is defined as the square root of the
variance. Variance is denoted by the square of the Greek letter sigma ($\sigma ^{2}$). The formulas for variance are given below.
Variance formula for population data is as follows:

$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n}$

Variance formula for sample data is as follows:

$\sigma ^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1}$

Where,
$\sigma ^{2}$ = Variance
x = Item given in the data
$\bar{x}$ = Mean of the data
n = Total number of items.

Variance Problems

A few problems based on the variance formula are given below:

Solved Examples
Question 1: The marks obtained by a few students are: 75, 83, 54, 90, 61. Find the variance of the sample.
Solution:
Formula for mean:
$\bar{x}$ = $\frac{\sum x}{n}$
$\bar{x}$ = $\frac{363}{5}$ = 72.6
Construct the following table:

x          $x-\bar{x}$        $(x-\bar{x})^{2}$
75         2.4                5.76
83         10.4               108.16
54         -18.6              345.96
90         17.4               302.76
61         -11.6              134.56

$\sum x$ = 363    $\sum (x-\bar{x})^{2}$ = 897.2

Formula for variance is given by:

$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n-1}$
$\sigma ^{2}$ = $\frac{897.2}{4}$
= 224.3

Question 2: During a survey, respondents were asked the number of members in their family. A few of the answers are 5, 6, 2, 3, 1, 7, 4,
8. Calculate the variance of the sample.
Solution:

Formula for mean:


$\bar{x}=\frac{\sum x}{n}$
$\bar{x}$ = $\frac{36}{8}$ = 4.5
Construct the following table:

x          $x-\bar{x}$        $(x-\bar{x})^{2}$
5          0.5                0.25
6          1.5                2.25
2          -2.5               6.25
3          -1.5               2.25
1          -3.5               12.25
7          2.5                6.25
4          -0.5               0.25
8          3.5                12.25

$\sum x$ = 36    $\sum (x-\bar{x})^{2}$ = 42

Formula for sample variance is given by:

$\sigma ^{2}$ = $\frac{\sum (x-\bar{x})^{2}}{n-1}$
$\sigma ^{2}$ = $\frac{42}{7}$
= 6
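
The two variance calculations above can be reproduced with a minimal Python sketch. The function name `variance` and its `sample` flag are assumptions made for this illustration; `sample=True` divides by n - 1 as in both solved examples, while `sample=False` gives the population formula.

```python
def variance(data, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by n (population variance).
    """
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


marks = [75, 83, 54, 90, 61]             # data from Question 1
family_sizes = [5, 6, 2, 3, 1, 7, 4, 8]  # data from Question 2

print(round(variance(marks), 1))             # 224.3
print(variance(family_sizes))                # 6.0
print(variance(family_sizes, sample=False))  # 5.25
```

The last line shows how the result changes if the same eight answers are treated as a whole population rather than a sample: the sum of squares 42 is divided by 8 instead of 7.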

Correlation Coefficient Formula


The correlation coefficient is a measure of the strength of the linear relationship between two random variables. Its value ranges
between -1 and +1. A positive correlation coefficient means that the two variables tend to increase or decrease together,
while a negative value means that one variable tends to decrease as the other increases.
In other words, it is a way to find out how well predicted values match real-life data.
The correlation coefficient is denoted by r.

If x and y are the two variables, then the formula for the correlation coefficient is given as:

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$

Here,
n = Number of data pairs.
$\sum x$ = Sum of the first data list.
$\sum y$ = Sum of the second data list.
$\sum xy$ = Sum of the products of the 1st and 2nd values.
$\sum x^{2}$ = Sum of the squares of the 1st values.
$\sum y^{2}$ = Sum of the squares of the 2nd values.

Correlation Coefficient Problems



Below are a few example problems based on the correlation coefficient:

Solved Examples
Question 1: Calculate the correlation coefficient for the table below:

x values   y values
60         3.1
61         3.6
62         3.8
63         4

Solution:
Given n = 4
First, let's calculate xy, $x^{2}$ and $y^{2}$:

x values   y values   xy         $x^{2}$    $y^{2}$
60         3.1        186        3600       9.61
61         3.6        219.6      3721       12.96
62         3.8        235.6      3844       14.44
63         4          252        3969       16

Now, let's find the sum of each column:

$\sum x$ = 60 + 61 + 62 + 63 = 246
$\sum y$ = 3.1 + 3.6 + 3.8 + 4 = 14.5
$\sum xy$ = 186 + 219.6 + 235.6 + 252 = 893.2
$\sum x^{2}$ = 3600 + 3721 + 3844 + 3969 = 15134
$\sum y^{2}$ = 9.61 + 12.96 + 14.44 + 16 = 53.01

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$
$= \frac{4(893.2) - (246)(14.5)}{\sqrt{[4(15134) - (246)^{2}][4(53.01) - (14.5)^{2}]}}$
Correlation Coefficient (r) = 0.96936

Question 2: Suppose there are two sets of test scores:

x value    y value
100        28
106        33
112        26
98         27
87         24
77         24
67         21
66         26
49         22

Find the correlation coefficient.


Solution:
Given n = 9
First, let's calculate xy, $x^{2}$ and $y^{2}$:

x value    y value    xy         $x^{2}$    $y^{2}$
100        28         2800       10000      784
106        33         3498       11236      1089
112        26         2912       12544      676
98         27         2646       9604       729
87         24         2088       7569       576
77         24         1848       5929       576
67         21         1407       4489       441
66         26         1716       4356       676
49         22         1078       2401       484

Now, let's find the sum of each column:

$\sum x$ = 100 + 106 + 112 + 98 + 87 + 77 + 67 + 66 + 49 = 762
$\sum y$ = 28 + 33 + 26 + 27 + 24 + 24 + 21 + 26 + 22 = 231
$\sum xy$ = 2800 + 3498 + 2912 + 2646 + 2088 + 1848 + 1407 + 1716 + 1078 = 19993
$\sum x^{2}$ = 10000 + 11236 + 12544 + 9604 + 7569 + 5929 + 4489 + 4356 + 2401 = 68128
$\sum y^{2}$ = 784 + 1089 + 676 + 729 + 576 + 576 + 441 + 676 + 484 = 6031

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}$
$= \frac{9(19993) - (762)(231)}{\sqrt{[9(68128) - (762)^{2}][9(6031) - (231)^{2}]}}$
Correlation Coefficient (r) = 0.716664
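
As a cross-check of the table arithmetic, the same formula can be applied directly to the raw data pairs. The sketch below is illustrative only; the function name `pearson_r` and the variable names are assumptions, not part of the original text.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient computed from paired raw data."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from Question 1 and Question 2.
r_question_1 = pearson_r([60, 61, 62, 63], [3.1, 3.6, 3.8, 4])
r_question_2 = pearson_r(
    [100, 106, 112, 98, 87, 77, 67, 66, 49],
    [28, 33, 26, 27, 24, 24, 21, 26, 22],
)
print(r_question_1, r_question_2)
```

Working from the raw pairs avoids copying intermediate products and squares by hand, which is where slips in a table such as the one in Question 2 are most likely to occur.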

Quartile Formula
Just as the median divides a set of observations, arranged in numerical order, into two equal parts,
the quartiles divide the set of observations into 4 equal parts.
The value of the middle term between the first term and the median is known as the first or Lower Quartile and is denoted as Q1.
The value of the middle term between the last term and the median is known as the third or Upper Quartile and is denoted
as Q3. The median itself is known as the Second Quartile and is denoted as Q2.
When the set of observations is arranged in ascending order, the Lower quartile is given as:

Q1 = $\left(\frac{n + 1}{4} \right)^{th}$ term

If the result is a decimal number, the Lower quartile Q1 is found by rounding to the nearest whole integer. When the result falls exactly halfway between two terms (for example 2.5), the solved examples take the average of those two terms instead.
The Second quartile, which is the median of the set of observations, is given as:

Q2 = $\left(\frac{n + 1}{2} \right)^{th}$ term

The Upper quartile is given as:

Q3 = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term

If the result is a decimal number, the Upper quartile Q3 is found by rounding to the nearest whole integer, with the same treatment of halfway values.

The lower and upper quartiles help us find a measure of dispersion in the set of observations called
the 'inter-quartile range'. It is denoted as IQR and is the difference between the upper and lower quartiles.

Quartile Problems

Below are the problems based on Quartile:

Solved Examples

Question 1: Find the median, lower quartile, upper quartile and inter-quartile range of the following data set of scores: 19,
22, 24, 20, 24, 27, 25, 24, 30.
Solution:
First, let's arrange the values in ascending order:
19, 20, 22, 24, 24, 24, 25, 27, 30
Now let's calculate the median:
Median = $\left(\frac{n + 1}{2} \right)^{th}$ term
= $\left(\frac{9 + 1}{2} \right)^{th}$ term
= $5^{th}$ term
= 24
Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{9 + 1}{4} \right)^{th}$ term
= $\left(\frac{10}{4} \right)^{th}$ term
= $2.5^{th}$ term
Find the average of the 2nd and 3rd terms
= $\frac{20 + 22}{2}$
= $\frac{42}{2}$
= 21
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(9 + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(10)}{4} \right)^{th}$ term
= $\left(\frac{30}{4} \right)^{th}$ term
= $7.5^{th}$ term
Find the average of the 7th and 8th terms
= $\frac{25 + 27}{2}$
= $\frac{52}{2}$
= 26
Inter-quartile range = Upper quartile - Lower quartile
= 26 - 21
= 5

Question 2: Find the first quartile, second quartile and third quartile of the following sequence: 4, 77,
16, 59, 93, 88.
Solution:
First, let's arrange the values in ascending order:
4, 16, 59, 77, 88, 93
Given n = 6
$\therefore$ Lower quartile = $\left(\frac{n + 1}{4} \right)^{th}$ term
= $\left(\frac{6 + 1}{4} \right)^{th}$ term
= $\left(\frac{7}{4} \right)^{th}$ term
= $1.75^{th}$ term
Here we take the 2nd term (rounding 1.75 to the nearest whole integer) from the set of observations.
$\Rightarrow$ 2nd term = 16
Lower quartile = 16
Second quartile (median) = $\left(\frac{n + 1}{2} \right)^{th}$ term
= $\left(\frac{6 + 1}{2} \right)^{th}$ term
= $3.5^{th}$ term
Find the average of the 3rd and 4th terms
= $\frac{59 + 77}{2}$
= 68
Upper quartile = $\left(\frac{3(n + 1)}{4} \right)^{th}$ term
= $\left(\frac{3(6 + 1)}{4} \right)^{th}$ term
= $\left(\frac{21}{4} \right)^{th}$ term
= $5.25^{th}$ term
Here we take the 5th term (rounding 5.25 to the nearest whole integer) from the set of observations.
$\Rightarrow$ 5th term = 88
Upper quartile = 88
Inter-quartile range = Upper quartile - Lower quartile
= 88 - 16
= 72
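
The position rule used in both solved examples can be sketched in Python as follows. This is a minimal illustration of the convention followed here (whole positions pick a term, positions ending in .5 average the two neighbouring terms, other fractions are rounded to the nearest term); the helper names `term_at` and `quartiles` are hypothetical, and other textbooks and software packages use different quartile conventions that may give different values.

```python
def term_at(sorted_data, position):
    """Value at a 1-based position in sorted data, following the worked examples."""
    lower = int(position)          # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    if fraction == 0.5:
        # Halfway between two terms: average them.
        return (sorted_data[lower - 1] + sorted_data[lower]) / 2
    # Any other fraction: round the position to the nearest whole term.
    nearest = lower + 1 if fraction > 0.5 else lower
    return sorted_data[nearest - 1]


def quartiles(data):
    """Return (Q1, Q2, Q3, IQR) using the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(data)
    n = len(ordered)
    q1 = term_at(ordered, (n + 1) / 4)
    q2 = term_at(ordered, (n + 1) / 2)
    q3 = term_at(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3, q3 - q1


print(quartiles([19, 22, 24, 20, 24, 27, 25, 24, 30]))  # (21.0, 24, 26.0, 5.0)
print(quartiles([4, 77, 16, 59, 93, 88]))               # (16, 68.0, 88, 72)
```

For the second data set the sketch also reports the second quartile, 68, which is the average of the 3rd and 4th terms (59 and 77).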

Standard Deviation Formula


Standard deviation represents the deviation of the values of a data set from its average or mean. It shows how the
values of a particular data set are dispersed. When the standard deviation is lower, the values are
close to their average. On the other hand, when the standard deviation is higher, the values are scattered far from
the average. The standard deviation of a data set is the square root of its variance, so its value can never
be negative. There are two types of standard deviation: population standard deviation and sample
standard deviation. The formula for population standard deviation is given below:

$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n}}$

Sometimes only a sample of the whole population is given. In this case, instead of calculating the population standard deviation, we
calculate the sample standard deviation. The formula for sample standard deviation is given below:

$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$

Where,
$x_{i}$ = Terms given in the data
$\bar{x}$ = Mean
n = Total number of terms.

Standard Deviation Problems



A few problems based on standard deviation are given below:

Solved Examples
Question 1: During a survey, 6 students were asked how many hours per day they study on average. Their answers
were as follows: 2, 6, 5, 3, 4, 1. Evaluate the standard deviation.
Solution:
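Formula for mean is given by:

$\bar{x}$ = $\frac{\sum x_{i}}{n}$
$\bar{x}$ = $\frac{2+6+5+3+4+1}{6}$
= 3.5
Construct the following table for standard deviation:

$x_{i}$    $x_{i}-\bar{x}$    $(x_{i}-\bar{x})^{2}$
2          -1.5               2.25
6          2.5                6.25
5          1.5                2.25
3          -0.5               0.25
4          0.5                0.25
1          -2.5               6.25

$\sum (x_{i}-\bar{x})^{2}$ = 17.5

Formula for standard deviation is given by:

$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n}}$
$S = \sqrt{\frac{17.5}{6}}$
$S = \sqrt{2.92}$ = 1.71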

Question 2: The marks obtained by 4 students in a class are 25, 15, 20, 18. Find the standard deviation of the sample.
Solution:
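Formula for mean is given by:

$\bar{x}$ = $\frac{\sum_{i=1}^{n} x_{i}}{n}$
$\bar{x}$ = $\frac{25+15+20+18}{4}$
= 19.5
Construct the following table for standard deviation:

$x_{i}$    $x_{i}-\bar{x}$    $(x_{i}-\bar{x})^{2}$
25         5.5                30.25
15         -4.5               20.25
20         0.5                0.25
18         -1.5               2.25

$\sum (x_{i}-\bar{x})^{2}$ = 53

Formula for standard deviation is given by:

$S = \sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$
$S = \sqrt{\frac{53}{3}}$
S = 4.2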
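
Both standard deviation examples can be verified with a short Python sketch. The function name `standard_deviation` and its `sample` flag are assumptions made for this illustration; the results are compared against `statistics.pstdev` and `statistics.stdev` from the Python standard library, which implement the population and sample formulas respectively.

```python
import math
import statistics


def standard_deviation(data, sample=False):
    """Square root of the variance.

    sample=False divides by n (population), as in Question 1;
    sample=True divides by n - 1 (sample), as in Question 2.
    """
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)


hours = [2, 6, 5, 3, 4, 1]  # data from Question 1
marks = [25, 15, 20, 18]    # data from Question 2

print(round(standard_deviation(hours), 2))               # 1.71
print(round(standard_deviation(marks, sample=True), 1))  # 4.2

# Cross-check against the standard library.
assert math.isclose(standard_deviation(hours), statistics.pstdev(hours))
assert math.isclose(standard_deviation(marks, sample=True), statistics.stdev(marks))
```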
