A Primer on Nonparametric Analysis, Volume I
Shahdad Naghshpour
To Joe
SN
Abstract
Nonparametric statistics provide a scientific methodology for cases where
customary statistics are not applicable. Nonparametric statistics are used
when the requirements for parametric analysis fail, such as when data are
not normally distributed or the sample size is too small. The method
provides an alternative for such cases and is often nearly as powerful as
parametric statistics. Another advantage of nonparametric statistics is that they offer analytical methods that are not available otherwise.
In the social sciences, it is often not possible to obtain measurements, which renders customary analysis impossible. For example, it is not possible to measure utility, but it is possible to rank preferences, which are based on that unmeasurable utility. Nonparametric methods provide theoretically valid options for analysis, making the use of unscientific methods unnecessary.
Nonparametric methods are intuitive and simple to comprehend, which helps researchers in the social sciences understand them without the mathematical rigor needed for the analytical methods customarily used in science. The only prerequisite for this book is high school level elementary algebra.
This book is a methodology book and bypasses theoretical proofs
while providing comprehensive explanations of the logic behind the
methods and ample examples, which are all solved using direct computations as well as by using Stata.
The book is arranged into two integrated volumes. Although each volume, and for that matter each chapter, can be used separately, it is advisable to read as much of both volumes as possible, because familiarity with what is applicable for different problems will enhance capabilities.
It is recommended that everyone read the Introduction and Chapter 1
because determining whether data are random or normally distributed is
essential in the selection of parametric versus nonparametric methods.
Keywords
Nonparametric statistics, median, order statistics, rank, one sample, two
samples, several samples, multiple comparison, normality, skewness.
Contents
Acknowledgments
Introduction
SECTION I
Acknowledgments
This manuscript could not have been completed without the great assistance provided by Madeline Messick. She is a meticulous researcher
with a keen eye for details. Her dedicated hard work and careful contributions have improved the manuscript. Any shortcomings and imperfections are my responsibility. I cannot thank Madeline enough for her contributions.
Introduction
The term nonparametric statistical analysis refers to methods that do not
assume any particular distribution function for the population from
which samples are obtained. This does not mean that nonparametric
methods do not use parameters. In fact, many of the inferences involve
the median, which is a parameter about the central tendencies of the
data. Sometimes, nonparametric methods are called distribution-free
methods, in the sense that their outcomes do not depend on any particular distribution, such as the normal distribution function. However,
more often than not, it indicates that the outcome is valid under different distribution functions or that the inference is valid even if one or
more of the prerequisite conditions of a statistical technique are violated.
Technically, nonparametric and distribution-free methods are not necessarily identical concepts, but the nuance is too technical to be of use for the purposes of this manuscript. When a test does not critically depend on prerequisite conditions, it is called a robust test. In this sense, all nonparametric tests are robust because they do not depend on any particular distribution function or parameters; in other words, they are valid under numerous distributions. Strictly speaking, however, the term robust describes a parametric test, based on an assumption of normality, whose results remain valid when the population deviates from a normal distribution function or when some of the theoretical requirements are violated.
Definition I.1
A method of analysis is said to be robust with regard to an assumption
when the result remains valid even if the theoretical requirements are
not strictly met.
The advantage of robust tests is their applicability under diverse conditions. In addition to wide applicability, it is important for a test to be powerful. The power of a test is its ability to reject the null hypothesis in favor of a specific alternative hypothesis when the null hypothesis is false.
Definition I.2
The power of a test is the probability of rejecting the null hypothesis
when the null hypothesis is false.
The distinction between β and α is that the latter is the probability of rejecting the null hypothesis when it is true, whereas the former represents the probability of failing to reject the null hypothesis when it is false.
Definition I.3
Type I Error occurs when the null hypothesis is true and it is rejected.
Definition I.4
Type II Error occurs when the null hypothesis is false but is not rejected.
Table I.1 Summary of types of error in inference

                    H0 is true      H1 is true
H0 rejected         Type I error    No error
H0 not rejected     No error        Type II error
Parametric Statistics
Before continuing the discussion of nonparametric analysis, it is helpful
to better understand parametric statistics, the requirements of parametric analysis, and the relevant terminology. The term statistics has three
distinct meanings. First, it represents a collection of data; second, it refers to the processes that are used to analyze data. The processes include
descriptive statistics, such as the mean, median, mode, range, variance,
and other measures that condense and summarize data. Statistical analysis also includes the process called inferential statistics, or simply statistics,
which utilizes data obtained from a sample to draw inferences about a
population. Third, the term statistics refers to the field of study that
deals with collecting, analyzing, and explaining data.
Definition I.5
A population is the collection of all the elements under study.
Definition I.6
A sample is a selection of some members of a population. In statistics,
samples are collected at random, with few exceptions.
Definition I.7
A parameter is a characteristic of a population that is of interest. Parameters are constant and usually unknown.
Definition I.8
A statistic is a numeric fact or summary obtained from a sample. It is
always known and is a variable.
All parametric statistics require a probability distribution function.1
Often, the behavior of an event can be approximated by a particular
probability distribution function; for example, countable random events
such as the number of accidents or phone calls during a given period can
be reasonably approximated by a Poisson distribution function. The
distribution function of many sample statistics, such as the mean, can be
approximated using a normal distribution function based on the Central
Limit Theorem (CLT) (see Theorem I.1).
Parametric statistics require that the sample points be independently
and identically distributed and that the sample size be large enough.
Preferably, samples used in parametric research deal with interval or
ratio data (see Definitions I.9 and I.10), and the variances of their respective populations are of comparable magnitude.
In this book, we adopt the convention of using Greek letters to represent parameters. Sample statistics are represented by the Greek letter of the corresponding parameter with a hat (^), indicating they are estimates. For example, the letter μ (mu) is used for the population parameter mean, and μ̂ is used to represent the sample mean.
The CLT indicates that, for a large enough sample, the distribution function of several sample statistics complies with the normal distribution or with distribution functions derived from the normal distribution, such as the t and F. This demonstrates the importance of random sampling in ensuring the validity of the CLT. Because in many instances statistical analysis pertains to sample statistics such as means and variances, it is often
possible to use the CLT to justify the use of a normal distribution function and other statistical functions that are derived from it.
Measurement Scales
Because different nonparametric tests apply to different types of data, it
is necessary to define measurement scales of data first.
Definition I.9
A ratio scale has a natural zero. For example, sales, gross domestic
product (GDP), and output are expressed as ratio scales and all have a
meaningful and natural zero; when nothing is produced GDP is 0. The
ratio scale has all the properties of the other measurement scales.
Definition I.10
An interval scale has meaningful, equal distances between values but no natural zero; temperature measured in degrees Celsius or Fahrenheit is an example.
Definition I.11
An ordinal scale indicates that data are ordered based on some characteristic of the data.
Although orders or ranks are represented by numerical values, such
values are void of content and cannot be used for typical computations
such as averages. The distances between ranks are meaningless as well. The
income of the person who is ranked 20th in a group of ordered incomes is
not necessarily twice the income of someone who is ranked 40th. In an
ordinal scale, only the comparisons greater, equal, or less are meaningful. Ordinal scales are very important in economics, as in the case of
utility and indifference curves. It is not necessary to measure the amount
of utility a consumer receives from different goods and services; it is sufficient to rank the consumer's utility. The customary arithmetic computations and statistical methods do not apply to ordinal numbers.
Definition I.12
A Likert scale is a special type of ordinal scale, where the subjects provide
the ranking of each variable.
An odd number of choices is typically used in Likert scales to allow
the center value to represent the neutral case. For example, a subject is
asked to rate his or her preference on a scale of 1, very low, to 5, very
high. In this case, a choice of 3 would represent a neutral response, indicating no preference.
Definition I.13
Nominal or categorical data classify objects and observations by naming
them.
The values assigned to nominal outcomes are merely a name or representation of the outcome. Countries might be grouped according to
their policy toward trade and classified as open or closed economies.
Care must be taken to ensure that each case belongs to only one group.
An ID number is another example of nominal data. Because the size of
nominal data cannot be measured, the customary arithmetic computations and statistical methods do not apply to these numbers. Assigning
numerical values to categorical data is merely for convenience, and the
assigned values are void of any computational capabilities. Sometimes,
nominal data are used to count the outcomes such as the number of
quarters with recession. When used for this purpose, they are also
known as count data, categorical data, or frequency data.
Definition I.14
Dichotomous data only have two outcomes.
When there are only two nominal outcomes, such as yes/no,
young/old, or male/female, the data is dichotomous. Dichotomous variables are also known as dummy variables in econometrics. When there is no
particular order, a dichotomous variable is called a discrete dichotomous
variable. Gender is an example of a discrete dichotomous variable. Alternatively, when one can place an order on dichotomous data, as in the case
of young and old, then the variable is a continuous dichotomous variable.
Definition I.15
The relative efficiency of a hypothesis test is the ratio of the sample sizes required to achieve the same power (1 − β).
Unless there is any ambiguity, we will refer to the relative efficiency
of hypothesis as the efficiency or relative efficiency. Usually, the relative
efficiency is based on large samples and is known as the asymptotic relative efficiency (ARE) or Pitman efficiency. Relative efficiency depends on
the null (H0) and alternative (H1) hypotheses, and the type I and type II error levels α and β. The relative efficiency of test A to test B is (nB /nA), where the numerator and the denominator are the sample sizes of B and A, respectively, required to achieve the same power (1 − β) for a given level of type I error (α).
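As a hypothetical numerical illustration: if test A requires nA = 100 observations to achieve the same power, at the same level of significance, that test B achieves with nB = 95 observations, then the relative efficiency of test A to test B is nB/nA = 95/100 = 0.95, so test A is slightly less efficient.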
Normality is usually a main requirement for parametric analysis. Distributions may deviate from a normal distribution in a number of ways, such as asymmetry or degree of peakedness.
Definition I.16
Skewness is a measure of asymmetry from the mean as compared to the
normal distribution.
Definition I.17
Kurtosis is a measure of how peaked a distribution is. It reflects the
pointedness or flatness of a symmetric distribution.
A distribution more pointed than the normal distribution is called
leptokurtic and it has a positive kurtosis, whereas a negative value for kurtosis indicates the distribution is flatter than a normal distribution and is
called platykurtic. Kurtosis and skewness are used to test whether a data
set follows a normal distribution (Figures I.3 and I.4).
The command summarize in Stata when invoked with the option detail
provides the values for skewness and kurtosis in conjunction with several
other descriptive statistics. To test the significance of the values for
skewness and kurtosis, it is necessary to use the Stata command sktest.
summarize varlist, detail
sktest varlist
where varlist contains the names of all the variables for which the test is desired. Failure to provide a variable name to sktest will result in an error message.
When more than one variable is listed, the results for each variable are
displayed on different lines. The output consists of the number of observations, p values for skewness and kurtosis, and a chi-squared test of
joint skewness and kurtosis hypothesis. In all cases, the null hypothesis is
that the underlying distribution is normal. Because a distribution can be
skewed without being pointed or flat and vice versa, it is necessary to test
both skewness and kurtosis jointly if the objective is to test for normality, otherwise the actual p value will be higher than the reported level of
significance for each separate hypothesis. This issue is discussed in
Chapter 1 of Volume II under Bonferroni adjustment. Stata warns against using sktest for testing normality. Instead, it recommends using the Shapiro–Wilk command (swilk) to test normality and, once that hypothesis is rejected, using sktest to determine whether skewness, kurtosis, or both are the cause.
Nonparametric Statistics
Nonparametric statistics are not limited by the requirements of parametric statistics, such as normality, existence of a mean or variance, or symmetry, to afford statistical inference. Most nonparametric statistics applications are not limited to interval or ratio scales. There is no common
parametric distribution for the sample median as is the case for the sample mean. However, there are several nonparametric inferences for the median and other order statistics. Order statistics are represented by the letters X or Y, with parenthesized subscripts indicating their order: X(1), X(2), X(3), and in general X(i). Order statistics are obtained when the sample points are
arranged in ascending order. For example, percentiles are order statistics
because they are in ascending order.
There are nonparametric counterparts for many of the parametric
inferential procedures. Nonparametric methods are also useful for testing the requirements of parametric inference such as normality and randomness. Therefore, it is often wise to test for normality and randomness of the response variable using nonparametric inference before attempting to conduct any parametric inference requiring normality, randomness, or symmetry.
Statistics software such as Stata reports the p value for every test. The
advantage of using p values is that when the p value is small enough, the
null hypothesis is rejected regardless of whether the method is parametric or nonparametric. When using tables instead of p values, the easiest
approach might be to obtain the tabulated value for a pre-specified level
of significance and compare the test statistic with it. For nonparametric
inferences, care should be taken to note whether the test statistic should
exceed or be less than the tabulated value in order to reject the null hypothesis, whereas in parametric inference, the null hypothesis is always
rejected when the calculated statistic is more extreme than the tabulated
value.
Properties of Estimators
If the expected value of a point estimate equals the population parameter, then the estimate is unbiased. In symbols:
$$E(\hat{\theta}) = \theta \qquad \text{(I.1)}$$

For example, for the mean, variance, and proportion:

$$E(\hat{\mu}) = \mu \qquad \text{(I.2)}$$

$$E(\hat{\sigma}^2) = \sigma^2 \qquad \text{(I.3)}$$

$$E(\hat{\pi}) = \pi \qquad \text{(I.4)}$$

The variance of the sample mean is

$$\sigma_{\hat{\mu}}^2 = \frac{\sigma^2}{n} \qquad \text{(I.5)}$$
where n is the sample size. The variance of the sample mean decreases as
the sample size increases. Because the population variance is a parameter, it is also a constant; therefore, as the sample size (n) increases, the
ratio σ²/n decreases. In addition, because the sample mean is an unbiased
estimate of the true population mean, it will get closer and closer to the
population mean as the sample size increases.
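As a hypothetical numerical illustration of Equation (I.5): with a population variance of σ² = 100, a sample of n = 25 gives a sampling variance of the mean of 100/25 = 4, whereas n = 100 gives 100/100 = 1, a standard error half as large.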
When two unbiased estimators are compared at the same sample size, the one with the smaller variance, θ̂1, is more efficient than the other, θ̂2. For example, when estimating the population mean, the sample mean has a smaller variance than the sample median, and therefore is more efficient than the sample median in estimating
Thus, when the sample sizes are the same, the estimator with the
smaller variance is more efficient. Alternatively, when the variances of
two estimators are equal, the one with the smaller sample size is the
more efficient. An implicit requirement is that the level of significance
and the power (of the simple hypothesis) should be the same for valid
comparisons.
Do not confuse this notion of efficiency with the relative efficiency
of a test of hypothesis, which is important when comparing parametric
and nonparametric methods.
In general, nonparametric statistics find approximate solutions to
exact problems, as opposed to the exact solution to approximate problems furnished by parametric statistics.3
Conover, W.J. 1999. Practical Nonparametric Statistics. 3rd ed. New York,
NY: Wiley, p. 2.
SECTION I
One-Sample Tests
There are numerous nonparametric methods for different purposes,
sometimes even with the same name. Customarily, the methods are classified by the number of populations under study and their type, a convention used in this book as well. Separate sections will be devoted to
one-sample, two-sample, and K-sample tests, as well as a section for correlation between two variables. Within each section, chapters will be
devoted first to major topics followed by related topics and issues. Some
of the tests can be classified in more than one way. For example, the
goodness of fit test can be classified under one-sample, two-sample, or
chi-squared tests based on the application. Chi-squared distribution
functions are used in many cases when data can be cross-classified as
different groups, especially in contingency table tests.
One-sample tests are concerned with testing claims about a parameter or characteristic of data from a single population. Normality and
randomness tests are examples of a single sample test regarding a statistical characteristic of a single population. The most common parametric
tests for one population are tests about the mean, proportion, and variance.1 Although there is no parametric test based on the median, there
are several nonparametric tests that use the median and other statistics
that depend on the order of data points. One-sample nonparametric
tests are also available for dichotomous or binary data.
CHAPTER 1
Test of Skewness
The normal distribution is the most commonly used distribution function in parametric statistics. Theoretically, the tails of a normal distribution extend from minus infinity to plus infinity. However, in practice,
the majority of observations from normal distributions are within three
standard deviations of the mean. When the mean is a relatively large
positive value, the majority of the observations would be expected to be
positive. Therefore, it is possible to approximate non-negative outcomes
using a normal distribution in spite of the fact that the left tail contains
no negative values. Yet the main reason for widespread use of the normal distribution is the applicability of the CLT, which assures normality for the distribution of sample statistics such as the sample mean, median, and proportions, provided that the sample is large enough and the sample points are drawn independently and at random and are measured using a ratio or interval scale. Under these requirements, tests of hypotheses can be conducted using the normal distribution and the distribution functions that can be derived from it, such as the t, F, and chi-squared distribution functions, expanding the ability to test other parameters of a population such as the variance. It can also be shown that the same requirements will permit tests of comparisons of two or more means and variances as well.1
Normal distribution functions have two parameters: the mean (μ) and the variance (σ²). A typical normal distribution with mean (μ) and variance (σ²) is depicted in Figure 1.1. An approximate distance from the center (μ), measured in units of standard deviation (σ), is marked.
The technical definition of skewness is based on the concept of moments about the mean (Equation [1.1]):

$$Sk = \frac{E\left[(X - \mu)^3\right]}{\left(E\left[(X - \mu)^2\right]\right)^{3/2}} \qquad (1.1)$$
Translating this concept into terms that most readers are familiar
with results in Equation (1.2). When the sample size is small, it is necessary to correct for the sample size using a correction factor such as in
Equation (1.3). It is noteworthy that both the numerator and the denominator include deviations from the mean raised to a power, and thus
are moments about the mean.
$$Sk = \frac{\sum (X - \bar{X})^3 / n}{\left(\sum (X - \bar{X})^2 / n\right)^{3/2}} \qquad (1.2)$$

$$Sk = \frac{n}{(n-1)(n-2)} \sum \left(\frac{X - \bar{X}}{s}\right)^3 \qquad (1.3)$$
Example 1.1 Calculate the skewness for the closing prices for Microsoft
stock from May 21 to July 2, 2015.
Table 1.1 Microsoft closing stock prices May 21–July 2, 2015 and deviations from the mean

Date        MSFT      m1        m2       m3       m4
5/21/2015   47.42     1.3237    1.7521   2.3192   3.0698
5/22/2015   46.90     0.8037    0.6459   0.5191   0.4172
5/26/2015   46.59     0.4937    0.2437   0.1203   0.0594
5/27/2015   47.61     1.5137    2.2912   3.4681   5.2495
5/28/2015   47.45     1.3537    1.8324   2.4805   3.3577
5/29/2015   46.86     0.7637    0.5832   0.4454   0.3401
6/1/2015    47.23     1.1337    1.2852   1.4570   1.6517
6/2/2015    46.92     0.8237    0.6784   0.5588   0.4603
6/3/2015    46.85     0.7537    0.5680   0.4281   0.3226
6/4/2015    46.36     0.2637    0.0695   0.0183   0.0048
6/5/2015    46.14     0.0437    0.0019   8E-05    4E-06
6/8/2015    45.73    -0.366     0.1342  -0.049    0.0180
6/9/2015    45.65    -0.446     0.1992  -0.089    0.0397
6/10/2015   46.61     0.5137    0.2639   0.1355   0.0696
6/11/2015   46.44     0.3437    0.1181   0.0406   0.0139
6/12/2015   45.97    -0.126     0.0160  -0.002    0.0003
6/15/2015   45.48    -0.616     0.3799  -0.234    0.1443
6/16/2015   45.83    -0.266     0.0709  -0.019    0.0050
6/17/2015   45.97    -0.126     0.0160  -0.002    0.0003
6/18/2015   46.72     0.6237    0.3890   0.2426   0.1513
6/19/2015   46.10     0.0037    1E-05    5E-08    2E-10
6/22/2015   46.23     0.1337    0.0179   0.0024   0.0003
6/23/2015   45.91    -0.186     0.0347  -0.006    0.0012
6/24/2015   45.64    -0.456     0.2082  -0.095    0.0434
6/25/2015   45.65    -0.446     0.1992  -0.089    0.0397
6/26/2015   45.26    -0.836     0.6995  -0.585    0.4892
6/29/2015   44.37    -1.726     2.9802  -5.145    8.8818
6/30/2015   44.15    -1.946     3.7882  -7.373    14.351
7/1/2015    44.45    -1.646     2.7104  -4.462    7.3463
7/2/2015    44.40    -1.696     2.8775  -4.881    8.2802
Sums        1,382.89  0.0000   25.054  -10.796   54.808
Mean        46.096    0.0000    0.8351  -0.360    1.8269
Solution 1.1
Table 1.1 provides the first, second, third, and fourth powers of the deviation of the values from the estimated mean, which are depicted as m1,
m2, m3, and m4. Therefore, they are the first, second, third, and fourth
moments about the mean. There is a close relationship between the moments and the statistical measures of the mean, variance, skewness, and
kurtosis. The equation for skewness (1.2) clearly indicates the presence
of the moments about the mean. The first moment about the mean is
given in Equation (1.4).
$$m_1 = \frac{\sum (X - \bar{X})}{n} = 0 \qquad (1.4)$$

And the second moment about the mean is given in Equation (1.5).

$$m_2 = \frac{\sum (X - \bar{X})^2}{n} = \hat{\sigma}^2 \qquad (1.5)$$

Substituting the sums from Table 1.1 into Equation (1.2) gives the sample skewness:

$$Sk = \frac{\sum (X - \bar{X})^3 / n}{\left(\sum (X - \bar{X})^2 / n\right)^{3/2}} = \frac{-10.79612 / 30}{\left(25.05448 / 30\right)^{3/2}} = \frac{-0.35987}{0.76321} = -0.47152$$

The standard error of skewness depends only on the sample size (Equation [1.6]):

$$\sigma_{Sk} = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}} \qquad (1.6)$$

$$\sigma_{Sk} = \sqrt{\frac{6 \times 30(30-1)}{(30-2)(30+1)(30+3)}} = \sqrt{\frac{5{,}220}{28{,}644}} = \sqrt{0.18222} = 0.42689$$
Software Use
Stata calculates skewness using the command
summarize varlist, detail
where varlist is the name(s) of the desired variable(s). Stata uses Equation (1.2) to calculate skewness. Using the correction factor will result in
slightly different results when the sample size is small. The Stata command summarize calculates both skewness and kurtosis in addition to a
host of other descriptive statistics. It is necessary to use the option detail.
Summarize can be shortened as sum.
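As a minimal sketch (assuming the closing prices have been loaded into a hypothetical variable named msft), the computed values can also be retrieved from the stored results of summarize:

* sketch: skewness and kurtosis from the stored results of summarize, detail
quietly summarize msft, detail
display "skewness = " r(skewness) "  kurtosis = " r(kurtosis)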
Example 1.3 Calculate the skewness for the closing prices for Microsoft
stock from May 21 to July 2, 2015 using Stata (Figure 1.4).
Solution 1.3
Hypothesis The null hypothesis is that the data has a normal distribution, which indicates that its skewness is 0. The alternative hypothesis reflects the researcher's claim about whether the data is skewed to the left, skewed to the right, or equally likely to be skewed to one side or the other. The latter is used when the direction of skewness is not of interest and only the deviation from symmetry is of concern.
Figure 1.4 Stata command and output for obtaining skewness and
kurtosis
H0 : Sk = 0
H1 : Sk > 0, H1 : Sk < 0, or H1 : Sk ≠ 0
Test Statistic
The test statistic is simply a Z score or Z statistic as depicted in Equation
(1.7).
$$Z_{Sk} = \frac{Sk - 0}{\sigma_{Sk}} \qquad (1.7)$$

where $Z_{Sk}$ is the test statistic for testing normality using skewness. The
value 0 in the numerator reflects the fact that the skewness for a normal
distribution is 0, which is the null hypothesis.
Definition 1.3 When sample points or observations are standardized,
the Z score is represented by Equation (1.8).
$$Z = \frac{\text{Observed} - \text{Expected}}{\text{Standard Deviation}} \qquad (1.8)$$
When the quantity being standardized is an estimated statistic, the standard error replaces the standard deviation (Equation [1.9]):

$$Z = \frac{\text{Estimated} - \text{Hypothesized}}{\text{Standard Error}} \qquad (1.9)$$
Example 1.4 Test the hypothesis that the distribution of the closing
prices of Microsoft stock is not symmetrical.
Solution 1.4
The null and the alternative hypotheses are:

H0: Distribution is symmetric
H1: Distribution is skewed

Because the skewness for the normal distribution is zero, the hypotheses can be written as:

H0 : Sk = 0
H1 : Sk ≠ 0
$$Z_{Sk} = \frac{Sk - 0}{\sigma_{Sk}} = \frac{-0.4715205}{0.426892} = -1.10454$$
The p value for this test statistic is 0.1357, which is not low enough to
reject the null hypothesis that the distribution is normal. Confirm the p
value using Stata.
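The manual test can be sketched in Stata as follows (msft is again an assumed variable name; the p value is computed from the normal CDF rather than read from a table, so it differs slightly from the tabulated 0.1357):

* sketch: Z statistic for skewness using Equations (1.6) and (1.7)
quietly summarize msft, detail
scalar sk = r(skewness)
scalar n  = r(N)
scalar se = sqrt(6*n*(n-1)/((n-2)*(n+1)*(n+3)))
display "Z = " sk/se "  one-tailed p = " normal(-abs(sk/se))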
Software Use
The test statistic for testing skewness in Stata is the command sktest followed by the variable name. Stata uses Equation (1.2) to calculate skewness. Using the correction factor in Equation (1.3) generates a slightly
different result when the sample size is small. Furthermore, the Stata
command summarize calculates both skewness and kurtosis in addition
to a host of other descriptive statistics. It is necessary to use the option
detail with the command summarize. There does not appear to be a
command in Stata to obtain the standard deviation for skewness.
Example 1.5 Test the hypothesis that the distribution of the closing
prices of Microsoft stock is not symmetrical using Stata.
Solution 1.5
The Stata command and output that provides the p value corresponding
to the Z statistic for skewness is depicted in Figure 1.5. Make sure to
issue the command summarize, detail first.
Figure 1.5 Stata command and output for the sktest test of skewness
The corresponding p value for the test statistic for a one-tailed test is
0.2298. The p value is too high, and the null hypothesis of normality
from the perspective of symmetry should not be rejected. The output
also includes a joint chi-squared test for skewness and kurtosis, which is
preferred to the individual tests of skewness and kurtosis unless the specific statistic is of interest, as will be discussed in Chapter 1 of Volume II
when addressing Bonferroni correction. The p value for the joint statistic
is also too high to reject the null hypothesis of normality. Thus, there is
a possibility that the closing prices of the Microsoft stock are close
enough to values from a normal distribution.
The Stata software recommends using the Shapiro–Wilk (swilk) test of normality first. Only if the null hypothesis of normality is rejected should sktest be used to determine whether skewness, kurtosis, or both are the source of deviation from normality.
Test of Kurtosis
Departure from normality could also be due to flatness or pointedness,
which is called kurtosis. A distribution with positive kurtosis (leptokurtic) is more pointed than the normal distribution, whereas one with negative kurtosis (platykurtic) is flatter than the normal distribution. Negative and positive kurtoses are depicted in Figures 1.6 and 1.7, respectively. When the data has a normal distribution, it is called mesokurtic.
The technical definition of kurtosis is based on the concept of moments in statistics. Translating the concept into terms that most readers
are familiar with provides Equation (1.10).
$$K = \frac{\sum (X - \bar{X})^4 / n}{\left(\sum (X - \bar{X})^2 / n\right)^2} \qquad (1.10)$$
where K is the measure of kurtosis, and the remaining symbols are the
same as previously defined. It can be shown that Equation (1.10) is
equivalent to Equation (1.11).
$$K = \frac{m_4}{m_2^2} \qquad (1.11)$$
where m4 is the fourth moment and m2 is the second moment about the
mean. For the normal distribution, this value approaches 3 as the sample size increases. This is more evident in the following formulation of
kurtosis (Equation [1.12]), which is more useful when the sample size is
small, because it is corrected for the sample size.
$$K = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\frac{X - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \qquad (1.12)$$
Substituting the sums from Table 1.1 into Equation (1.10):

$$K = \frac{\sum (X - \bar{X})^4 / n}{\left(\sum (X - \bar{X})^2 / n\right)^2} = \frac{30(54.8083)}{(25.05448)^2} = \frac{1{,}644.249}{627.727} = 2.61937$$
$$\sigma_K = \sqrt{\frac{24n(n-1)^2}{(n-2)(n-3)(n+3)(n+5)}} \qquad (1.13)$$

where σK is the estimated standard error of kurtosis because kurtosis is
a statistic, as explained in Definition 1.2. It is evident that only the
sample size affects the standard deviation of kurtosis.
Example 1.7 Calculate the standard deviation of kurtosis for the closing
prices for Microsoft stock from May 21 to July 2, 2015.
Solution 1.7
The formula for the standard deviation of kurtosis (Equation 1.13) involves only the sample size.
$$\sigma_K = \sqrt{\frac{24n(n-1)^2}{(n-2)(n-3)(n+3)(n+5)}} = \sqrt{\frac{24 \times 30(30-1)^2}{(30-2)(30-3)(30+3)(30+5)}} = \sqrt{\frac{605{,}520}{873{,}180}} = \sqrt{0.69347} = 0.8327$$
Software Use
Figure 1.8 Stata command and output for obtaining skewness and
kurtosis
Solution 1.8
Hypothesis The actual value of the null hypothesis depends on whether
Equation (1.10) or (1.12) is used. In the case of the former, the null
hypothesis is equal to 3 and in the case of the latter, it is 0.
H0 : K = 3   or   H0 : K = 0
H1 : K > 0, H1 : K < 0, or H1 : K ≠ 0
Test Statistic
$$Z_K = \frac{K - 0}{\sigma_K} \qquad (1.14)$$

where $Z_K$ is the test statistic for testing normality using kurtosis. The value 0 in the numerator reflects the fact that the kurtosis for the normal distribution, as computed with the correction in Equation (1.12), is 0, which is the null hypothesis. When Equation (1.10) is used instead of Equation (1.12), the expected value of the kurtosis is 3 and not 0; therefore, Equation (1.14) should be adjusted accordingly.
Reject the null hypothesis when the p value is small enough.
Example 1.9 Test the hypothesis that the distribution for the closing
prices of Microsoft stock is either flat or pointed compared to a normal
distribution.
Solution 1.9
The null and alternative hypotheses are:
H0: Distribution is normal
H1: Distribution is flatter or more pointed than the normal distribution
When using Equation (1.10), the value of kurtosis for a normal distribution is 3; so the hypotheses can be written as
H0 : K = 3
H1 : K ≠ 3
$$Z_K = \frac{K - 3}{\sigma_K} = \frac{2.619369 - 3}{0.8327} = -0.4571$$
The corresponding p value for the test statistic for a one-tailed test is
0.323778. However, as the stated alternative hypothesis is two-tailed, it
is necessary to double the p value and obtain 0.6476. The p value is not
small enough to reject the null hypothesis of normality. Verify the accuracy of the result using Stata.
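A parallel sketch for the kurtosis test (msft is an assumed variable name):

* sketch: Z statistic for kurtosis using Equations (1.13) and (1.14)
quietly summarize msft, detail
scalar k  = r(kurtosis)
scalar n  = r(N)
scalar se = sqrt(24*n*(n-1)^2/((n-2)*(n-3)*(n+3)*(n+5)))
display "Z = " (k-3)/se "  two-tailed p = " 2*normal(-abs((k-3)/se))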
Software Use
Solution 1.10
The Stata command and output that provide the p value corresponding
to the Z statistic for a test of kurtosis is depicted in Figure 1.9. Make
sure to issue the command summarize, detail first.
Figure 1.9 Stata command and output for the sktest test of kurtosis
The p value for kurtosis is too large to reject the null hypothesis. Furthermore, neither the skewness statistic nor the kurtosis statistic is significant. The output also includes a joint chi-squared test for skewness and kurtosis, which is preferred to the individual tests unless the
specific statistic is of interest, as will be discussed in Chapter 1 of Volume II when addressing Bonferroni correction. The p value for the joint
statistic is also too high to reject the null hypothesis of normality. Thus,
there is a possibility that the closing prices of the Microsoft stock are
close enough to the values from a normal distribution.
The Stata software recommends using the Shapiro–Wilk (swilk) test of normality first. If the null hypothesis of normality is rejected, use sktest to determine whether skewness, kurtosis, or both are the source of deviation from normality. The Shapiro–Wilk test belongs to the Kolmogorov type of goodness of fit tests.
Testing Normality
When parametric inference is based on the requirement of normality, it
is necessary to test the data for normality. Another common application
of a normality test is in regression analysis, where the estimated residuals
are tested to verify conformity with normality. Yet there is no reliable
parametric test for normality. Graphical comparisons of a cumulative
distribution of observations versus a cumulative normal distribution are
informative but imprecise.
When the graph of the data is skewed to the right or left, or is flat or pointed as compared to the normal distribution, the data is not normally distributed. The Kolmogorov test compares the cumulative distribution of the sample with that of a fully specified normal distribution, which requires that the population mean and variance be known before the sample data are obtained. However, these parameters are usually
unknown. Lilliefors provides an alternative where the sample estimates
(i.e., statistics) are used instead of the parameters. Because this method is
more pragmatic, it will be used here. The modifications are more in the
testing of the hypothesis than in the computation of the test statistic.
Sample values are standardized using the sample mean and variance,
which are compared to the Z scores of the standard normal probability
distribution function. The test statistic consists of the maximum vertical
distance between the two probability distribution functions, as is the
case in the original Kolmogorov test.
To calculate the cumulative frequency, first create a frequency distribution for the sample data; make sure that the possible outcomes that
do not actually occur receive a frequency value of zero (0). There are
three choices for the next step. You may use the median, mean, or the
midpoint of the data (Equation [1.15]) when calculating the Z score,
because the mean and the median are identical for the normal distribution and the midpoint of the data is close to them, if not identical. Here
we use the median. Then calculate the standard deviation of the data;
you can either use the raw data using the conventional formula or use
the tabulated data with frequencies using Equation (1.16).3 Finally, calculate the Z scores (Equation [1.17]).
$$M = \frac{X_{\min} + X_{\max}}{2} \qquad (1.15)$$

$$s = \sqrt{\frac{\sum fX^2 - \left(\sum fX\right)^2 / n}{n - 1}} \qquad (1.16)$$
where f represents the observed frequencies and the Xs are the actual
observations. The next step is to obtain the Z score as in Equation
(1.17) based on the median instead of the mean and the standard deviation of the observed values (Equation [1.16]).
$$Z = \frac{X - M}{s} \qquad (1.17)$$
The test statistic is the largest absolute value of the difference between
the samples observed cumulative frequency distribution function and
the cumulative frequencies of the normal distribution function. It is
customary to show the cumulative frequencies of the observed values of
the sample by $S_{X_i}$. Similar to the cumulative frequencies of the observed values of the sample, obtain the cumulative probability values corresponding to the Z scores from the standard normal distribution, often shown as $\Phi_{X_i}$ and pronounced phi. Find the absolute values of the differences between $\Phi_{X_i}$ and the observed cumulative frequencies $S_{X_i}$. Customarily, two such distances are obtained. The one labeled $\tilde{D}$, pronounced D tilde, in Equation (1.18) is the conventional formulation. An alternative formulation based on the lagged values of the sample cumulative frequencies is labeled $D$ and is shown in Equation (1.19).

$$\tilde{D} = \left| S_{X_i} - \Phi_{X_i} \right| \qquad (1.18)$$
$$D = \left| \Phi_{X_i} - S_{X_{i-1}} \right| \qquad (1.19)$$
The formulas in Equations (1.18) and (1.19) can also be applied to comparisons of two populations; in that form they are known as the Kolmogorov–Smirnov test, which will be discussed shortly. The largest value of $\tilde{D}$ is compared to a tabulated value4 to test the null hypothesis of normality.
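The distance computations can be sketched in Stata as follows (a rough sketch; msft is an assumed variable name, and with tied values the per-observation empirical CDF differs slightly from the unique-value layout used later in Table 1.2):

* sketch: Lilliefors distances of Equations (1.18) and (1.19)
quietly summarize msft, detail
scalar med = r(p50)                       // median used in place of the mean
scalar s   = r(sd)
sort msft
generate Sx   = _n/_N                     // empirical cumulative frequency
generate phi  = normal((msft - med)/s)    // normal CDF at each Z score
generate Dtil = abs(Sx - phi)             // Equation (1.18)
generate D    = abs(phi - Sx[_n-1])       // Equation (1.19)
replace  D    = phi in 1                  // lagged cumulative frequency is 0 for the minimum
summarize Dtil D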
Example 1.11 Use the Lilliefors method to determine if the distribution
function of the closing prices for Microsoft stock is the same as that of a
normal distribution. Use sample data from May 21, 2015 to July 2,
2015.
Solution 1.11
Calculate the median, M, and the standard deviation (Equation [1.16])
needed for calculating the Z score. Table 1.2 depicts the computational
detail. Use the original data (not the data with frequencies) for the calculations.
Although the standard deviation can be obtained using the original
data, we use the formulation using the frequencies for practice. Because
the standard deviation can be obtained from the original data using
Stata and Excel, it is easy to verify the results. In order to calculate the
standard deviation, first find ΣfX² by multiplying each squared stock price by its frequency and then summing the results. You should get 63,771.21. Then find ΣfX by multiplying each stock price by its frequency and summing; this sum, 1,382.89, is squared in the formula. The reason for multiplying the values by their frequencies is that we are using the tabulated data instead of the raw data, for practice. Plug these numbers into the following equation for the standard deviation.
Pearson, E.S. and H.O. Hartley. 1976. Biometrika Tables for Statisticians. Cambridge: Cambridge University Press.
$$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n-1}} = \sqrt{\frac{63{,}771.21 - \frac{(1{,}382.89)^2}{30}}{30-1}} = \sqrt{\frac{63{,}771.21 - \frac{1{,}912{,}384.77}{30}}{29}}$$

$$= \sqrt{\frac{63{,}771.21 - 63{,}746.16}{29}} = \sqrt{0.8638} = 0.9294$$
These calculations do use the formula for frequencies. Use the original data (not the data with frequencies) in Stata with the summarize
command to verify the results. The results from Stata are slightly different because of rounding off.
sum msft
Table 1.2 displays the values of the Z scores for the Microsoft stock,
their corresponding probability, the empirical frequencies, and the observed frequencies necessary to calculate the absolute distances $\tilde{D}$ and D.
The second column in Table 1.2 displays the Microsoft data sorted
from smallest to largest, while the third column provides the corresponding frequencies under the heading f. The column titled phi contains the probabilities of the tail areas for the calculated Z scores from a
normal distribution. The column Sx is the cumulative frequencies of the
observed values. The column D tilde is calculated using Equation (1.18),
and D is calculated using Equation (1.19).
The detailed computation for the stock price of $45.65, which occurred on June 9 and June 25, 2015, is displayed below for reference.
$$Z = \frac{X - M}{s} = \frac{45.65 - 46.12}{0.93} = -0.5054$$
Table 1.2 Lilliefors computations for the Microsoft closing prices

Date        MSFT    f   Rel f    Z score   phi      Sx       D tilde   D
6/30/2015   44.15   1   0.0333   -2.1194   0.0170   0.0333   0.0163    0.0170
6/29/2015   44.37   1   0.0333   -1.8828   0.0299   0.0667   0.0368    0.0035
7/2/2015    44.40   1   0.0333   -1.8505   0.0321   0.1000   0.0679    0.0345
7/1/2015    44.45   1   0.0333   -1.7967   0.0362   0.1333   0.0971    0.0638
6/26/2015   45.26   1   0.0333   -0.9252   0.1774   0.1667   0.0108    0.0441
6/15/2015   45.48   1   0.0333   -0.6885   0.2456   0.2000   0.0456    0.0789
6/24/2015   45.64   1   0.0333   -0.5164   0.3028   0.2333   0.0694    0.1028
6/9/2015    45.65   2   0.0667   -0.5057   0.3066   0.3000   0.0066    0.0732
6/8/2015    45.73   1   0.0333   -0.4196   0.3374   0.3333   0.0041    0.0374
6/16/2015   45.83   1   0.0333   -0.3120   0.3775   0.3667   0.0109    0.0442
6/23/2015   45.91   1   0.0333   -0.2259   0.4106   0.4000   0.0106    0.0440
6/12/2015   45.97   2   0.0667   -0.1614   0.4359   0.4667   0.0308    0.0359
6/19/2015   46.10   1   0.0333   -0.0215   0.4914   0.5000   0.0086    0.0247
6/5/2015    46.14   1   0.0333    0.0215   0.5086   0.5333   0.0247    0.0086
6/22/2015   46.23   1   0.0333    0.1183   0.5471   0.5667   0.0196    0.0138
6/4/2015    46.36   1   0.0333    0.2582   0.6019   0.6000   0.0019    0.0352
6/11/2015   46.44   1   0.0333    0.3443   0.6347   0.6333   0.0013    0.0347
5/26/2015   46.59   1   0.0333    0.5057   0.6935   0.6667   0.0268    0.0601
6/10/2015   46.61   1   0.0333    0.5272   0.7010   0.7000   0.0010    0.0343
6/18/2015   46.72   1   0.0333    0.6455   0.7407   0.7333   0.0074    0.0407
6/3/2015    46.85   1   0.0333    0.7854   0.7839   0.7667   0.0172    0.0506
5/29/2015   46.86   1   0.0333    0.7961   0.7870   0.8000   0.0130    0.0204
5/22/2015   46.90   1   0.0333    0.8392   0.7993   0.8333   0.0340    0.0007
6/2/2015    46.92   1   0.0333    0.8607   0.8053   0.8667   0.0614    0.0280
6/1/2015    47.23   1   0.0333    1.1942   0.8838   0.9000   0.0162    0.0171
5/21/2015   47.42   1   0.0333    1.3986   0.9190   0.9333   0.0143    0.0190
5/28/2015   47.45   1   0.0333    1.4309   0.9238   0.9667   0.0429    0.0096
5/27/2015   47.61   1   0.0333    1.6030   0.9455   1.0000   0.0545    0.0211

(The values $45.65 and $45.97 each occurred on two dates; the listed date is the first occurrence.)
There are 30 observations in the sample and if they were equally likely,
they would have had a probability of 1/30 of occurring. Because $45.65 and
$45.97 occur twice, their relative frequency is 2/30 = 0.0667. The cumulative probability for the value 45.97 is shown in column Sx.

Cumulative observed frequency for 45.97 = 0.0333 + 0.0333 + ... + 0.0333 (twelve terms) + 0.0667 = 14/30 = 0.4667

The Z score associated with 45.97 is −0.1614, and the corresponding Φ value is 0.4359. Therefore, $\tilde{D}$ is

$$\tilde{D} = \left| S_{X_i} - \Phi_{X_i} \right| = \left| 0.4667 - 0.4359 \right| = 0.0308$$
The entire row corresponding to the earlier calculations and the cell with the maximum $\tilde{D}$ value are shaded gray for visual effect. The rest of the values are calculated in the same way. The largest value of the distance between the theoretical and observed cumulative probability distribution functions is 0.0971, belonging to July 1, 2015, when the closing price of the stock was $44.45 (the value is shaded in Table 1.2). The tabulated Lilliefors value for n = 30 at α = 0.05 is 0.159.5 Because the calculated value, 0.0971, is smaller than the tabulated value, the null hypothesis of normality cannot be rejected.
Mason, A.L. and C.B. Bell. 1986. "New Lilliefors and Srinivasan Tables with Applications." Communications in Statistics: Simulation 15: 451–467.
The same distances defined in Equations (1.18) and (1.19) are used for the Kolmogorov–Smirnov test:

$$\tilde{D} = \left| S_{X_i} - \Phi_{X_i} \right| \qquad (1.18)$$

$$D = \left| \Phi_{X_i} - S_{X_{i-1}} \right| \qquad (1.19)$$

Test Statistic

$$Z = \sqrt{n}\, D \qquad (1.20)$$
The critical values for this Z statistic are provided by Smirnov and depicted in Table 1.3.7

7 Smirnov, N.V. 1948. "Table for Estimating the Goodness of Fit of Empirical Distributions." Annals of Mathematical Statistics 19: 271–281.
Table 1.3 Approximate p values for the Z statistic

If Z < 0.27:      p = 1
If 0.27 ≤ Z < 1:  p = 1 − (2.506628/Z)(Q + Q^9 + Q^25), where Q = e^(−1.233701/Z²)
If 1 ≤ Z < 3.1:   p = 2(R − R^4 + R^9 − R^16), where R = e^(−2Z²)
If Z ≥ 3.1:       p = 0
Example 1.12 Test the normality of the closing prices for Microsoft
stock between May 21 and July 2, 2015 using the Kolmogorov–Smirnov test.
Solution 1.12
All the necessary preliminary calculations are displayed in Table 1.2.
The maximum value of D corresponds to a stock price of $45.64, which
occurred on June 24, 2015. The computation for the D statistic corresponding to this observation is shown.
$$D = \left| \Phi_{X_i} - S_{X_{i-1}} \right| = \left| 0.3028 - 0.2000 \right| = 0.1028$$
Calculate the test statistic Z using the maximum value of D or $\tilde{D}$, which are shaded in gray in Table 1.2. In this case, D is greater than $\tilde{D}$ (0.1028 > 0.0971), so we use D in Equation (1.20).
$$Z = \sqrt{n}\, D = \sqrt{30}\,(0.1028) = 0.5626$$

$$p = 1 - \frac{2.506628}{Z}\left(Q + Q^9 + Q^{25}\right) = 1 - \frac{2.506628}{0.5626}\left(0.020288 + 0.020288^9 + 0.020288^{25}\right) = 1 - 0.0904 = 0.9096$$
Because the p value is not small enough, fail to reject the null hypothesis
of normality.
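The p value computation can be verified with a few lines of Stata (a sketch; the small difference from 0.9096 is due to rounding of intermediate values in the text):

* sketch: Smirnov approximation for 0.27 <= Z < 1 (Table 1.3)
scalar Z = sqrt(30)*0.1028
scalar Q = exp(-1.233701/Z^2)
display "Z = " Z "  p = " 1 - (2.506628/Z)*(Q + Q^9 + Q^25)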
Software Use
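The commands and output referred to here appear in an omitted figure. As a sketch, Stata's one-sample Kolmogorov–Smirnov command can be run against a normal distribution with estimated parameters as follows (msft is an assumed variable name):

quietly summarize msft
ksmirnov msft = normal((msft - r(mean))/r(sd))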
The detailed information about the D statistic and the corresponding p values indicate that the hypothesis of a normal distribution cannot
be rejected. The Kolmogorov–Smirnov test also fails to refute that the
data has a normal distribution. Had the normality tests resulted in rejecting the null hypothesis of a normal distribution, it would have been
necessary to conduct a test to determine whether skewness or kurtosis is
the reason for the deviation from normality. Refer to Examples 1.5 and
1.10 for details on how to test for skewness and kurtosis in Stata.
Large Sample Approximation
When the sample size is sufficiently large, the large sample approximation can be used. The large sample approximation also affects the critical
values without necessitating an alternative or extra computation. Miller
provides the table of critical values for the Kolmogorov–Smirnov test.8
Approximate critical values for samples greater than 40 are provided for
some of the commonly used levels of significance:
α:               0.01       0.05       0.10       0.15
Critical value:  1.63/√n    1.36/√n    1.22/√n    1.14/√n
When the mean and variance are estimated from the sample, as in the Lilliefors case, the corresponding approximate critical values are:

α:               0.01        0.05        0.10        0.20
Critical value:  1.035/√n    0.895/√n    0.819/√n    0.741/√n
Durbin, J. 1975. "Kolmogorov–Smirnov Tests When Parameters are Estimated with Applications to Tests of Exponentiality and Tests on Spacings." Biometrika 62(1): 5–22.

10 Shapiro, S.S. and M.B. Wilk. 1965. "An Analysis of Variance Test for Normality (Complete Samples)." Biometrika 52(3/4): 591–611. Tables on pages 603–604.
Test Statistic
The Shapiro–Wilk test (Equation [1.21]) calculates the correlation between the ordered expected values of the standard normal distribution
and the ordered values of the normalized observed data. The correlation
coefficient for values from the same distribution function is 1.
$$W = \frac{\left[\sum_{i=1}^{K} a_i \left(X_{(n-i+1)} - X_{(i)}\right)\right]^2}{D} \qquad (1.21)$$

where the ai are tabulated coefficients, K is n/2 (rounded down for odd n), and D is the sum of squared deviations from the mean, as computed below.
Table 1.4 Shapiro–Wilk computations for the Microsoft closing prices

i    X(i)     X(n−i+1)   X(n−i+1) − X(i)   ai       ai(X(n−i+1) − X(i))
1    44.15    47.61      3.46              0.4254   1.471884
2    44.37    47.45      3.08              0.2944   0.906753
3    44.40    47.42      3.02              0.2487   0.751073
4    44.45    47.23      2.78              0.2148   0.597144
5    45.26    46.92      1.66              0.1870   0.310420
6    45.48    46.90      1.42              0.1630   0.231460
7    45.64    46.86      1.22              0.1415   0.172630
8    45.65    46.85      1.20              0.1219   0.146280
9    45.65    46.72      1.07              0.1036   0.110852
10   45.73    46.61      0.88              0.0862   0.075856
11   45.83    46.59      0.76              0.0697   0.052972
12   45.91    46.44      0.53              0.0537   0.028461
13   45.97    46.36      0.39              0.0381   0.014859
14   45.97    46.23      0.26              0.0227   0.005902
15   46.10    46.14      0.04              0.0076   0.000304
Sum                                                 4.876849
$$\sum_{i=1}^{15} a_i\left(X_{(n-i+1)} - X_{(i)}\right) = (0.4254)(47.61 - 44.15) + (0.2944)(47.45 - 44.37) + \cdots + (0.0076)(46.14 - 46.10) = 4.87689$$
The outcome matches the amount shown at the bottom of the last column in Table 1.4. Next, calculate D and insert it into the equation for W along with the number calculated in the previous step. D is
calculated by subtracting the mean (46.10) from each observation and
then adding all the squared results. Note that the D here is different from the D used in the Kolmogorov–Smirnov test.
Then plug the results from D and 4.87689 (from the earlier calculation)
into the equation for W.
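As a quick sketch, D can be verified from the stored results of summarize (msft is an assumed variable name):

* sketch: D = sum of squared deviations = (n - 1)s^2
quietly summarize msft
display "D = " (r(N)-1)*r(sd)^2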
$$W = \frac{\left[\sum_{i=1}^{K} a_i\left(X_{(n-i+1)} - X_{(i)}\right)\right]^2}{D} = \frac{(4.87689)^2}{25.05448} = \frac{23.78405}{25.05448} = 0.949293$$
Figure 1.12 Stata commands and outputs for the Shapiro–Wilk and Shapiro–Francia tests
Solution 1.15
In neither test is the p value small enough to reject the null hypothesis that the data were obtained from a population with a normal distribution function. The advantage of using the software is the ability to obtain the exact p value. The Shapiro–Wilk and Shapiro–Francia tests belong to a family of tests known as goodness of fit tests developed by Kolmogorov.14
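The commands behind Figure 1.12 are presumably of the following form (a sketch; msft is an assumed variable name):

swilk msft
sfrancia msft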
Per capita beer, wine, and liquor consumption, prices, and mean and median income, 1995–2010

Year   QBeer   PBeer   IMean    PWine   QWine   IMedian   QLiqour
1995   22.47   0.81    44,938   4.57    1.70    34,076    1.35
1996   22.51   0.84    47,123   4.93    1.77    35,492    1.35
1997   22.37   0.84    49,692   5.17    1.79    37,005    1.34
1998   22.52   0.86    51,855   5.07    1.81    38,885    1.33
1999   22.74   0.88    54,737   5.24    1.85    40,696    1.35
2000   22.72   0.92    57,135   5.41    1.90    41,990    1.38
2001   22.87   0.96    58,208   5.96    1.88    42,228    1.38
2002   22.98   0.99    57,852   6.23    2.03    42,409    1.40
2003   22.79   1.01    59,067   6.39    2.08    43,318    1.44
2004   22.93   1.07    60,466   6.92    2.15    44,334    1.48
2005   22.70   1.09    63,344   7.77    2.17    46,326    1.52
2006   23.00   1.11    66,570   7.90    2.22    48,201    1.57
2007   23.07   1.12    67,609   8.55    2.27    50,233    1.59
2008   22.98   1.16    68,424   9.45    2.27    50,303    1.63
2009   22.34   1.21    67,976   10.07   2.27    49,777    1.64
2010   22.00   1.23    67,530   9.65    2.30    49,445    1.66

Source: Beer Institute, Brewers Almanac 2010: Per Capita Consumption of Beer by State, 1994–2010. US Census Bureau, Table H-6: Regions, All Races by Median and Mean Income: 1975–2010. Bureau of Labor Statistics, Consumer Price Index, Average Price Data: Malt beverages, all types, all sizes, any origin, per 16 oz.
Solution 1.16
The results are shown in Figure 1.13, which depicts the Stata output. After running the regression, the residuals are stored in the variable errorhat for use in the Shapiro–Wilk and Shapiro–Francia tests of normality. The Stata commands used to conduct the swilk and sfrancia tests following a regression are as follows:
regress depvar indepvars
predict errorhat, residual
swilk errorhat
sfrancia errorhat
where depvar is the dependent variable and indepvars are the independent variables.
The p values for the Shapiro–Wilk and Shapiro–Francia statistics for
the test of normality are not small enough; therefore, we fail to reject the
null hypothesis of normality. The estimated residuals have a normal
distribution function.
Figure 1.13 Stata output of regression of QBeer on PBeer and IMean, and
the runs test of randomness