Runs Test - Stat Notes, From North Carolina State University, Public Administration Program

Runs Test of Randomness
Contents
Key concepts and terms
Assumptions
Frequently asked questions
Bibliography

Overview
The one-sample runs test of significance is commonly used as a nonparametric test of randomness in a sample. Note that this is a necessary but not sufficient test for random sampling. A nonrandom availability sample of, say, students in a class, may be a very biased representation of all students in a university, yet within the class the order of sampling may be random and the sample may pass the runs test. On the other hand, if a purportedly random sample fails the runs test, this indicates that there are unusual, nonrandom periodicities in the order of the sample inconsistent with random sampling.

Key Concepts and Terms
• Runs. A "run" is a series of similar responses for a given test variable. For instance, if R=Republican and D=Democrat, the first 15 responses in a random sample might be RRDDDDRDDRRRDRD. Each series of like responses (each series of R's or D's in this example, even a "series" with only 1 R or D) is a "run." In this example there are eight runs, starting with the RR run and ending with a D run. While the runs test is typically performed on a binary test variable of this sort, the runs test has been adapted for variables with more than two values. In SPSS, multiple values are allowed but they must be numeric.
• Runs test. Using the laws of probability, it is possible to estimate the number of runs that one would expect by chance, given the number of cases in each of the two categories and the sample size (e.g., given 7 R's and 8 D's in a sample of 15). Let r = the observed number of runs. Comparing observed r with the critical values of r in a table for the runs test, the researcher can tell whether r is outside the range that would occur by chance. For this example, observed r would have to be 4 or less or 13 or more to conclude that the number of runs was outside the range expected under conditions of randomness. Since observed r was 8 in this example, the researcher concludes that the sample cannot be said to be nonrandom. By default, significance tests are two-tailed and asymptotic. For small samples with a binary test variable, exact significance tests are available. For large samples, Monte Carlo estimated significance levels are available. However estimated, a finding of nonsignificance is consistent with the assumption of random order.
• Runs test for serial randomness. This is a variant of the runs test for quantitative variables. In essence, a new up-down series is created (hence the alternative name up-down runs test) with symbols (e.g., U, D) indicating whether each value is up or down from the prior value in the sequence. The runs test is then performed on the up-down series rather than on the original quantitative variable. See Sheskin, 2007: 395-397.
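• Cut points. The runs test algorithm dichotomizes the test variable at a cut point: cases with values below the cut point form one group and cases with values at or above it form the other. The SPSS dialog, shown below, allows the researcher to specify the mean, median, or mode, and/or to specify a custom cut point.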
• Example. The table below is SPSS output, reformatted for better display, for a dataset on smoking, using the test variable Smoking, which varied from 1 to 6. Tests were requested for the mean, median, and mode as alternative cut points, hence the three runs tables below.
For these data, the median and modal cut points were quite different, but all three versions of the runs test returned a similar Z value, which was highly significant in all three cases. The significance coefficient is two-tailed, meaning that it tests whether there are too few or too many runs compared to the number expected under random conditions. A finding of significance, as here, means that the researcher concludes the series does differ significantly from random. A negative Z value means there are fewer runs than would be expected. The assumption of randomness would have been upheld by a finding of nonsignificance. Here, the data on smoking do not appear to have been entered into the dataset in random order. Note that this does not mean the data were not collected by a random sample.
• Type of significance estimate. The Exact button in the SPSS dialog above allows the researcher to select among asymptotic, exact, or Monte Carlo estimates of the significance of the runs test value. These three types of estimates are discussed separately in the section on significance testing. Exact and Monte Carlo estimates require that the SPSS Exact Tests add-on module be installed.
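
The number-of-runs calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS algorithm: the function names (count_runs, runs_test, run_count_pmf) are hypothetical, the Z value uses the usual large-sample normal approximation with no continuity correction (packages may apply one in small samples), and the exact tail probabilities assume a .05 two-tailed test with at most .025 in each tail.

```python
import math
from itertools import groupby


def count_runs(sequence):
    """Count maximal blocks of identical adjacent values."""
    return sum(1 for _ in groupby(sequence))


def runs_test(sequence):
    """Normal-approximation number-of-runs test (no continuity correction)."""
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("expected exactly two distinct categories")
    n1 = sum(1 for x in sequence if x == categories[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    r = count_runs(sequence)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - expected) / math.sqrt(variance)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return {"runs": r, "expected": expected, "z": z, "p": p_two_tailed}


def run_count_pmf(n1, n2):
    """Exact probability of each possible number of runs under random order."""
    total = math.comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            ways = 2 * math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k - 1)
        else:
            ways = (math.comb(n1 - 1, k) * math.comb(n2 - 1, k - 1)
                    + math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k))
        pmf[r] = ways / total
    return pmf


# The 15-response example from the text: 7 R's and 8 D's.
result = runs_test("RRDDDDRDDRRRDRD")
print(result)

# Exact tail probabilities for the critical values quoted in the text.
pmf = run_count_pmf(7, 8)
lower_tail = sum(p for r, p in pmf.items() if r <= 4)
upper_tail = sum(p for r, p in pmf.items() if r >= 13)
print(lower_tail, upper_tail)
```

For the example sequence this gives 8 observed runs against about 8.47 expected, so Z is close to zero and the result is nonsignificant, in line with the table lookup. Under the stated assumption, the exact tails at r = 4 and r = 13 each fall below .025, while widening either tail by one more run pushes it above .025.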
Assumptions
• Data order. If used as a test of randomness in random sampling, it is assumed data are entered into the dataset in the order sampled (that is, without any grouping or other preprocessing).
• Numeric data. In the SPSS implementation of the runs test, the test variables must be numeric. Thus it may be necessary to recode string variables.
• Data level. The original runs test required that data be a dichotomy which is mutually exclusive and exhaustive for all cases. However, the runs test does not require true dichotomies: the researcher may dichotomize continuous variables around the mean, median, mode, or some custom cut point. Normally one would use the median for ordinal data and the mean for interval data. However, multinomial variables lacking ordinality cannot be dichotomized, of course, since "above" and "below" the cut point have no nonarbitrary meaning. Variants on the runs test have been created to allow for two or more cut points rather than a single one. SPSS supports test variables which are binary, ordinal, or interval, using a single cut point, which may be the mean, median, mode, and/or a researcher-supplied custom value.
• Data distribution. The runs test is a nonparametric test, not assuming the normal or any other particular distribution.
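
The two ways of reducing a quantitative variable to two symbols, the cut-point dichotomy described under Data level and the up-down series described under Key Concepts and Terms, might be sketched as follows. The function names and the data are made up for illustration. Two conventions here are assumptions: values equal to the cut point go to the upper group, and ties between consecutive values are skipped in the up-down series. Note also that in the usual treatment the up-down series has its own reference distribution (expected number of runs (2n - 1)/3 for n observations), so this sketch only builds the series and does not test it.

```python
from statistics import mean, median


def dichotomize(values, cut):
    """Label each value 1 if it is at or above the cut point, else 0.

    Assumption: ties with the cut point go to the upper group.
    """
    return [1 if v >= cut else 0 for v in values]


def up_down_series(values):
    """Return 'U' or 'D' for each change from the prior value.

    Assumption: consecutive equal values produce no symbol.
    """
    symbols = []
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
    return symbols


scores = [3, 5, 4, 6, 6, 2, 1, 4]  # made-up illustrative data

by_median = dichotomize(scores, median(scores))
by_mean = dichotomize(scores, mean(scores))
updown = up_down_series(scores)

print(by_median)
print(by_mean)
print("".join(updown))
```

Either binary series can then be passed to a runs count; which cut point is appropriate depends on the data level, as noted above.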
Frequently Asked Questions
• Where in SPSS does one find the runs test?
From the SPSS menu, select Analyze, Nonparametric Tests, Runs. The Exact Tests module must be installed to obtain exact or Monte Carlo significance levels. In the Runs Test dialog box enter the variable list to be tested for randomness, then enter the cut points to be used to dichotomize variables (mean, median, mode, or custom).
Optional SPSS statistical output includes the mean, standard deviation, minimum, maximum, and number of nonmissing cases; and quartile values (values at the 25th, 50th, and 75th percentiles).
• What is the length-of-runs test?
The runs test described above is the number-of-runs test. There is a variant which centers on the average length of runs rather than their number. Both versions are used to test for sample randomness. However, there is some evidence that the conventional number-of-runs test is more powerful than the length-of-runs test, which therefore is little used. The length-of-runs test is discussed in Bradley (1968: 255-59).
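
As a rough illustration of the quantity the length-of-runs variant looks at, the run lengths of a sequence can be listed and averaged as below. This is only a descriptive sketch with a hypothetical function name (run_lengths); it does not implement the test statistic or critical values given in Bradley.

```python
from itertools import groupby


def run_lengths(sequence):
    """Return the length of each run, in order of appearance."""
    return [sum(1 for _ in group) for _, group in groupby(sequence)]


# The 15-response example used earlier in this document.
lengths = run_lengths("RRDDDDRDDRRRDRD")
average_length = sum(lengths) / len(lengths)

print(lengths)
print(average_length, max(lengths))
```

For a fixed sample size, the average run length is simply the sample size divided by the number of runs, so few runs and long runs are two views of the same pattern.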
• What is the SPSS syntax for the runs test?
The runs test is requested with the RUNS subcommand of the general nonparametric tests (NPAR TESTS) syntax:

NPAR TESTS [CHISQUARE=varlist[(lo,hi)]/] [/EXPECTED={EQUAL        }]
                                                    {f1,f2,...fn}
 [/K-S({UNIFORM [min,max]    })=varlist]
       {NORMAL [mean,stddev] }
       {POISSON [mean]       }
       {EXPONENTIAL [mean]   }
 [/RUNS({MEAN  })=varlist]
        {MEDIAN}
        {MODE  }
        {value }
 [/BINOMIAL[({.5})]=varlist[({value1,value2})]]
             { p}           {value        }
 [/MCNEMAR=varlist [WITH varlist [(PAIRED)]]]
 [/SIGN=varlist [WITH varlist [(PAIRED)]]]
 [/WILCOXON=varlist [WITH varlist [(PAIRED)]]]
 [/MH=varlist [WITH varlist [(PAIRED)]]]††
 [/COCHRAN=varlist]
 [/FRIEDMAN=varlist]
 [/KENDALL=varlist]
 [/M-W=varlist BY var (value1,value2)]
 [/K-S=varlist BY var (value1,value2)]
 [/W-W=varlist BY var (value1,value2)]
 [/MOSES[(n)]=varlist BY var (value1,value2)]
 [/K-W=varlist BY var (value1,value2)]
 [/J-T=varlist BY var (value1,value2)]††
 [/MEDIAN[(value)]=varlist BY var (value1,value2)]
 [/MISSING=[{ANALYSIS**}] [INCLUDE]]
            {LISTWISE  }
 [/SAMPLE]
 [/STATISTICS=[DESCRIPTIVES] [QUARTILES] [ALL]]
 [/METHOD={MC [CIN({99.0 })] [SAMPLES({10000})] }]††
                   {value}            {value}
          {EXACT [TIMER({5    })]               }
                        {value}
**Default if the subcommand is omitted.
††Available only if the Exact Tests option is installed.
Bibliography
• Bradley, James (1968). Distribution-free statistical tests. Englewood Cliffs, NJ: Prentice-Hall.
• Sheskin, David (2007). Handbook of parametric and nonparametric statistical procedures, 4th ed. Boca Raton, FL: Chapman & Hall/CRC.
Copyright © 2006, 2008 G. David Garson
Last updated: 4/24/2008.