Introduction To Non-Parametric Statistics: Learning Objectives

06/08/2019
Introduction to Non-Parametric Statistics
Kim Carmela D. Co
Email: kimcarmelaco@up.edu.ph
Learning Objectives
At the end of the session, the participants should be able to:
1. Discuss the process of hypothesis testing and statistical
significance and its required assumptions
2. Describe and differentiate distribution-free tests from strictly
non-parametric tests
3. Discuss the advantages and disadvantages of non-parametric
statistical methods
4. Discuss scenarios where non-parametric methods are useful
5. Discuss criteria in selecting statistical tests
Branches of Statistics
Descriptive
• Calculation or presentation of distinguishable and well-defined
characteristics of the data
• No assumptions made or implied
• Description of the sample may represent the population if it is a
random sample
Inferential
• Making evaluations or probability statements concerning the accuracy
of an estimate or reliability of a decision
• Use of sample statistics to infer some information about the
population, usually an unknown population parameter
Hypothesis Testing
1. Statement of statistical hypotheses
(Statements about the population)
• Null hypothesis is a statement of no effect or no difference
• Alternative hypothesis indicates presence of an effect or
difference, can be directional or nondirectional
2. Evaluation of data through the use of an inferential
statistical test, which yields a test statistic
• Comparison of the test statistic with the critical region, which is
determined by the significance level (α) and the assumed distribution
• Test may be one-tailed or two-tailed
3. Assess the statistical significance
Decision on whether obtained difference is due to presence of
genuine effect or may be attributable to chance is based on the
computation of the p-value:
• Probability of obtaining a result as extreme or more extreme than the
sample result, given that the null hypothesis is true
• Based on assumptions regarding the underlying population
distribution (usually its form and some parameter values)
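The three steps above can be sketched for a two-sided one-sample z-test. This is a minimal illustration, not a prescribed procedure: the data, the hypothesized mean `mu0`, and the known standard deviation `sigma` are hypothetical values, and only the Python standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_two_sided(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming a known population SD (sigma)."""
    n = len(sample)
    # Step 2: test statistic under H0: population mean == mu0
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Critical value that bounds the two-tailed critical region
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: probability of a result at least this extreme, given H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit


# Hypothetical measurements; H0: mean = 5.0, H1: mean != 5.0
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.2, 5.5]
z, z_crit, p_value, reject = z_test_two_sided(sample, mu0=5.0, sigma=0.5)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing `abs(z)` with `z_crit` and comparing `p_value` with `alpha` lead to the same decision; the p-value additionally shows how far the result lies in the tail.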
How valid is the p-value?

Type I Error
• Rejecting a true null hypothesis
• Level of Significance (α) – maximum probability of rejecting a
true null hypothesis, assuming H0 and the other assumptions
hold
• Also known as the size of the critical region

Type II Error
• Not rejecting a false null hypothesis
• Power (1 – β) – probability of rejecting a false null hypothesis
https://www.probabilitycourse.com/chapter8/8_4_2_general_setting_definitions.php
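Both error rates can be estimated by simulation. The sketch below assumes a two-sided z-test on normally distributed data with known standard deviation; the sample size, effect size, number of simulations, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                   n_sim=2000, seed=1):
    """Fraction of simulated samples in which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


# H0 is true: the rejection rate estimates the Type I error rate (near alpha)
type_i_rate = rejection_rate(true_mean=0.0)
# H0 is false: the rejection rate estimates the power, 1 - beta
power = rejection_rate(true_mean=0.8)
print(f"Type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}")
```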

• If assumptions about the underlying distribution are correct, the
p-value is the exact probability of obtaining a result at least as
extreme as the observed sample result when the null hypothesis is true
• More often, the p-value is calculated using an approximation to the
true distribution (here, it is approximate or asymptotic, rather
than exact)
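The difference between an exact and an approximate p-value can be seen with the sign test, used here only as a convenient illustration. Under H0 the number of positive differences follows a Binomial(n, 0.5) distribution, so the exact p-value is a finite sum, while the asymptotic version uses a normal approximation with continuity correction. The counts `k = 9` and `n = 10` are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def sign_test_exact(k, n):
    """Exact two-sided p-value for k positive signs out of n (Binomial(n, 0.5))."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))


def sign_test_normal_approx(k, n):
    """Approximate two-sided p-value using the normal approximation
    with continuity correction."""
    z = max(0.0, (abs(k - n / 2) - 0.5) / sqrt(n / 4))
    return 2 * (1 - NormalDist().cdf(z))


exact = sign_test_exact(9, 10)           # 22/1024, about 0.0215
approx = sign_test_normal_approx(9, 10)  # close to, but not equal to, the exact value
print(f"exact p = {exact:.4f}, approximate p = {approx:.4f}")
```

With a sample this small the two values differ noticeably; the approximation improves as n grows.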
Hypothesis Testing
• Conclusions using these techniques are only valid if assumptions
can be substantiated
• For most parametric tests (z-test, t-test, F-test), the data should:
• Be randomly drawn from a normally distributed population
• Consist of independent observations, except for paired values
• Consist of values on an interval or ratio measurement scale
• Come from populations with approximately equal variances
• Most of the time, normality is assumed based on:
• Histogram or box plot of the sample data
• Previous literature about the variables
• Can normality be assumed?
• Generally, normality is difficult to assume for asymmetric histograms,
especially for small sample sizes
• Could lead to misleading results if parametric statistics are used
Alternatives?
If normality cannot be assumed, then the data should at least:
• Approximately resemble a normal distribution
• Have an adequately large sample size
• Invocation of the Central Limit Theorem
"Sampling distribution of the mean becomes approximately normal
as the sample size increases, regardless of the distribution of the
original variable"
• Otherwise, perform methods to adhere to these assumptions
• Increase sample size, get ratio level data, transform data
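The Central Limit Theorem can be illustrated by simulation. The sketch below draws repeated samples from a strongly right-skewed exponential distribution (an arbitrary choice of non-normal population) and measures the skewness of the resulting sample means; the sample sizes, number of simulations, and seed are illustrative assumptions.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n, n_sim=3000, seed=7):
    """Means of n_sim samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(n_sim)]


skew_small = skewness(sample_means(2))    # still clearly right-skewed
skew_large = skewness(sample_means(50))   # much closer to symmetric
print(f"skewness of means: n=2 -> {skew_small:.2f}, n=50 -> {skew_large:.2f}")
```

The skewness of the sample means shrinks toward zero as the sample size grows, which is why a large sample can justify normal-theory methods even when the raw data are skewed.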
Distribution-Free Tests
• Methods based on functions of the sample observations whose
sampling distribution can be determined without knowledge of the
specific distribution of the underlying population
• No strict assumptions regarding the underlying population
distribution or level of measurement
• Analysis methods can be applied to samples from populations
having distributions which need not belong to a specific
distribution family (particularly the normal one)
• Usually, exact p-values can be computed
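As one illustration of a distribution-free method, the sketch below computes an exact two-sided p-value for the Wilcoxon rank-sum statistic by enumerating every possible assignment of ranks. Under H0 all assignments are equally likely whatever the population distribution, so no distributional form is needed. The data are hypothetical and the code assumes there are no tied values.

```python
from itertools import combinations


def rank_sum_exact_p(x, y):
    """Exact two-sided p-value for the rank sum of x (assumes no ties)."""
    combined = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(combined)}
    w_obs = sum(rank[v] for v in x)
    n, total = len(x), len(x) + len(y)
    expected = n * (total + 1) / 2  # mean of the rank sum under H0
    # Under H0 every subset of n ranks is equally likely to belong to x
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= abs(w_obs - expected))
    return w_obs, extreme / len(sums)


# Hypothetical measurements from two independent groups
x = [1.1, 2.3, 0.7, 1.9]
y = [3.4, 4.8, 2.9, 5.2, 4.1]
w_obs, p_exact = rank_sum_exact_p(x, y)
print(f"rank sum = {w_obs}, exact p = {p_exact:.4f}")  # 2 of 126 assignments

# Any order-preserving transformation leaves the ranks, and the p-value, unchanged
_, p_cubed = rank_sum_exact_p([v ** 3 for v in x], [v ** 3 for v in y])
```

Full enumeration is only practical for small samples; for larger ones, statistical software typically switches to an approximation.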
Parameters
Numerical characteristics of the population from which sample was
drawn
1. Measures of Location
2. Measures of Variability
3. Measures of Association between Variables
Parametric and Non-Parametric
Parametric Statistics
• Classical statistical techniques whose objective is estimating
or testing a hypothesis about one or more population parameters
Non-parametric Statistics
• Test of hypothesis which is not a statement about parameter values
• Include procedures that test hypotheses which are not statements
about population parameters
• Hypothesis is concerned only with the form or shape of the
population distribution (e.g. goodness of fit tests) or with some
other characteristic of the probability distribution of the sample
data (test of randomness or trend)
• Does this apply to all non-parametric tests?
• The term "non-parametric" is generally used to refer to both strictly
non-parametric and distribution-free tests
• Non-parametric and parametric hypotheses are identical in the
case of continuous and symmetrical populations (e.g., the median
equals the mean)
• BUT non-parametric methods may also be used for data that
simply specify order or counts of numbers of events or for
hypotheses that are not about parameters.
Advantages
• Tests of hypotheses which are not statements about parameter
values have no counterpart in parametric statistics
• Require few assumptions about the underlying populations from
which the data are obtained
• Conclusions reached in nonparametric methods do not require many
qualifiers to be considered valid
• In most cases, quick and easy to apply
• Tests often involve simple arithmetic, and easy to understand
• Can be used for data in any level of measurement
• Scope of application is wider
• Can be used for instances where it is impractical or impossible to obtain
quantitative measurements
• Many nonparametric procedures require just the ranks of the
observations
• Whereas the parametric procedures require the magnitudes
• Can simply be a different approach to solving standard statistical
problems
• Relatively insensitive to outlying observations (uses median rather than mean)
• Enables the user to obtain exact p-values for tests, without relying on assumptions
that the underlying populations are normal
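The insensitivity to outliers can be seen in a small sketch. The two lists are hypothetical lengths of hospital stay in days, identical except for one extreme value; the `ranks` helper is an illustrative implementation of midranks (tied values share their average rank), not a library function.

```python
from statistics import mean, median


def ranks(values):
    """1-based midranks: tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


stay = [2, 3, 3, 4, 5, 6, 7]
stay_outlier = [2, 3, 3, 4, 5, 6, 70]  # same data with one extreme stay

print(mean(stay), mean(stay_outlier))      # the mean shifts substantially
print(median(stay), median(stay_outlier))  # the median is 4 in both cases
print(ranks(stay) == ranks(stay_outlier))  # True: the ranks are unchanged
```

Because rank-based procedures see only the ordering, the extreme value has no more influence than any other largest observation.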
Disadvantages
• Difficulty in construction of confidence intervals
• May overlook information that may have led to a better solution.
• For some tests, may require a larger sample size to achieve the same power as
parametric counterparts
• Most tests have a lower power than their parametric counterparts
when the parametric assumptions hold
• This is compounded by the fact that non-parametric tests are usually
applied to datasets with smaller sample sizes
• May be considered by some as discarding information (conversion of
quantitative to ranks), since quantitative data is not required by
nonparametric methods
• BUT it may be argued that:
• If underlying distribution is known, and classical testing can be applied,
then there is no need to use nonparametric methods
• Usually, the nonparametric procedures are only slightly less efficient than
their normal theory competitors when the underlying populations are
normal
• Nonparametric methods can be mildly or wildly more efficient than these
competitors when the underlying populations are not normal
Decision to use Non-Parametric Statistics

For small samples:
• Should be considered when there is any doubt about assumptions for parametric tests, since
they are frequently almost as powerful as parametric statistics
For large sample sizes:
• Either test may be more reliable, depending on the particular tests compared and type or
degree of deviations assumed
No generalizations on decision for moderate-sized samples
Usually, a selection of interchangeable methods is available

• Power comparisons are available among tests designed to detect location differences
• Comprehensive conclusions impossible for blanket comparisons of general tests

• When the outcome is an ordinal variable or a rank

• When it is clear that the outcome does not follow a normal
distribution
• When there are definite outliers
• Example: Days in the hospital following a particular surgical procedure

• When the outcome has clear limits of detection
• When the outcome is a continuous variable that is measured with some
imprecision (e.g., with clear limits of detection)
Example:
HIV viral load - can range from "not detected" or "below the limit of
detection" to hundreds of millions of copies. Thus, in a sample some
participants may have measures like 1,254,000 or 874,050 copies and others
are measured as "not detected."
If a substantial number of participants have undetectable levels,
viral load is not normally distributed
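A rank-based treatment of such data can be sketched as follows. "Not detected" is coded as `None`, and all non-detects share the lowest midrank. The two large values come from the example above; the other values and the limit of detection `LOD` are hypothetical, and the code assumes no ties among detected values.

```python
from statistics import mean, median

LOD = 50  # hypothetical limit of detection (copies)


def rank_with_nondetects(values):
    """Rank values where None means 'not detected'; non-detects tie for
    the lowest ranks. Assumes no ties among detected values."""
    n_nd = sum(v is None for v in values)
    nd_rank = (1 + n_nd) / 2 if n_nd else None
    detected_sorted = sorted(v for v in values if v is not None)
    position = {v: n_nd + i + 1 for i, v in enumerate(detected_sorted)}
    return [nd_rank if v is None else float(position[v]) for v in values]


viral_load = [None, 1_254_000, None, 874_050, 320, None, 15_600]
load_ranks = rank_with_nondetects(viral_load)
print(load_ranks)  # [2.0, 7.0, 2.0, 6.0, 4.0, 2.0, 5.0]

# The median does not depend on which value below the LOD is substituted,
# as long as fewer than half of the observations are non-detects
as_zero = [0 if v is None else v for v in viral_load]
as_lod = [LOD - 1 if v is None else v for v in viral_load]
print(median(as_zero), median(as_lod))  # 320 in both cases
print(mean(as_zero) == mean(as_lod))    # False: the mean depends on the substitution
```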
Criteria to assess performance of a test
Power function of the test

• Probability that the test statistic will lead to a rejection of H0 if it is false
• A test is most powerful for a specified alternative hypothesis if no other
test of the same size has a greater power against the same alternative
• More powerful tests usually have much more stringent assumptions
Consistency
• If the alternative hypothesis is true, the power of test approaches 1 as
sample size approaches infinity
• More consistent tests are more sensitive
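The power function and consistency can be illustrated with the one-sided z-test, for which power has a closed form. This is a sketch under assumed values: the true shift `delta`, the standard deviation `sigma`, and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mean > mu0) when the true mean
    exceeds mu0 by delta and the population SD sigma is known."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - delta * sqrt(n) / sigma)


sample_sizes = (10, 50, 200, 1000)
powers = [z_test_power(delta=0.3, sigma=1.0, n=n) for n in sample_sizes]
for n, pw in zip(sample_sizes, powers):
    print(f"n = {n:4d}: power = {pw:.4f}")

# With no true effect (delta = 0), the rejection probability equals alpha
size = z_test_power(delta=0.0, sigma=1.0, n=30)
```

For a fixed alternative the power rises toward 1 as n increases, which is the behavior that defines a consistent test.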

Relative Efficiency or Power Efficiency
• Relative number of additional observations needed using the less
efficient estimator to obtain the same power
• To compare two tests, make all factors equivalent except for sample size
• With same hypotheses, rejection region, & significance level:
$$\text{Power efficiency of Test A relative to Test B} = \frac{n_b}{n_a} \times 100\%$$
where $n_a$ is the number of observations required by Test A for the power of
Test A to equal the power of Test B when $n_b$ observations are employed

• Example:
If Test A requires a sample of $n_a = 25$ cases to have the same power as
Test B with $n_b = 20$ cases, then
$$\text{Power efficiency of Test A} = \frac{20}{25} \times 100\% = 80\%$$
In order to have the same power for both tests, we need 10
observations for Test A for every 8 observations for Test B.
Implication:
If using a less powerful test with fewer assumptions, get a larger
sample size.
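The power-efficiency arithmetic can be written as two small helper functions. The function names are illustrative, not from any library; the numbers reproduce the example above.

```python
from math import ceil


def power_efficiency(n_a, n_b):
    """Power efficiency (%) of Test A relative to Test B, where Test A
    needs n_a observations to match the power of Test B with n_b."""
    return 100.0 * n_b / n_a


def required_n_a(n_b, efficiency_pct):
    """Observations Test A needs to match Test B using n_b observations,
    rounded up to a whole observation."""
    return ceil(n_b * 100.0 / efficiency_pct)


print(power_efficiency(n_a=25, n_b=20))  # 80.0
print(required_n_a(n_b=8, efficiency_pct=80.0))   # 10
print(required_n_a(n_b=20, efficiency_pct=80.0))  # 25
```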
Primary requirements for good performance in hypothesis testing
• Power – Test is sensitive to changes in the specific factors tested
• Robustness – Test is insensitive to changes of a magnitude likely to
occur due to extraneous factors
Parametric Statistics
• Derived in a way that power requirement is satisfied
• Since not valid if assumptions not met, robustness is of great
concern
Non-parametric Statistics
• Inherently robust because of general assumptions, and this
is usually used as performance criterion for nonparametric tests
• May suffer some loss of power
Nonparametric Methods
Satisfy at least one of the following:
1. May be used on data with a nominal scale of measurement
2. May be used on data with an ordinal scale of measurement
3. May be used on data with an interval/ratio scale of
measurement, where the distribution function of the random
variable is unspecified