06/08/2019

Introduction to Non-Parametric Statistics
Kim Carmela D. Co
Email: kimcarmelaco@up.edu.ph

Learning Objectives
At the end of the session, the participants should be able to:
1. Discuss the process of hypothesis testing and statistical significance, and their required assumptions
2. Describe distribution-free tests and differentiate them from strictly non-parametric tests
3. Discuss the advantages and disadvantages of non-parametric
statistical methods
4. Discuss scenarios where non-parametric methods are useful
5. Discuss criteria in selecting statistical tests


Branches of Statistics

Descriptive
• Calculation or presentation of distinguishable and well-defined characteristics of the data
• No assumptions made or implied
• Description of the sample may represent the population if it is a random sample

Inferential
• Making evaluations or probability statements concerning the accuracy of an estimate or the reliability of a decision
• Use of sample statistics to infer some information about the population, usually an unknown population parameter

Hypothesis Testing
1. Statement of the statistical hypotheses (statements about the population)
• Null hypothesis: a statement of no effect or no difference
• Alternative hypothesis: indicates the presence of an effect or difference; can be directional or nondirectional

2. Evaluation of the data through an inferential statistical test, which yields a test statistic
• Comparison of the test statistic with the critical region, which is determined by the significance level (α) and the assumed distribution
• Test may be one-tailed or two-tailed
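The steps above can be sketched in Python for a one-sample z-test, one classical parametric test. This is a minimal illustration that assumes a normal population with known standard deviation; the `z_test` helper, the data and the values of `mu0`, `sigma` and `alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05, tail="two"):
    """One-sample z-test of H0: mu = mu0, assuming a normal population with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))  # test statistic
    std_normal = NormalDist()  # assumed null distribution of z
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # reject if |z| > crit
        reject = abs(z) > crit
    elif tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)      # reject if z > crit
        reject = z > crit
    else:  # "lower"
        crit = std_normal.inv_cdf(alpha)          # reject if z < crit
        reject = z < crit
    return z, crit, reject

# Hypothetical data: 9 measurements, H0: mu = 50, known sigma = 4
data = [52.1, 54.3, 49.8, 53.5, 55.0, 51.2, 53.9, 50.7, 54.1]
z, crit, reject = z_test(data, mu0=50, sigma=4)
print(f"z = {z:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

Switching `tail` changes only where the critical region lies, not how the test statistic is computed.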


Hypothesis Testing
3. Assess the statistical significance
The decision on whether the obtained difference reflects a genuine effect or may be attributable to chance is based on the p-value:
• Probability of obtaining a result as extreme as or more extreme than the sample result, given that the null hypothesis is true
• Based on assumptions regarding the underlying population distribution (usually its form and some parameter values)
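To make the definition concrete, here is a minimal sketch of an exact p-value, assuming a hypothetical experiment in which a coin tossed 10 times gives 9 heads and H0 states that the coin is fair. The helper `binomial_p_value` is illustrative; it treats an outcome as "at least as extreme" when its null probability is no larger than that of the observed outcome, which is one common two-sided convention.

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: P(success) = p0.

    Outcomes count as 'as extreme or more extreme' when their null probability
    is no larger than that of the observed outcome.
    """
    pmf = [comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))

print(binomial_p_value(9, 10))  # 22/1024, about 0.0215
```

Here the extreme outcomes are 0, 1, 9 and 10 heads, whose null probabilities add up to 22/1024.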


How valid is the p-value?


Type I Error
• Rejecting a true null hypothesis
• Level of significance (α) – maximum probability of rejecting a true null hypothesis, assuming H0 (and the other assumptions) is true
• Also known as the size of the critical region

How valid is the p-value?


Type II Error
• Not rejecting a false null hypothesis
• Power (1 – β) – probability of rejecting a false null hypothesis, where β is the probability of a Type II error

https://www.probabilitycourse.com/chapter8/8_4_2_general_setting_definitions.php
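Both error rates can be estimated by simulation. The sketch below assumes a normal population with σ = 1, samples of n = 25, a two-sided z-test at α = 0.05 and, for the power estimate, a true mean of 0.5 instead of the hypothesized 0; these settings and the `rejection_rate` helper are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05, reps=20000, seed=1):
    """Share of simulated samples in which a two-sided z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > crit:
            rejections += 1
    return rejections / reps

type_i = rejection_rate(true_mu=0.0)  # H0 true: estimates the Type I error rate (alpha)
power = rejection_rate(true_mu=0.5)   # H0 false: estimates the power (1 - beta)
print(f"Type I error rate ~ {type_i:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

The Type I error estimate should sit close to α, while the power estimate depends on the sample size and on how far the true mean lies from the hypothesized one.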


How valid is the p-value?


• If the assumptions about the underlying distribution are correct, the p-value is exact: it is the true probability, under H0, of obtaining a result as extreme as or more extreme than the observed sample data

• More often, the p-value is calculated using an approximation to the true distribution (it is then approximate or asymptotic, rather than exact)
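The gap between an exact and an asymptotic p-value can be seen with a binomial example. As a minimal sketch, assuming 15 successes in 20 trials and H0: P(success) = 0.5 (hypothetical numbers), the code computes the exact two-sided p-value and the normal approximation with and without a continuity correction; the helper names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_two_sided(k, n):
    """Exact two-sided binomial p-value under H0: p = 0.5 (symmetric null, so double one tail)."""
    tail_k = max(k, n - k)
    upper = sum(comb(n, x) for x in range(tail_k, n + 1)) / 2**n
    return min(1.0, 2 * upper)

def normal_approx_two_sided(k, n, continuity=True):
    """Asymptotic p-value from the normal approximation to Binomial(n, 0.5)."""
    mu, sd = n / 2, sqrt(n) / 2
    diff = abs(k - mu) - (0.5 if continuity else 0.0)
    z = max(diff, 0.0) / sd
    return 2 * (1 - NormalDist().cdf(z))

k, n = 15, 20  # hypothetical: 15 successes in 20 trials
print(f"exact:                        {exact_two_sided(k, n):.4f}")
print(f"normal approximation:         {normal_approx_two_sided(k, n, continuity=False):.4f}")
print(f"with continuity correction:   {normal_approx_two_sided(k, n):.4f}")
```

With a sample this small the approximations can differ noticeably from the exact value, which is why the distinction matters.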

Hypothesis Testing
• Conclusions using these techniques are only valid if assumptions
can be substantiated
• For most classical statistical tests (z-test, t-test, F-test), the samples should:
• Be randomly drawn from a normally distributed population
• Consist of independent observations, except for paired values
• Consist of values on an interval or ratio measurement scale
• Come from populations with approximately equal variances

• Most of the time, normality is assumed based on:
• Histogram or box plot of the sample data
• Previous literature about the variables


• Can normality be assumed?

• Generally, normality is difficult to assume for asymmetric histograms, especially for small sample sizes

• Assuming it anyway could lead to misleading results if parametric statistics are used
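Alongside a histogram, a numeric summary of asymmetry can help. A minimal sketch, assuming the moment coefficient of skewness g1 = m3 / m2^1.5 as the measure (one of several definitions) and two small hypothetical samples:

```python
from statistics import mean, median

def sample_skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for a perfectly symmetric sample)."""
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5

symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 5, 14]  # hypothetical, one long right tail

for name, x in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name:13s} mean={mean(x):.2f} median={median(x)} skewness={sample_skewness(x):.2f}")
```

A skewness far from 0, together with a mean pulled away from the median, is consistent with the kind of asymmetric histogram described above.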

Alternatives?
If normality cannot be assumed, then the data should at least:
• Approximately resemble a normal distribution
• Have an adequately large sample size
• Invocation of the Central Limit Theorem:
"The sampling distribution of the mean becomes approximately normal as the sample size increases, regardless of the distribution of the original variable"

• Otherwise, take steps to meet these assumptions:
• Increase the sample size, obtain ratio-level data, or transform the data
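The Central Limit Theorem can be checked by simulation. This sketch assumes an exponential population with mean 1 and samples of size n = 30 (both hypothetical choices): the raw values are strongly right-skewed, whereas the sample means are far more symmetric and have a standard deviation close to σ/√n.

```python
import random
from statistics import stdev

rng = random.Random(42)
n, reps = 30, 5000

# Raw values from an exponential population with mean 1 (strongly right-skewed)
raw = [rng.expovariate(1.0) for _ in range(reps)]
# Means of repeated samples of size n from the same population
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

# For a distribution symmetric about the population mean, about half the values fall below it
frac_raw = sum(v < 1.0 for v in raw) / reps
frac_means = sum(m < 1.0 for m in sample_means) / reps
sd_means = stdev(sample_means)

print(f"share of raw values below the mean:    {frac_raw:.3f}")    # theory: 1 - e**-1, about 0.632
print(f"share of sample means below the mean:  {frac_means:.3f}")  # much closer to 0.5
print(f"SD of sample means: {sd_means:.3f} (sigma / sqrt(n) = {1 / n ** 0.5:.3f})")
```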


Distribution-Free Tests
• Methods based on functions of the sample observations whose sampling distribution can be determined without knowledge of the specific distribution of the underlying population
• No strict assumptions regarding the underlying population distribution or level of measurement
• Can be applied to samples from populations whose distributions need not belong to a specific distribution family (in particular, the normal family)
• Usually, exact p-values can be computed
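The sign test is a standard example of a distribution-free test, although the source does not name it. Under H0 (population median = m0, continuous population) the number of observations above m0 follows a Binomial(n, 1/2) distribution whatever the population's shape, so its p-values are exact. The sketch below, with a hypothetical `sign_test_p_value` helper, checks by simulation that the rejection rate is the same for normal, Laplace and Cauchy populations with median 0.

```python
import math
import random
from math import comb

def sign_test_p_value(sample, m0):
    """Exact two-sided sign test of H0: population median = m0.

    Under H0 (continuous population), each observation exceeds m0 with
    probability 1/2, so the count of '+' signs is Binomial(n, 1/2) whatever
    the shape of the population. Values equal to m0 are dropped.
    """
    signs = [v > m0 for v in sample if v != m0]
    n, k = len(signs), sum(signs)
    tail = max(k, n - k)
    upper = sum(comb(n, x) for x in range(tail, n + 1)) / 2**n
    return min(1.0, 2 * upper)

# Three very different populations, all with median 0
rng = random.Random(7)
populations = {
    "normal": lambda: rng.gauss(0, 1),
    "Laplace": lambda: rng.choice([-1, 1]) * rng.expovariate(1.0),
    "Cauchy": lambda: math.tan(math.pi * (rng.random() - 0.5)),
}
rates = {}
reps = 4000
for name, draw in populations.items():
    rejections = sum(sign_test_p_value([draw() for _ in range(20)], 0.0) <= 0.05 for _ in range(reps))
    rates[name] = rejections / reps
    print(f"{name:8s} rejection rate at alpha = 0.05: {rates[name]:.3f}")
```

With n = 20 the discrete null distribution only allows an attained size of about 0.041, and all three populations should reject at roughly that rate.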


Parameters
Numerical characteristics of the population from which the sample was drawn

1. Measures of Location
2. Measures of Variability
3. Measures of Association between Variables

Parametric and Non-Parametric

Parametric Statistics
• Classical statistical techniques whose objective is estimating or testing a hypothesis about one or more population parameters

Non-parametric Statistics
• Test of a hypothesis which is not a statement about parameter values


Non-Parametric Statistics
• Include procedures that test hypotheses which are not statements
about population parameters
• The hypothesis is concerned only with the form or shape of the population distribution (e.g., goodness-of-fit tests) or with some other characteristic of the probability distribution of the sample data (e.g., tests of randomness or trend)
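As an illustration of a test of randomness, the sketch below implements the Wald–Wolfowitz runs test for a two-category sequence. It uses the large-sample normal approximation to the distribution of the number of runs; the sequences and the `runs_test` helper are hypothetical. Too few runs suggests clustering, and too many suggests systematic alternation.

```python
from math import sqrt
from statistics import NormalDist

def runs_test(sequence):
    """Wald-Wolfowitz runs test for randomness of a two-category sequence.

    Returns the number of runs R and a two-sided p-value from the
    large-sample normal approximation to the null distribution of R.
    """
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("sequence must contain exactly two categories")
    n1 = sequence.count(categories[0])
    n2 = sequence.count(categories[1])
    runs = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (runs - expected) / sqrt(variance)
    return runs, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sequences of good (G) and defective (D) items from a production line
clustered = list("G" * 7 + "D" * 6 + "G" * 7 + "D" * 6)
alternating = list("GD" * 13)
for name, seq in [("clustered", clustered), ("alternating", alternating)]:
    r, p = runs_test(seq)
    print(f"{name:11s} runs = {r:2d}, p = {p:.4f}")
```

The hypothesis tested here concerns the order of the observations, not any population parameter.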

• Does this apply to all non-parametric tests?

Non-Parametric Statistics
• The term is generally used to refer to both non-parametric and distribution-free tests

• Non-parametric and parametric hypotheses coincide in the case of continuous, symmetric populations (e.g., the population median and mean are then equal)
• BUT non-parametric methods may also be used for data that only specify order or counts of events, or for hypotheses that are not about parameters


Advantages
• Tests of hypotheses which are not statements about parameter
values have no counterpart in parametric statistics
• Require few assumptions about the underlying populations from
which the data are obtained
• Conclusions reached with nonparametric methods require few qualifiers to be considered valid
• In most cases, quick and easy to apply
• Tests often involve simple arithmetic and are easy to understand
• Can be used for data at any level of measurement

Advantages
• Scope of application is wider
• Can be used for instances where it is impractical or impossible to obtain
quantitative measurements
• Many nonparametric procedures require just the ranks of the observations
• Whereas parametric procedures require their magnitudes
• Can simply be a different approach to solving standard statistical problems
• Relatively insensitive to outlying observations (many use the median rather than the mean)
• Enable the user to obtain exact p-values for tests without relying on the assumption that the underlying populations are normal
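The rank and outlier points can be illustrated with a tiny hypothetical dataset in which one value is mis-recorded as 160 instead of 16. In this minimal sketch (the `ranks` helper assumes no tied values), the mean shifts sharply while the median and the ranks do not change.

```python
from statistics import mean, median

def ranks(values):
    """Rank 1..n of each value (assumes no tied values, for simplicity)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

original = [12, 15, 11, 14, 13, 16]
with_outlier = [12, 15, 11, 14, 13, 160]  # hypothetical recording error: 16 -> 160

for name, x in [("original", original), ("with outlier", with_outlier)]:
    print(f"{name:12s} mean={mean(x):6.2f} median={median(x):5.1f} ranks={ranks(x)}")
```

A rank-based test applied to either version of the data would therefore give the same result.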


Disadvantages
• Confidence intervals can be difficult to construct
• May overlook information that could have led to a better solution
• For some tests, a larger sample size may be required to achieve the same power as parametric counterparts

• Most tests have lower power than their parametric counterparts
• This is compounded by the fact that non-parametric tests are usually applied to datasets with smaller sample sizes

Disadvantages
• May be considered by some as discarding information (converting quantitative values to ranks), since nonparametric methods do not require quantitative data
• BUT it may be argued that:
• If underlying distribution is known, and classical testing can be applied,
then there is no need to use nonparametric methods
• Usually, the nonparametric procedures are only slightly less efficient than
their normal theory competitors when the underlying populations are
normal
• Nonparametric methods can be mildly or wildly more efficient than these
competitors when the underlying populations are not normal


Decision to use Non-Parametric Statistics

For small samples:
• Should be considered whenever there is any doubt about the assumptions of parametric tests, since non-parametric tests are frequently almost as powerful as parametric tests
For large samples:
• Either test may be more reliable, depending on the particular tests compared and the type or degree of deviations assumed
For moderate-sized samples:
• No general rule can be given

Usually, a selection of interchangeable methods is available:
• Power comparisons are available among tests designed to detect location differences
• Comprehensive conclusions are not possible from blanket comparisons of general tests

Decision to use Non-Parametric Statistics


• When the outcome is an ordinal variable or a rank


Decision to use Non-Parametric Statistics


• When it is clear that the outcome does not follow a normal
distribution
• When there are definite outliers
• Example: Days in the hospital following a particular surgical procedure
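For hospital-stay data like this, one commonly used rank-based option is the Mann–Whitney U (Wilcoxon rank-sum) test; the source does not prescribe it. The sketch below uses hypothetical lengths of stay for two groups and computes an exact two-sided p-value by enumerating every way of splitting the pooled data into groups of the original sizes; the helper names are illustrative.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U = number of (x_i, y_j) pairs with x_i > y_j, counting ties as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def exact_permutation_p(x, y):
    """Two-sided p-value from the exact permutation distribution of U under H0
    (both groups come from the same population, so every split of the pooled
    data into groups of these sizes is equally likely)."""
    pooled = x + y
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2
    observed = abs(mann_whitney_u(x, y) - centre)
    count = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        count += abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-9
    return count / total

# Hypothetical days in hospital after surgery; one very long stay in group B
group_a = [3, 4, 4, 5, 6]
group_b = [5, 6, 7, 8, 9, 45]
u = mann_whitney_u(group_a, group_b)
print(f"U = {u}, exact two-sided p = {exact_permutation_p(group_a, group_b):.4f}")
```

Because U only records which group has the larger value in each pair, the 45-day stay contributes no more than a 9-day stay would.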

Decision to use Non-Parametric Statistics


• When the outcome is a continuous variable measured with some imprecision, e.g., with clear limits of detection

Example:
HIV viral load can range from "not detected" or "below the limit of detection" to hundreds of millions of copies. Thus, in a sample, some participants may have measurements like 1,254,000 or 874,050 copies, while others are recorded as "not detected."

If a substantial number of participants have undetectable levels, viral load is not normally distributed
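One way rank-based methods can handle non-detects is to treat them as tied below every detected value and give them a shared average (mid) rank, so no numeric value has to be invented for them. A minimal sketch, assuming `None` encodes "not detected" and using hypothetical viral loads:

```python
def sort_key(v):
    """Non-detects (None) sort before every detected value."""
    return (0, 0) if v is None else (1, v)

def midranks_with_nondetects(values):
    """Rank observations, treating None ('not detected') as tied below every detected value.

    Tied observations share the average (mid) rank of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: sort_key(values[i]))
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and sort_key(values[order[end + 1]]) == sort_key(values[order[start]]):
            end += 1
        mid = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mid
        start = end + 1
    return ranks

# Hypothetical viral loads (copies); None = "not detected"
viral_load = [1_254_000, None, 874_050, None, 3_200, None, 45_000]
print(midranks_with_nondetects(viral_load))
```

The three non-detects share rank 2.0, and the detected values keep their ordering, however large they are.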


Criteria to assess performance of a test

Power function of the test


• Probability that the test statistic will lead to rejection of H0 when H0 is false
• A test is most powerful for a specified alternative hypothesis if no other
test of the same size has a greater power against the same alternative
• More powerful tests usually have much more stringent assumptions
Consistency
• If the alternative hypothesis is true, the power of test approaches 1 as
sample size approaches infinity
• More consistent tests are more sensitive
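Power and consistency can be illustrated with the exact power of a two-sided sign test, used here only as an example. The sketch assumes a hypothetical alternative in which each observation exceeds the hypothesized median with probability 0.7, and α = 0.05; the `sign_test_power` helper is illustrative. As n grows, the power approaches 1, as consistency requires.

```python
from math import comb

def sign_test_power(n, p_alt, alpha=0.05):
    """Exact power of the two-sided sign test of H0: P(X > m0) = 1/2.

    Finds the largest symmetric rejection region {K <= c} or {K >= n - c}
    whose size under H0 does not exceed alpha, then adds up the
    Binomial(n, p_alt) probabilities of that region.
    """
    null_pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    c = -1
    # Widen the region while it stays within alpha and the two tails do not overlap
    while c + 1 < n - (c + 1) and 2 * sum(null_pmf[: c + 2]) <= alpha:
        c += 1
    if c < 0:
        return 0.0  # no non-empty rejection region of size <= alpha for this n
    alt_pmf = [comb(n, k) * p_alt**k * (1 - p_alt) ** (n - k) for k in range(n + 1)]
    return sum(alt_pmf[: c + 1]) + sum(alt_pmf[n - c:])

for n in (10, 20, 50, 100, 200):
    print(f"n = {n:3d}: power = {sign_test_power(n, p_alt=0.7):.3f}")
```

Because the test statistic is discrete, the attained size is at most α and the power need not rise at every single value of n.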

Criteria to assess performance of a test


Relative Efficiency or Power Efficiency
• Relative number of additional observations needed by the less efficient test to obtain the same power
• To compare two tests, make all factors equivalent except for sample size
• With the same hypotheses, rejection region, and significance level:
$\text{Power efficiency of Test A relative to Test B} = \frac{n_b}{n_a} \times 100\%$
where $n_a$ is the number of observations required by Test A for its power to equal the power of Test B when $n_b$ observations are employed


Criteria to assess performance of a test


• Example:
If test A requires a sample of $n_a = 25$ cases to have the same power as test B with $n_b = 20$ cases, then
$\text{Power efficiency of Test A} = \frac{20}{25} \times 100\% = 80\%$
To have the same power with both tests, test A needs 10 observations for every 8 observations used by test B.
Implication:
If using a less powerful test with fewer assumptions, get a larger
sample size.
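The arithmetic of the example can be written as two small helper functions (hypothetical names); the second rearranges the same formula to find the sample size Test A needs.

```python
def power_efficiency(n_a, n_b):
    """Power efficiency (%) of Test A relative to Test B: n_b / n_a x 100,
    where n_a observations for Test A give the same power as n_b for Test B."""
    return 100 * n_b / n_a

def required_n_a(n_b, efficiency_percent):
    """Observations Test A needs to match the power Test B reaches with n_b observations."""
    return n_b * 100 / efficiency_percent

print(power_efficiency(n_a=25, n_b=20))            # 80.0
print(required_n_a(n_b=8, efficiency_percent=80))  # 10.0
```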

Primary requirements for good performance in hypothesis testing
• Power – Test is sensitive to changes in the specific factors tested
• Robustness – Test is insensitive to changes of a magnitude likely to
occur due to extraneous factors

Parametric Statistics
• Derived in a way that satisfies the power requirement
• Since they are not valid if the assumptions are not met, robustness is of great concern

Non-parametric Statistics
• Inherently robust because of their general assumptions; robustness is therefore usually used as the performance criterion for nonparametric tests
• May suffer some loss of power


Nonparametric Methods
Satisfy at least one of the following:
1. May be used on data with a nominal scale of measurement
2. May be used on data with an ordinal scale of measurement
3. May be used on data with an interval/ratio scale of
measurement, where the distribution function of the random
variable is unspecified


