
Inferential Statistics
Calculicious Closing Time Capstone:
The Great Topic Takedown

Table of Contents
1 History of Statistics
2 Descriptive vs. Inferential
3 The Why?
4 Big Picture
5 Definitions
6 Resources

Inferential Statistics

History
----------

Statistics began in Ancient Greece with the philosophers, who had ideas but no
quantitative analyses. In the 17th century, Petty and Graunt studied affairs of
state and the vital statistics of populations, while Bernoulli and Pascal
studied probability through games of chance and gambling. In the 18th century,
Gauss contributed the normal curve and regression through the study of
astronomy. In the 19th century, Quetelet, an astronomer, was the first to apply
statistical analyses to human biology, and Galton studied genetic variation in
humans using regression and correlation.

At the beginning of the 20th century, Pearson studied natural selection using
correlation; he formed the first academic department of statistics, founded the
journal Biometrika, and helped develop the chi-square analysis. Gosset studied
the process of brewing, alerted the statistics community to the problems of
small sample sizes, and developed Student's t-test. Fisher, an evolutionary
biologist, developed ANOVA and stressed the importance of experimental design.

Later in the 20th century, Wilcoxon, a biochemist who studied pesticides,
developed the non-parametric equivalent of the two-sample t-test. Kruskal, a
statistician, and Wallis, an economist, developed the non-parametric equivalent
of the ANOVA. Spearman, a psychologist, developed a non-parametric equivalent
of the correlation coefficient, and Kendall, a statistician, developed a
similar measure. Tukey, a statistician, developed a multiple comparisons
procedure; Dunnett, a biochemist who studied pesticides, developed a multiple
comparisons procedure for control groups; and Keuls, an agronomist, developed
another multiple comparisons procedure. Lastly, computer technology provided
many advantages over calculation by hand or by calculator and stimulated the
growth of investigation into new techniques.

Descriptive Statistics vs. Inferential Statistics


Descriptive Statistics consists of the collection, organization, summarization,
and presentation of data. It describes the sample or population, usually
providing values such as the range, maximum, minimum, central tendency, and
variance.

Inferential Statistics consists of generalizing from samples to populations,
performing estimations and hypothesis tests, determining relationships among
variables, and making predictions.

Both tell you something about the population: descriptive statistics describes
the sample or population directly, while inferential statistics makes educated
guesses about the population based on the sample.

Why?
--------------

We have all read or heard statistics; they are in every media outlet, and they
all come from experiments and studies. Inferential statistics helps deliver
accurate and valid statistics to the audience and to the people running the
test, so that in return the audience can make wise decisions. Inferential
statistics allows people to generate findings and apply them to the larger
population.

What is inferential statistics?
-------------

With inferential statistics, you are trying to reach conclusions that extend
beyond the immediate data alone. For instance, we use inferential statistics
to try to infer from the sample data what the population might think. Or, we
use inferential statistics to make judgments about the probability that an
observed difference between groups is a dependable one, rather than one that
might have happened by chance in this study.

The Big Picture


There are two methods of reaching a statistical inference:

a) Estimation
In estimation, a sample from a population is studied and an
inference is made about the population based on the sample. The key
to estimation is the probability with which particular values will
occur during sampling; this is what allows the inference about the
population to be made. The values that occur are inevitably based
on the sampling distribution of the population. The key to making
an accurate inference about a population is random sampling, where
each possible sample of the same size has the same probability of
being selected from the population, as sketched below.
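
A minimal Python sketch of this idea; the population values, the random
seed, and the sample size are made-up assumptions, not data from these
notes:

    # Estimate a population mean from a simple random sample, where
    # every subset of size k is equally likely to be drawn.
    import random
    import statistics

    random.seed(42)

    # Hypothetical population: 10,000 exam scores.
    population = [random.gauss(70, 10) for _ in range(10_000)]

    sample = random.sample(population, k=100)  # simple random sample

    print("population mean:", round(statistics.mean(population), 2))
    print("sample estimate:", round(statistics.mean(sample), 2))

Rerunning with different seeds shows the sample estimates varying around
the population mean - the sampling distribution the text refers to.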

b) Hypothesis Testing
To answer a statistical question, the question is translated into a
hypothesis - a statement that can be subjected to a test. Depending
on the result of the test, the hypothesis is rejected or not
rejected. The hypothesis tested is known as the null hypothesis
(H0). This must be a true/false statement. For every null
hypothesis, there is an alternative hypothesis (HA). Constructing
and testing hypotheses is an important skill, but the best way to
construct a hypothesis is not necessarily obvious:
The null hypothesis has priority and is not rejected unless there
is strong evidence against it.
If one of the two hypotheses is 'simpler', it is given priority, so
that a more 'complicated' theory is not adopted unless there is
sufficient evidence against the simpler one.
In general, it is 'simpler' to propose that there is no
difference between two sets of results than to say that there is a
difference.
The outcome of a hypothesis test is "reject H0" or "do not
reject H0". If we conclude "do not reject H0", this does not
necessarily mean that the null hypothesis is true, only that there
is insufficient evidence against H0 in favour of HA. Rejecting the
null hypothesis suggests that the alternative hypothesis may be
true.
In order to decide whether to reject the null hypothesis, the level
of significance (α) of the result is used (e.g. α = 0.05 or
α = 0.01). This allows us to state whether or not there is a
"significant difference" (a technical term!) between populations,
i.e. whether any difference between populations is a matter of
chance, e.g. due to experimental error, or so small as to be
unimportant.

Procedure for hypothesis testing:

1. Define H0 and HA, based on the guidelines given above.
2. Choose a value for α. This should be done before performing
the test, not when looking at the result!
3. Calculate the value of the test statistic.
4. Compare the calculated value with a table of the critical
values of the test statistic.
5. If the absolute value of the calculated test statistic is LESS
THAN the critical value from the table, do not reject the null
hypothesis (H0). If it is GREATER THAN or EQUAL TO the critical
value, reject the null hypothesis (H0) in favour of the
alternative hypothesis (HA).
Note that:
A significance test can never prove a null hypothesis, only
fail to disprove it. A worked sketch of the procedure follows
below.
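
To make the five steps concrete, here is a minimal sketch of a one-sample
z-test in Python; the sample data, the hypothesised mean, and the known
population standard deviation are all made-up assumptions:

    # Step 1: H0: population mean = 100; HA: population mean != 100.
    # Step 2: choose alpha = 0.05 before testing.
    import math

    sample = [112, 105, 98, 120, 101, 109, 96, 115, 103, 108]  # made-up data
    mu0, sigma, alpha = 100, 15, 0.05

    # Step 3: calculate the test statistic.
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))

    # Step 4: critical value for a two-tailed test at alpha = 0.05.
    critical = 1.96

    # Step 5: compare and decide.
    if abs(z) >= critical:
        print(f"z = {z:.2f}: reject H0 at alpha = {alpha}")
    else:
        print(f"z = {z:.2f}: do not reject H0 at alpha = {alpha}")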

Standard Scores (z-Scores)
z-scores define the position of a score in relation to the mean,
using the standard deviation as the unit of measurement. Standard
scores are therefore useful for comparing data points in different
distributions.
z = (score - mean) / standard deviation
The z-score is the number of standard deviations by which a score
lies above or below the mean.
Since this technique normalizes distributions, z-scores can be used
to compare data from different sets.
Example (did Bob's performance improve or decline from module 1 to
module 2?):
Bob scored 71.2% on module 1 (mean = 65.4%, SD = 3.55)
z = (71.2 - 65.4) / 3.55 = 1.63
Bob scored 66.8% on module 2 (mean = 61.1%, SD = 2.54)
z = (66.8 - 61.1) / 2.54 = 2.24
Conclusion: Bob did better, relatively speaking, on module 2 than
on module 1, even though his mark was lower on module 2.
Note that the z-score is only informative when it refers to a
normal distribution - calculating a z-score from a skewed dataset
may not produce a meaningful number.
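
The same comparison in a few lines of Python, using the numbers from the
example above:

    # z = (score - mean) / standard deviation
    def z_score(score: float, mean: float, sd: float) -> float:
        return (score - mean) / sd

    z1 = z_score(71.2, 65.4, 3.55)  # module 1
    z2 = z_score(66.8, 61.1, 2.54)  # module 2

    print(f"module 1: z = {z1:.2f}")  # 1.63
    print(f"module 2: z = {z2:.2f}")  # 2.24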

Definitions

Central Limit
Theorem:

This theorem states that the random sampling distribution of means
will always tend to be normal, irrespective of the shape of the
population. The theorem further states that the random sampling
distribution of means will become closer to normal as the size of
the samples increases. For example, the proportion of heads in 100
coin flips will usually be close to 50%, whereas just two flips can
easily give 0%, 50%, or 100% heads.
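
A minimal simulation sketch of the theorem; the skewed population and the
sample sizes used here are made-up assumptions:

    # Means of random samples from a skewed population pile up around
    # the population mean, and their spread shrinks as samples grow.
    import random
    import statistics

    random.seed(1)
    population = [random.expovariate(1.0) for _ in range(100_000)]  # skewed

    for n in (2, 30, 100):
        means = [statistics.mean(random.sample(population, n))
                 for _ in range(2_000)]
        print(f"n = {n:3d}: SD of sample means = {statistics.stdev(means):.3f}")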

Sampling error:

This is not an error, but a natural, expected random variation that
causes the sample statistic to differ from the population
parameter; it reflects the luck of the draw in which members of the
population happen to end up in the sample.

Standard Error:

Standard error is the population standard deviation divided by the
square root of the sample size.
The larger the sample size, the smaller the standard error.
The shape of the random sampling distribution of means, as
reflected by its standard error, is affected by the size of the
sample.
This is why the results of large studies should be trusted more
than those of small studies.
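
A one-formula sketch; the population standard deviation of 10 is a
made-up assumption:

    # standard error = population SD / sqrt(n); quadrupling n halves it.
    import math

    sigma = 10.0  # assumed population standard deviation
    for n in (4, 25, 100, 400):
        print(f"n = {n:3d}: standard error = {sigma / math.sqrt(n):.2f}")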

Z score:

Z-score = (sample mean - population mean) divided by the
standard error of the mean (the population standard
deviation over the square root of the sample size).
The Z-score statistic tells us how far the observed
sample mean is from the population mean.
We can use Z-scores to calculate the probability that
a sample will have a mean above or below a given
value.
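
A minimal sketch of such a probability calculation; the population
parameters and sample size are made-up assumptions, and math.erf is used
so that no extra libraries are needed:

    # P(sample mean > 105) for an assumed population:
    # mu = 100, sigma = 15, n = 36.
    import math

    def normal_cdf(z: float) -> float:
        # Standard normal cumulative distribution, via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    mu, sigma, n = 100.0, 15.0, 36
    standard_error = sigma / math.sqrt(n)   # 2.5
    z = (105.0 - mu) / standard_error       # 2.0
    print(f"P(sample mean > 105) = {1.0 - normal_cdf(z):.4f}")  # ~0.0228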

Precision vs.
Accuracy:

Precision is the degree to which a figure (such as an
estimate of a population mean) is immune from random
variation. The width of the confidence interval (C.I.)
reflects precision - the wider the C.I., the less precise
the estimate. Larger samples give narrower C.I.s.
Accuracy is the degree to which an estimate is immune
from systematic error or bias.
Halving the width of the confidence interval is the
same as doubling precision.
When the standard error is halved, the width of the
confidence interval is also halved; because the standard
error varies with the square root of the sample size,
halving it requires roughly four times as many
observations.
The smaller the standard error, the more precise the
estimate.

Confidence
Limits:

Confidence limits = sample mean ± (z score × standard
error).
Researchers place special emphasis on ±1.96 standard
errors.
Logically, if the sample mean lies within ±1.96 standard
errors of the population mean 95% of the time, then the
population mean must also lie within ±1.96 standard errors
of the sample mean 95% of the time.
The exact z score for 95% confidence limits is 1.96.
68% confidence limits = sample mean ± approximately 1 standard error.
95% confidence limits = sample mean ± approximately 2 standard errors.
99.7% confidence limits = sample mean ± approximately 3 standard errors.
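
A minimal sketch of 95% confidence limits for a mean; the sample data and
the known population standard deviation are made-up assumptions:

    # 95% confidence limits = sample mean +/- 1.96 * standard error.
    import math
    import statistics

    sample = [9.8, 10.4, 10.1, 9.6, 10.7, 10.0, 9.9, 10.3]  # made-up data
    sigma = 0.4                            # assumed known population SD

    mean = statistics.mean(sample)
    se = sigma / math.sqrt(len(sample))
    print(f"95% C.I.: ({mean - 1.96 * se:.2f}, {mean + 1.96 * se:.2f})")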

Estimated
Standard Error
of Mean:

Estimated Standard Error of the Mean = sample standard
deviation divided by the square root of the sample size.
The equation above is used to estimate the standard error
when the population standard deviation is not known.

T-scores:

The estimated standard error is used to find a statistic t,
sometimes known as Student's t. This is the number of
estimated standard errors by which the sample mean lies
above or below the population mean.
t-score = (sample mean - population mean) / Estimated
Standard Error of the Mean.
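
A minimal sketch of the t-score, with the standard error estimated from a
made-up sample rather than taken from the population:

    # t = (sample mean - population mean) / (sample SD / sqrt(n))
    import math
    import statistics

    sample = [9.8, 10.4, 10.1, 9.6, 10.7, 10.0, 9.9, 10.3]  # made-up data
    mu0 = 10.0                             # hypothesised population mean

    n = len(sample)
    est_se = statistics.stdev(sample) / math.sqrt(n)  # estimated std. error
    t = (statistics.mean(sample) - mu0) / est_se
    print(f"t = {t:.2f} with {n - 1} degrees of freedom")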

Sources & Resources

http://vassarstats.net/textbook/
https://www.khanacademy.org/math/probability/statistics-inferential
https://extension.usu.edu/evaluation/files/uploads/Start%20Your%20Engine/Study%20the%20Route/Analyze%20the%20Data/UsingInferentialStatistics.pdf
