
Probability & Statistics
Facts and Formulae

Guides to Statistical Information

Resources to support the learning of mathematics, statistics and OR in higher education: www.mathstore.ac.uk
The Statistical Education through Problem Solving (STEPS) glossary: www.stats.gla.ac.uk/steps/glossary/

Further reading
Kotz, S. and Johnson, L. (1988) Encyclopedia of Statistical Sciences, Vols 1-9. New York: John Wiley and Sons.

The statistical problem solving cycle
Data are numbers in context, and the goal of statistics is to get information from those data, usually through problem solving. A procedure, or paradigm, for statistical problem solving and scientific enquiry is illustrated in the diagram. The dotted line means that, following discussion, the problem may need to be re-formulated and at least one more iteration completed.

[Diagram: the problem solving cycle - Specify the problem and plan; Collect data; Process, represent and analyse data; Interpret and discuss; with a dotted line back to the start.]

Descriptive statistics
Given a sample of n observations $x_1, x_2, \ldots, x_n$, we define the sample mean to be
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
and the corrected sum of squares by
$$S_{xx} = \sum (x_i - \bar{x})^2 \equiv \sum x_i^2 - n\bar{x}^2 \equiv \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}.$$
$S_{xx}/n$ is sometimes called the mean squared deviation, and an unbiased estimator of the population variance σ² is $s^2 = S_{xx}/(n-1)$. The sample standard deviation is s. In calculating s², the divisor (n − 1) is called the degrees of freedom (df). Note that s is also sometimes written $\hat{\sigma}$.

If the sample data are ordered from smallest to largest, then the:
minimum (Min) is the smallest value;
lower quartile (LQ) is the ¼(n+1)-th value;
median (Med) is the middle [or the ½(n+1)-th] value;
upper quartile (UQ) is the ¾(n+1)-th value;
maximum (Max) is the largest value.
These five values constitute a five-number summary of the data. They can be represented diagrammatically by a box-and-whisker plot, commonly called a boxplot.

[Diagram: a boxplot marking Min, LQ, Med, UQ and Max.]
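These definitions translate directly into code. A minimal sketch in Python (the sample values are invented; numpy's "weibull" quantile method implements the k(n+1)-th-ordered-value convention used above):

```python
import numpy as np

x = np.array([2.3, 4.1, 3.7, 5.0, 2.9, 4.4, 3.2])  # invented sample

n = len(x)
xbar = x.sum() / n                # sample mean
Sxx = ((x - xbar) ** 2).sum()     # corrected sum of squares
s2 = Sxx / (n - 1)                # unbiased estimate of sigma^2
s = np.sqrt(s2)                   # sample standard deviation

# Five-number summary; method="weibull" (numpy >= 1.22) places the
# p-quantile at the p(n+1)-th ordered value, matching the rule above.
mn, lq, med, uq, mx = np.quantile(x, [0, 0.25, 0.5, 0.75, 1], method="weibull")
print(f"mean = {xbar:.3f}, s = {s:.3f}")
print(f"five-number summary: {mn}, {lq}, {med}, {uq}, {mx}")
```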
Grouped frequency data
If the data are given in the form of a grouped frequency distribution in which there are $f_i$ observations in an interval whose mid-point is $x_i$, then, with $\sum f_i = n$,
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n} \qquad\text{and}\qquad S_{xx} = \sum f_i (x_i - \bar{x})^2 = \sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}.$$

Events & probabilities
The intersection of two events A and B is A∩B. The union of A and B is A∪B. A and B are mutually exclusive if they cannot both occur, denoted A∩B = ∅, where ∅ is called the null event. For an event A, 0 ≤ P(A) ≤ 1. For two events A and B, P(A∪B) = P(A) + P(B) − P(A∩B). If A and B are mutually exclusive, then P(A∪B) = P(A) + P(B).

Equally likely outcomes
If each of a complete set of n elementary outcomes is equally likely to occur, then the probability of each elementary outcome is 1/n. If an event A consists of m of these n elements, then P(A) = m/n.

Independent events
A and B are independent if and only if P(A∩B) = P(A)P(B).

Conditional probability
The conditional probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) \neq 0.$$

Bayes' Theorem
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$

Theorem of Total Probability
The k events $B_1, B_2, \ldots, B_k$ form a partition of the sample space S if $B_1 \cup B_2 \cup \cdots \cup B_k = S$ and no two of the $B_i$'s can occur together. Then
$$P(A) = \sum_i P(A \mid B_i)\,P(B_i).$$
In this case Bayes' Theorem generalises to
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_i P(A \mid B_i)\,P(B_i)} \qquad (i = 1, 2, \ldots, k).$$
If B′ is the complement of the event B, then P(B′) = 1 − P(B), and P(A) = P(A|B)P(B) + P(A|B′)P(B′) is a special case of the theorem of total probability. The complement of B is also commonly denoted $\bar{B}$.
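As a numerical illustration of conditional probability, total probability and Bayes' Theorem, here is a short sketch; the screening-test numbers are invented for illustration:

```python
# Total probability and Bayes' theorem (made-up screening numbers).
p_disease = 0.01              # P(B): prevalence
p_pos_given_disease = 0.95    # P(A | B): test sensitivity
p_pos_given_healthy = 0.05    # P(A | B'): false-positive rate

# Theorem of total probability: P(A) = P(A|B)P(B) + P(A|B')P(B')
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(positive) = {p_pos:.4f}")                      # 0.0590
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # 0.1610
# Even a good test yields mostly false positives when prevalence is low.
```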

Permutations and combinations
The number of ways of selecting r objects out of a total of n, where the order of selection is important, is the number of permutations:
$${}^{n}P_{r} = \frac{n!}{(n-r)!}.$$
The number of ways in which r objects can be selected from n when the order of selection is not important is the number of combinations:
$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
Since ${}^{n}C_{n}$ must equal 1, we take 0! = 1, so ${}^{n}C_{0} = 1$. Also ${}^{n}C_{r} = {}^{n}C_{n-r}$; ${}^{n}C_{0} + {}^{n}C_{1} + \cdots + {}^{n}C_{n-1} + {}^{n}C_{n} = 2^{n}$; and ${}^{n+1}C_{r} = {}^{n}C_{r} + {}^{n}C_{r-1}$.

Random variables
Data arise from observations on variables that are measured on different scales. A nominal scale is used for named categories (eg race, gender) and an ordinal scale for data that can be ranked (eg attitudes, position); no arithmetic operations are valid on either. Interval scale measurements can be added and subtracted only (eg temperature), but with ratio scale measurements (eg age, weight) multiplication and division can be used meaningfully as well. Generally, random variables are either discrete or continuous. (Note: all data are discrete, because the accuracy of measurement is always limited.)

A discrete random variable X can take one of the values $x_1, x_2, \ldots$; the probabilities $p_i = P(X = x_i)$ must satisfy $p_i \ge 0$ and $p_1 + p_2 + \cdots = 1$. The pairs $(x_i, p_i)$ form the probability mass function (pmf) of X.

A continuous random variable X takes values x from a continuous set of possible values. It has a probability density function (pdf) f(x) that satisfies $f(x) \ge 0$ and $\int f(x)\,dx = 1$, with
$$P(a < X \le b) = \int_a^b f(x)\,dx.$$

Expected values
The expected value of a function H(X) of a random variable X is defined as
$$E[H(X)] = \begin{cases} \sum_i H(x_i)\,P(X = x_i), & X \text{ discrete},\\ \int H(x)\,f(x)\,dx, & X \text{ continuous}. \end{cases}$$
Expectation is linear, in that the expectation of a linear combination of functions is the same linear combination of the expectations. For example,
$$E[X^2 + \log X + 1] = E[X^2] + E[\log X] + 1,$$
but $E[\log X] \neq \log E[X]$ and $E[1/X] \neq 1/E[X]$.

Variance
The variance of a random variable is defined as
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] \equiv E\left[X^2\right] - \mu^2.$$
Properties: Var(X) ≥ 0, and it equals 0 only if X is a constant; Var(aX + b) = a²Var(X), where a and b are constants.
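To make the discrete definitions concrete, here is a small sketch computing E[X] and Var(X) directly from a pmf (the pmf values are invented):

```python
# E[X] and Var(X) from a discrete pmf (made-up values).
xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]        # must be >= 0 and sum to 1
assert abs(sum(ps) - 1.0) < 1e-12

mean = sum(x * p for x, p in zip(xs, ps))       # E[X]
ex2 = sum(x**2 * p for x, p in zip(xs, ps))     # E[X^2]
var = ex2 - mean**2                             # Var(X) = E[X^2] - mu^2
print(f"E[X] = {mean}, Var(X) = {var}")         # 1.7 and 0.81
```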
Moment generating functions
The moment generating function (mgf) of a random variable is defined as $M_X(t) = E[\exp(tX)]$, if this exists. $E[X^r]$ can then be evaluated as the:
(i) coefficient of $t^r/r!$ in the power series expansion of $M_X(t)$;
(ii) r-th derivative of $M_X(t)$ evaluated at t = 0.

Measures of location
The mean or expectation of the random variable X is E[X], the long-run average of realisations of X. The mode is where the pmf or pdf achieves a maximum (if it does so). For a random variable X, the median is such that P(X ≤ median) = ½, so that 50% of the values of X occur above, and 50% below, the median.

Percentiles
$x_p$ is the 100p-th percentile of a random variable X if $P(X \le x_p) = p$. For example, the 5th percentile, $x_{0.05}$, has 5% of the values smaller than or equal to it. The median is the 50th percentile, the lower quartile is the 25th percentile and the upper quartile is the 75th percentile.

Measures of dispersion
The inter-quartile range is defined to be the difference between the upper and lower quartiles, UQ − LQ. The standard deviation is defined as the square root of the variance, σ = √Var(X), and is in the same units as the random variable X.

Cumulative distribution function
This is defined, as a function of any real value t, by F(t) = P(X ≤ t). X is a continuous random variable if F is a continuous function of t; if X is discrete, then F is a step function.

The Central Limit Theorem
If a random sample of size n is taken from any distribution with mean µ and variance σ², the sampling distribution of the mean will be approximately ~N(µ, σ²/n), where '~' means 'is distributed as'. The larger n is, the better the approximation.

The standard normal and Student's t distributions
If a random variable X ~ N(µ, σ²), then z = (X − µ)/σ ~ N(0,1), the standard normal distribution. The t distribution with (n − 1) degrees of freedom is used in place of z for small samples of size n from normal populations when σ² is unknown. As n increases, the distribution of t converges to N(0,1). These distributions are used, for example, for inference about means, differences between means and in regression.

[Figure: the N(0,1) density (dotted line) and the t distribution with 3 degrees of freedom (continuous line) on common axes.]

Fisher's F distribution
If $X_1 \sim \chi^2_{\nu_1}$ and $X_2 \sim \chi^2_{\nu_2}$ are independent random variables, then
$$\frac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2},$$
the F distribution with $(\nu_1, \nu_2)$ degrees of freedom. This distribution is used, for example, for inference about the ratio of two variances, in Analysis of Variance (ANOVA), and in simple and multiple linear regression.

[Figure: the density f(F) of the F distribution with (10, 20) degrees of freedom.]
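The tail points used in inference can be obtained numerically. A brief sketch using scipy.stats (assumed available), which also shows t approaching N(0,1) as the degrees of freedom grow:

```python
from scipy import stats

# Upper 5% points: the t distribution approaches N(0,1) as df grows.
for df in (3, 10, 30, 100):
    print(f"t({df:>3}) upper 5% point: {stats.t.ppf(0.95, df):.4f}")
print(f"N(0,1)  upper 5% point: {stats.norm.ppf(0.95):.4f}")

# Upper 5% point of the F distribution with (10, 20) df, as in the figure above.
print(f"F(10,20) upper 5% point: {stats.f.ppf(0.95, 10, 20):.4f}")
```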

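The Central Limit Theorem itself is easy to check by simulation. A minimal sketch, drawing samples from a skewed exponential parent distribution (an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000

# Sample means of size-n samples from an exponential distribution (mu = sigma = 1).
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT: the sample mean is approximately N(mu, sigma^2/n).
print(f"mean of sample means: {means.mean():.4f}  (theory: 1)")
print(f"var  of sample means: {means.var():.4f}  (theory: {1/n:.4f})")
```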
Statistics & Sampling Distributions

Population and samples
A (statistical) population is the complete set of all possible measurements or values, corresponding to the entire collection of units, for which inferences are to be made by taking a sample: the set of measurements or values that are actually collected from the population.

Simple random sample: every item in the population is equally likely to be in the sample, independently of which other members of the population are chosen.

Parameter: a quantity that describes an aspect of a population, eg the population mean µ or variance σ².

Statistic: a quantity calculated from the sample, eg the sample mean x̄ or variance s².

Sampling distributions: the value of a statistic will in general vary from sample to sample, in which case it will have its own probability distribution, called its sampling distribution. A statistic used to estimate the value of a parameter θ in a distribution is called an estimator (the random variable) or an estimate (the value). If $\hat{\theta}$ is an estimator of θ, the mean of its sampling distribution, $E[\hat{\theta}]$, is called the sampling mean, and the variance, $\operatorname{Var}(\hat{\theta})$, is called the sampling variance; $\sqrt{\operatorname{Var}(\hat{\theta})}$ is called the standard error of $\hat{\theta}$. If $E[\hat{\theta}] = \theta$, then $\hat{\theta}$ is an unbiased estimator of θ; eg $\bar{X}$ is an unbiased estimator of µ and has sampling variance σ²/n, where $\operatorname{Var}(X_i) = \sigma^2$ (i = 1, ..., n).

Corrected sum of squares
$$S_{xx} = \sum (x_i - \bar{x})^2 \equiv \sum x_i^2 - n\bar{x}^2 \equiv \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$
has expectation (n − 1)σ², so dividing $S_{xx}$ by (n − 1) gives an unbiased estimator of σ², denoted s².

Normal and Chi-squared distributions
If $X_1, X_2, \ldots, X_n$ are independently and identically ~N(µ, σ²), then
$$\sum_i \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n,$$
a Chi-squared distribution with n degrees of freedom. Also $\bar{X} \sim N(\mu, \sigma^2/n)$, independently of $S_{XX}/\sigma^2 \sim \chi^2_{(n-1)}$.

Simple Linear Regression
To fit the straight line y = α + βx to data $(x_i, y_i)$, i = 1, ..., n, by the method of least squares, the estimates of the slope β and the intercept α are given by:
$$b = \frac{\sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2} = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}.$$
If we assume that the $x_i$ are known and that the $y_i$ have normal distributions with means $\alpha + \beta x_i$ and constant variance σ², written $y_i \sim N(\alpha + \beta x_i, \sigma^2)$, then, for a fixed value $x_0$,
$$b \sim N\!\left(\beta,\ \frac{\sigma^2}{S_{xx}}\right),\qquad a \sim N\!\left(\alpha,\ \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]\right),\qquad a + b x_0 \sim N\!\left(\alpha + \beta x_0,\ \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\right).$$
A common alternative notation is $\hat{\alpha}$ for a and $\hat{\beta}$ for b.

Correlation
Given observations $(x_i, y_i)$, i = 1, ..., n, on two random variables X and Y, the Pearson (product moment) correlation between them is given by:
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left\{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right\}\left\{\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2\right\}}}.$$
We use r to estimate the correlation ρ between X and Y. For large n, r is approximately ~N(ρ, 1/(n − 2)). The (Spearman) rank correlation coefficient is given by
$$r_S = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)},$$
where $d_i$ is the difference between the ranks of $(x_i, y_i)$, i = 1, ..., n. If ranks are tied, see further reading.

Time Series
A time series $Y_t$ (t = 1, 2, ..., n) is a set of n observations recorded through time t (eg days, weeks, months). The arithmetic mean of blocks of k successive values, $(Y_1 + Y_2 + \cdots + Y_k)/k$, $(Y_2 + Y_3 + \cdots + Y_{k+1})/k$, ..., is a simple moving average of order k, itself a time series which is smoother than $Y_t$ and can be used to track, or estimate, the underlying level $\mu_t$ of $Y_t$.

If 0 < α < 1, an exponentially weighted moving average (EWMA) at time t uses a discounted weighted average of current and past data to estimate $\mu_t$ with
$$\hat{\mu}_t = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots,$$
which is equivalent to the recurrence relation $\hat{\mu}_t = \alpha Y_t + (1-\alpha)\hat{\mu}_{t-1}$. Moving averages are often plotted on the same graph as $Y_t$. If $Y_t$ additionally contains trend $R_t$, the rate of change of the data per unit time, with $\mu_t = \mu_{t-1} + R_{t-1}$, then the recurrence relation becomes $\hat{\mu}_t = \alpha Y_t + (1-\alpha)(\hat{\mu}_{t-1} + \hat{R}_{t-1})$. If 0 < β < 1, the trend smoothing equation is $\hat{R}_t = \beta(\hat{\mu}_t - \hat{\mu}_{t-1}) + (1-\beta)\hat{R}_{t-1}$; together these are known as Holt's Linear Exponential Smoothing. If $Y_t$ also contains seasonality $S_t$, a smoothing constant γ (0 < γ < 1) is used in a seasonal smoothing equation, $\hat{S}_t = \gamma Y_t/\hat{\mu}_t + (1-\gamma)\hat{S}_{t-k}$, assuming the periodicity is k, with multiplicative seasonality. For monthly data, k = 12.

Forecasting from time n (now) to time n + h (h = 1, 2, ...):
Level only: $\hat{Y}_{n+h} = \hat{\mu}_n$, the latest EWMA.
Level and constant trend: $\hat{Y}_{n+h} = a + b(n + h)$, the simple linear regression trend line of $Y_t$ against t.
Level and changing trend: $\hat{Y}_{n+h} = \hat{\mu}_n + h\hat{R}_n$.
Level, changing trend and seasonality: $\hat{Y}_{n+h} = (\hat{\mu}_n + h\hat{R}_n)\hat{S}_{n+h-12}$ (for monthly data), where $\hat{\mu}_n = \alpha Y_n/\hat{S}_{n-12} + (1-\alpha)(\hat{\mu}_{n-1} + \hat{R}_{n-1})$.
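The EWMA and Holt recurrences above translate into a few lines of code. A sketch with an invented series and arbitrary smoothing constants α = 0.3, β = 0.1:

```python
# EWMA with trend (Holt's linear exponential smoothing) -- a sketch.
y = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]  # made-up series
alpha, beta = 0.3, 0.1                                   # smoothing constants

mu, R = y[0], 0.0      # simple start-up choice: level = first value, no trend
for t in range(1, len(y)):
    mu_prev = mu
    mu = alpha * y[t] + (1 - alpha) * (mu + R)           # level update
    R = beta * (mu - mu_prev) + (1 - beta) * R           # trend update

h = 3
print(f"forecast {h} steps ahead: {mu + h * R:.2f}")     # level + changing trend
```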
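Similarly, the least-squares and correlation formulas from the Simple Linear Regression and Correlation sections above can be sketched directly (the data are invented and roughly linear):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])   # made-up data

n = len(x)
Sxx = (x**2).sum() - (x.sum()**2) / n
Syy = (y**2).sum() - (y.sum()**2) / n
Sxy = (x * y).sum() - (x.sum() * y.sum()) / n

b = Sxy / Sxx                   # slope estimate
a = y.mean() - b * x.mean()     # intercept estimate
r = Sxy / np.sqrt(Sxx * Syy)    # Pearson correlation
print(f"fitted line: y = {a:.3f} + {b:.3f} x,  r = {r:.4f}")
```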
Hypothesis tests
A hypothesis test involves testing a claim, or null hypothesis H₀, about a parameter against an alternative, H₁. A decision to reject H₀, or not to reject H₀, uses sample evidence to calculate a test statistic, which is judged against a critical value. H₀ is maintained unless it is made untenable by the sample evidence. Rejecting H₀ when we should not is a Type I error. The probability (we are prepared to accept) of making a Type I error is called the significance level α and yields the critical value. The smallest α at which we can just reject H₀ is the p-value of the test, the tail area outside the test statistic. Not rejecting H₀ when we should is a Type II error, with probability β. The power of a hypothesis test is 1 − β.

An interval estimate for a parameter is a calculated range within which it is deemed likely to fall. Given α, the set of intervals from infinitely repeated random samples of size n will contain the parameter (100 − α)% of the time: each interval is a (100 − α)% confidence interval.

One sample hypothesis tests
1. For X ~ N(µ, σ²), σ² known; random sample evidence x̄ and n.
Null hypothesis H₀: µ = µ₀; 2-sided alternative H₁: µ ≠ µ₀.
Test statistic $z_{calc} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) \sim N(0,1)$.
Reject H₀ (at the α level) if $|z_{calc}| \ge z_{\alpha/2}$, the critical value of z.

2. For X ~ N(µ, σ²), σ² unknown; sample evidence x̄, s and n.
Null hypothesis H₀: µ = µ₀; 2-sided alternative H₁: µ ≠ µ₀.
Test statistic $t_{calc} = (\bar{x} - \mu_0)/(s/\sqrt{n}) \sim t_{(n-1)}$, the t distribution with (n − 1) df. For n > 30, t is approximately N(0,1) whatever the distribution of X.
Reject H₀ if $|t_{calc}| \ge t_{\alpha/2}$, the critical value of t with (n − 1) df.

3. For X ~ N(µ, σ²), σ² unknown; sample evidence s and n.
Null hypothesis H₀: σ² = σ₀²; alternative H₁: σ² > σ₀².
Test statistic $\chi^2_{calc} = (n-1)s^2/\sigma_0^2 \sim \chi^2_{n-1}$.
Reject H₀ if $\chi^2_{calc} > \chi^2_{\alpha}$, the critical value of χ² with (n − 1) df.

Two sample hypothesis tests
For X₁ ~ N(µ₁, σ₁²), X₂ ~ N(µ₂, σ₂²), σ₁², σ₂² unknown; random sample evidence x̄₁, x̄₂, s₁², s₂², n₁ and n₂.
1. Null hypothesis H₀: µ₁ − µ₂ = c; 2-sided alternative H₁: µ₁ − µ₂ ≠ c.
Test statistic $t_{calc} = (\bar{x}_1 - \bar{x}_2 - c)/\left(s\sqrt{1/n_1 + 1/n_2}\right) \sim t_{(n_1+n_2-2)}$, where $s^2 = \{(n_1-1)s_1^2 + (n_2-1)s_2^2\}/(n_1+n_2-2)$, assuming σ₁² = σ₂².
Reject H₀ if $|t_{calc}| \ge t_{\alpha/2}$, the critical value of t with (n₁ + n₂ − 2) df.
2. Null hypothesis H₀: σ₁² = σ₂²; alternative H₁: σ₁² > σ₂².
Test statistic $F_{calc} = s_1^2/s_2^2 \sim F_{n_1-1,\,n_2-1}$.
Reject H₀ if $F_{calc} > F_{\alpha}$, the critical value of F with (n₁ − 1, n₂ − 1) df.

Confidence interval for a population mean
If X has mean µ and variance σ², then for n > 30 an approximate (100 − α)% confidence interval for µ is $\bar{x} - t_{\alpha/2}\,s/\sqrt{n}$ to $\bar{x} + t_{\alpha/2}\,s/\sqrt{n}$. If X ~ N(µ, σ²), the interval is exact for all n.
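A sketch of one-sample test 2 above (the t test), with invented data; scipy.stats is assumed available for the critical value and p-value:

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.8, 5.6, 5.1, 4.9, 5.4, 5.3, 4.7])  # made-up sample
mu0, alpha = 5.0, 0.05                                   # H0: mu = 5.0

n = len(x)
t_calc = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # ddof=1 gives s
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)                # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_calc), n - 1)              # two-sided p-value

print(f"t_calc = {t_calc:.3f}, critical value = {t_crit:.3f}, p = {p_value:.3f}")
print("reject H0" if abs(t_calc) >= t_crit else "do not reject H0")
```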
Standard statistical distributions

Binomial, Bin(n, p); n a positive integer, 0 ≤ p ≤ 1
Conditions/application: n independent success/fail trials, each with probability p of success; X = number of successes.
pmf: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, ..., n
Mean: np. Variance: np(1 − p). mgf: $\left(1 - p + pe^t\right)^n$.
Notes: X ~ Bin(n, p) ⇒ n − X ~ Bin(n, 1 − p).

Geometric, Geom(p); 0 ≤ p ≤ 1
Conditions/application: repeated independent success/fail trials, each with probability p of success; X = number of trials up to and including the first success.
pmf: $P(X = x) = (1-p)^{x-1} p$, x = 1, 2, ...
Mean: 1/p. Variance: (1 − p)/p². mgf: $\dfrac{pe^t}{1 - (1-p)e^t}$.
Notes: has the "lack of memory" property, P(X > a + b | X > b) = P(X > a).

Poisson, Po(λ); λ a positive number
Conditions/application: events occur at random at a constant rate; X = number of occurrences in some interval; λ is the expected number of occurrences.
pmf: $P(X = x) = e^{-\lambda}\dfrac{\lambda^x}{x!}$, x = 0, 1, 2, ...
Mean: λ. Variance: λ. mgf: $\exp\{\lambda(e^t - 1)\}$.
Notes: useful as an approximation to Bin(n, p) if n is large and p is small.

Normal, N(µ, σ²); µ, σ both real, σ > 0
Conditions/application: a widely used distribution for symmetrically distributed random variables with mean µ and standard deviation σ.
pdf: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{(x-\mu)^2}{2\sigma^2}\right\}$, all real x
Mean: µ. Variance: σ². mgf: $\exp\left\{\mu t + \dfrac{\sigma^2 t^2}{2}\right\}$.
Notes: can approximate the Binomial, Poisson, Pascal and Gamma distributions (see the Central Limit Theorem).

Exponential, Expon(θ); θ > 0
Conditions/application: if events occur at rate θ per unit time, X = time to the first occurrence.
pdf: $f(x) = \theta \exp\{-\theta x\}$, x > 0
Mean: 1/θ. Variance: 1/θ². mgf: $\dfrac{\theta}{\theta - t}$, t < θ.
Notes: has the "lack of memory" property, P(X > a + b | X > b) = P(X > a).

Negative-binomial or Pascal, Pasc(r, p); r a positive integer, 0 ≤ p ≤ 1
Conditions/application: repeated independent success/fail trials, each with probability p of success; X = number of trials up to and including the r-th success.
pmf: $P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}$, x = r, r + 1, r + 2, ...
Mean: r/p. Variance: r(1 − p)/p². mgf: $\left[\dfrac{pe^t}{1 - (1-p)e^t}\right]^r$.
Notes: Pasc(1, p) ≡ Geom(p).

Gamma, Ga(α, β); α, β > 0
Conditions/application: a generalisation of the exponential distribution; if α is an integer, it represents the waiting time to the α-th occurrence of a random event, where β is the expected number of events per unit time.
pdf: $f(x) = \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$, x > 0
Mean: α/β. Variance: α/β². mgf: $\left(\dfrac{\beta}{\beta - t}\right)^{\alpha}$, t < β.
Notes: Ga(1, λ) ≡ Expon(λ); if ν is an integer, Ga(ν/2, ½) is $\chi^2_{\nu}$, the Chi-squared distribution with ν df.
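The means and variances in the table can be checked against scipy.stats (assumed available); note that scipy parameterises the exponential and gamma distributions by scale = 1/rate:

```python
from scipy import stats

# Binomial Bin(10, 0.3): mean np = 3, variance np(1-p) = 2.1.
print(stats.binom.stats(10, 0.3, moments="mv"))

# Poisson Po(2.5): mean = variance = lambda.
print(stats.poisson.stats(2.5, moments="mv"))

# Exponential with rate theta = 2: scipy uses scale = 1/theta.
print(stats.expon.stats(scale=1/2, moments="mv"))      # mean 0.5, variance 0.25

# Gamma Ga(alpha = 3, beta = 2): shape a = alpha, scale = 1/beta.
print(stats.gamma.stats(3, scale=1/2, moments="mv"))   # mean 1.5, variance 0.75
```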

