Cookbook
Version 0.2.4
14th May, 2017
http://statistics.zone/
Copyright © Matthias Vallentin, 2017
1.1 Discrete Distributions

We use the notation $\gamma(s, x)$ and $\Gamma(x)$ to refer to the Gamma functions (see 22.1), and use $B(x, y)$ and $I_x$ to refer to the Beta functions (see 22.2).
[Figure: PMFs (top row) and CDFs (bottom row) of the discrete Uniform, Binomial, Geometric, and Poisson distributions for several parameter settings, e.g. $n = 40, p = 0.3$; $p = 0.2$; $\lambda = 1, 4, 10$.]
1.2 Continuous Distributions
Notation: $F_X(x)$ (cdf), $f_X(x)$ (pdf), $E[X]$, $V[X]$, $M_X(s)$ (mgf).

Uniform $\mathrm{Unif}(a, b)$:
$F_X(x) = \frac{x-a}{b-a}$ for $a < x < b$ ($0$ for $x \le a$, $1$ for $x \ge b$); $f_X(x) = \frac{I(a < x < b)}{b-a}$; $E[X] = \frac{a+b}{2}$; $V[X] = \frac{(b-a)^2}{12}$; $M_X(s) = \frac{e^{sb} - e^{sa}}{s(b-a)}$

Normal $N(\mu, \sigma^2)$:
$F_X(x) = \Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$; $f_X(x) = \phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$; $E[X] = \mu$; $V[X] = \sigma^2$; $M_X(s) = \exp\left(\mu s + \frac{\sigma^2 s^2}{2}\right)$

Log-Normal $\ln N(\mu, \sigma^2)$:
$F_X(x) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\ln x - \mu}{\sqrt{2\sigma^2}}\right)$; $f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$; $E[X] = e^{\mu + \sigma^2/2}$; $V[X] = (e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}$

Multivariate Normal $\mathrm{MVN}(\mu, \Sigma)$:
$f_X(x) = (2\pi)^{-k/2} |\Sigma|^{-1/2}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$; $E[X] = \mu$; $V[X] = \Sigma$; $M_X(s) = \exp\left(\mu^T s + \frac{1}{2} s^T \Sigma s\right)$

Student's t $\mathrm{Student}(\nu)$:
$f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$; $E[X] = 0$ ($\nu > 1$); $V[X] = \frac{\nu}{\nu - 2}$ ($\nu > 2$), $\infty$ for $1 < \nu \le 2$

Chi-square $\chi^2_k$:
$F_X(x) = \frac{1}{\Gamma(k/2)}\,\gamma\left(\frac{k}{2}, \frac{x}{2}\right)$; $f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1} e^{-x/2}$; $E[X] = k$; $V[X] = 2k$; $M_X(s) = (1 - 2s)^{-k/2}$ ($s < 1/2$)

F $\mathrm{F}(d_1, d_2)$:
$F_X(x) = I_{d_1 x/(d_1 x + d_2)}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$; $f_X(x) = \frac{1}{x\,B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}$; $E[X] = \frac{d_2}{d_2 - 2}$ ($d_2 > 2$); $V[X] = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ ($d_2 > 4$)

Exponential $\mathrm{Exp}(\beta)$:
$F_X(x) = 1 - e^{-x/\beta}$; $f_X(x) = \frac{1}{\beta}\, e^{-x/\beta}$; $E[X] = \beta$; $V[X] = \beta^2$; $M_X(s) = \frac{1}{1 - \beta s}$ ($s < 1/\beta$)

Gamma $\mathrm{Gamma}(\alpha, \beta)$:
$F_X(x) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)}$; $f_X(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\,x^{\alpha-1} e^{-x/\beta}$; $E[X] = \alpha\beta$; $V[X] = \alpha\beta^2$; $M_X(s) = \left(\frac{1}{1 - \beta s}\right)^{\alpha}$ ($s < 1/\beta$)

Inverse Gamma $\mathrm{InvGamma}(\alpha, \beta)$:
$F_X(x) = \frac{\Gamma(\alpha, \beta/x)}{\Gamma(\alpha)}$; $f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1} e^{-\beta/x}$; $E[X] = \frac{\beta}{\alpha - 1}$ ($\alpha > 1$); $V[X] = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$ ($\alpha > 2$); $M_X(s) = \frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}\,K_{\alpha}\left(\sqrt{-4\beta s}\right)$

Dirichlet $\mathrm{Dir}(\alpha)$:
$f_X(x) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}$; $E[X_i] = \frac{\alpha_i}{\sum_{i=1}^{k} \alpha_i}$; $V[X_i] = \frac{E[X_i]\,(1 - E[X_i])}{\sum_{i=1}^{k} \alpha_i + 1}$

Beta $\mathrm{Beta}(\alpha, \beta)$:
$F_X(x) = I_x(\alpha, \beta)$; $f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$; $E[X] = \frac{\alpha}{\alpha+\beta}$; $V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$; $M_X(s) = 1 + \sum_{k=1}^{\infty} \left(\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}\right) \frac{s^k}{k!}$

Weibull $\mathrm{Weibull}(\lambda, k)$:
$F_X(x) = 1 - e^{-(x/\lambda)^k}$; $f_X(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$; $E[X] = \lambda\,\Gamma\left(1 + \frac{1}{k}\right)$; $V[X] = \lambda^2\left[\Gamma\left(1 + \frac{2}{k}\right) - \Gamma\left(1 + \frac{1}{k}\right)^2\right]$; $M_X(s) = \sum_{n=0}^{\infty} \frac{s^n \lambda^n}{n!}\,\Gamma\left(1 + \frac{n}{k}\right)$ ($k \ge 1$)

Pareto $\mathrm{Pareto}(x_m, \alpha)$:
$F_X(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}$ ($x \ge x_m$); $f_X(x) = \frac{\alpha\,x_m^{\alpha}}{x^{\alpha+1}}$ ($x \ge x_m$); $E[X] = \frac{\alpha x_m}{\alpha - 1}$ ($\alpha > 1$); $V[X] = \frac{x_m^2\,\alpha}{(\alpha-1)^2(\alpha-2)}$ ($\alpha > 2$); $M_X(s) = \alpha(-x_m s)^{\alpha}\,\Gamma(-\alpha, -x_m s)$ ($s < 0$)

We use the rate parameterization where $\lambda = 1/\beta$; some textbooks use $\beta$ as scale parameter instead [6].
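As a quick sanity check of a table row, one can compare the closed-form moments against scipy.stats (an illustrative sketch; numpy/scipy are assumed available and are not part of the cookbook):

# Sketch: verify E[X] and V[X] for Gamma(alpha, beta) against the table.
from scipy import stats

alpha, beta = 3.0, 0.5                       # shape and scale of Gamma(alpha, beta)
X = stats.gamma(a=alpha, scale=beta)

print(X.mean(), alpha * beta)                # E[X] = alpha * beta
print(X.var(), alpha * beta**2)              # V[X] = alpha * beta^2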
[Figure: PDFs of the continuous Uniform, Normal, Log-Normal, Student's t, $\chi^2$, F, Exponential, Gamma, Inverse Gamma, Beta, Weibull, and Pareto distributions for various parameter choices.]

[Figure: CDFs of the same continuous distributions.]
2 Probability Theory

Definitions
Sample space $\Omega$
Probability space $(\Omega, \mathcal{A}, P)$, where $P$ satisfies
1. $P[A] \ge 0$ for every event $A$
2. $P[\Omega] = 1$
3. $P\left[\bigsqcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} P[A_i]$

Independence: $A \perp B \iff P[A \cap B] = P[A]\,P[B]$

Conditional probability: $P[A \mid B] = \frac{P[A \cap B]}{P[B]}$, provided $P[B] > 0$

Law of Total Probability: $P[B] = \sum_{i=1}^{n} P[B \mid A_i]\,P[A_i]$, where $\Omega = \bigsqcup_{i=1}^{n} A_i$

3 Random Variables

Random variable (RV): $X : \Omega \to \mathbb{R}$

Conditional density: $f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$

Independence of $X$ and $Y$:
1. $P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$
2. $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$
3.1 Transformations

Transformation function: $Z = \varphi(X)$

Discrete case:
$f_Z(z) = P[\varphi(X) = z] = P[\{x : \varphi(x) = z\}] = P\left[X \in \varphi^{-1}(z)\right] = \sum_{x \in \varphi^{-1}(z)} f_X(x)$

4 Expectation

$E[XY] = \iint xy\, f_{X,Y}(x, y)\,dx\,dy$
$E[\varphi(X)] \ne \varphi(E[X])$ in general (cf. Jensen inequality)
$P[X \le Y] = 1 \implies E[X] \le E[Y]$
$P[X = Y] = 1 \implies E[X] = E[Y]$
$E[X] = \sum_{x=1}^{\infty} P[X \ge x]$ for $X$ discrete, nonnegative integer-valued
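E.g., a small Monte Carlo illustration of the Jensen gap for the convex map $\varphi(x) = x^2$ (a sketch assuming numpy; not part of the original cookbook):

# Sketch: E[phi(X)] vs. phi(E[X]) for the convex phi(x) = x**2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

lhs = np.mean(x**2)          # E[phi(X)]
rhs = np.mean(x)**2          # phi(E[X])
print(lhs, rhs)              # lhs >= rhs, as Jensen's inequality predicts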
6 Inequalities

Cauchy-Schwarz: $E[XY]^2 \le E\left[X^2\right] E\left[Y^2\right]$

Markov: $P[\varphi(X) \ge t] \le \frac{E[\varphi(X)]}{t}$

Chebyshev: $P[|X - E[X]| \ge t] \le \frac{V[X]}{t^2}$

Chernoff: $P[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$, $\delta > -1$

Hoeffding: $X_1, \dots, X_n$ independent with $P[X_i \in [a_i, b_i]] = 1$ for $1 \le i \le n$:
$P\left[\bar{X} - E[\bar{X}] \ge t\right] \le e^{-2nt^2}$, $t > 0$
$P\left[|\bar{X} - E[\bar{X}]| \ge t\right] \le 2\exp\left(-\frac{2n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)$, $t > 0$

Jensen: $E[\varphi(X)] \ge \varphi(E[X])$ for convex $\varphi$

7 Distribution Relationships

Exponential
$X_i \sim \mathrm{Exp}(\beta)$, $X_i \perp X_j \implies \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n, \beta)$
Memoryless property: $P[X > x + y \mid X > y] = P[X > x]$

Normal
$X \sim N(\mu, \sigma^2) \implies \frac{X - \mu}{\sigma} \sim N(0, 1)$
$X \sim N(\mu, \sigma^2) \wedge Z = aX + b \implies Z \sim N(a\mu + b, a^2\sigma^2)$
$X_i \sim N(\mu_i, \sigma_i^2) \wedge X_i \perp X_j \implies \sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
$P[a < X \le b] = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$
$\Phi(-x) = 1 - \Phi(x)$, $\phi'(x) = -x\phi(x)$, $\phi''(x) = (x^2 - 1)\phi(x)$
Upper quantile of $N(0, 1)$: $z_\alpha = \Phi^{-1}(1 - \alpha)$

Gamma
$X \sim \mathrm{Gamma}(\alpha, \beta) \iff X/\beta \sim \mathrm{Gamma}(\alpha, 1)$
$\mathrm{Gamma}(\alpha, \beta) \stackrel{d}{=} \sum_{i=1}^{\alpha} \mathrm{Exp}(\beta)$ for integer $\alpha$
$X_i \sim \mathrm{Gamma}(\alpha_i, \beta) \wedge X_i \perp X_j \implies \sum_i X_i \sim \mathrm{Gamma}\left(\sum_i \alpha_i, \beta\right)$
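E.g., the Exponential-Gamma sum relationship above can be checked by simulation (a sketch assuming numpy/scipy; not part of the original cookbook):

# Sketch: sums of n iid Exp(beta) draws should match Gamma(n, beta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta = 5, 2.0
sums = rng.exponential(scale=beta, size=(100_000, n)).sum(axis=1)

# Kolmogorov-Smirnov test against Gamma(n, beta); a large p-value is consistent.
print(stats.kstest(sums, stats.gamma(a=n, scale=beta).cdf))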
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$

Beta
$\frac{1}{B(\alpha, \beta)}\,x^{\alpha-1}(1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$
$E\left[X^k\right] = \frac{B(\alpha+k, \beta)}{B(\alpha, \beta)} = \frac{\alpha+k-1}{\alpha+\beta+k-1}\,E\left[X^{k-1}\right]$
$\mathrm{Beta}(1, 1) \sim \mathrm{Unif}(0, 1)$

8 Probability and Moment Generating Functions

$G_X(t) = E\left[t^X\right]$, $|t| < 1$
$M_X(t) = G_X(e^t) = E\left[e^{Xt}\right] = E\left[\sum_{i=0}^{\infty} \frac{(Xt)^i}{i!}\right] = \sum_{i=0}^{\infty} \frac{t^i\,E\left[X^i\right]}{i!}$
$P[X = 0] = G_X(0)$
$P[X = 1] = G'_X(0)$
$P[X = i] = \frac{G_X^{(i)}(0)}{i!}$
$E[X] = G'_X(1^-)$
$E\left[X^k\right] = M_X^{(k)}(0)$
$E\left[\frac{X!}{(X-k)!}\right] = G_X^{(k)}(1^-)$
$V[X] = G''_X(1^-) + G'_X(1^-) - \left(G'_X(1^-)\right)^2$
$G_X(t) = G_Y(t) \implies X \stackrel{d}{=} Y$

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let $X, Z \sim N(0, 1)$ with $X \perp Z$, and $Y = \rho X + \sqrt{1-\rho^2}\,Z$.
Joint density: $f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right)$
Conditionals: $(Y \mid X = x) \sim N\left(\rho x, 1-\rho^2\right)$ and $(X \mid Y = y) \sim N\left(\rho y, 1-\rho^2\right)$
Independence: $X \perp Y \iff \rho = 0$

9.2 Bivariate Normal

Let $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$ with correlation $\rho$.
$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left(-\frac{z}{2(1-\rho^2)}\right)$
$z = \left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$
Conditional mean and variance:
$E[X \mid Y] = E[X] + \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - E[Y])$
$V[X \mid Y] = \sigma_X^2\,(1 - \rho^2)$

9.3 Multivariate Normal

Covariance matrix $\Sigma$ (precision matrix $\Sigma^{-1}$):
$\Sigma = \begin{pmatrix} V[X_1] & \cdots & \mathrm{Cov}[X_1, X_k] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_k, X_1] & \cdots & V[X_k] \end{pmatrix}$
If $X \sim N(\mu, \Sigma)$,
$f_X(x) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

Properties
$Z \sim N(0, 1) \wedge X = \mu + \Sigma^{1/2} Z \implies X \sim N(\mu, \Sigma)$
$X \sim N(\mu, \Sigma) \implies \Sigma^{-1/2}(X - \mu) \sim N(0, 1)$
$X \sim N(\mu, \Sigma) \implies AX \sim N\left(A\mu, A\Sigma A^T\right)$
$X \sim N(\mu, \Sigma) \wedge a$ a vector of length $k \implies a^T X \sim N\left(a^T\mu, a^T\Sigma a\right)$
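E.g., the property $X = \mu + \Sigma^{1/2}Z \implies X \sim N(\mu, \Sigma)$ suggests the standard sampling recipe below, with the Cholesky factor as one valid choice of $\Sigma^{1/2}$ (a sketch assuming numpy; not part of the original cookbook):

# Sketch: sample X ~ N(mu, Sigma) via X = mu + L Z, L the Cholesky factor.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)                # L @ L.T == Sigma
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ L.T                             # each row is a draw from N(mu, Sigma)

print(X.mean(axis=0))                        # ~ mu
print(np.cov(X, rowvar=False))               # ~ Sigma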
10 Convergence

Let $\{X_1, X_2, \dots\}$ be a sequence of rvs and let $X$ be another rv. Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$.

Types of Convergence
1. In distribution (weakly, in law): $X_n \xrightarrow{D} X$: $\lim_{n\to\infty} F_n(t) = F(t)$ at all $t$ where $F$ is continuous
2. In probability: $X_n \xrightarrow{P} X$: $(\forall \varepsilon > 0)\ \lim_{n\to\infty} P[|X_n - X| > \varepsilon] = 0$
3. Almost surely (strongly): $X_n \xrightarrow{as} X$: $P\left[\lim_{n\to\infty} X_n = X\right] = P\left[\omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\right] = 1$
4. In quadratic mean ($L_2$): $X_n \xrightarrow{qm} X$: $\lim_{n\to\infty} E\left[(X_n - X)^2\right] = 0$

Relationships
$X_n \xrightarrow{qm} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X$
$X_n \xrightarrow{as} X \implies X_n \xrightarrow{P} X$
$X_n \xrightarrow{D} X \wedge (\exists c \in \mathbb{R})\ P[X = c] = 1 \implies X_n \xrightarrow{P} X$
$X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_n + Y_n \xrightarrow{P} X + Y$
$X_n \xrightarrow{qm} X \wedge Y_n \xrightarrow{qm} Y \implies X_n + Y_n \xrightarrow{qm} X + Y$
$X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_n Y_n \xrightarrow{P} XY$
$X_n \xrightarrow{P} X \implies \varphi(X_n) \xrightarrow{P} \varphi(X)$
$X_n \xrightarrow{D} X \implies \varphi(X_n) \xrightarrow{D} \varphi(X)$
$X_n \xrightarrow{qm} b \iff \lim_{n\to\infty} E[X_n] = b \wedge \lim_{n\to\infty} V[X_n] = 0$
$X_1, \dots, X_n$ iid $\wedge\ E[X] = \mu \wedge V[X] < \infty \implies \bar{X}_n \xrightarrow{P} \mu$

Slutzky's Theorem
$X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_n + Y_n \xrightarrow{D} X + c$
$X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_n Y_n \xrightarrow{D} cX$
In general: $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} Y$ does not imply $X_n + Y_n \xrightarrow{D} X + Y$

10.1 Law of Large Numbers (LLN)

Let $\{X_1, \dots, X_n\}$ be a sequence of iid rvs with $E[X_1] = \mu$.
Weak (WLLN): $\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$
Strong (SLLN): $\bar{X}_n \xrightarrow{as} \mu$ as $n \to \infty$

10.2 Central Limit Theorem (CLT)

Let $\{X_1, \dots, X_n\}$ be a sequence of iid rvs with $E[X_1] = \mu$ and $V[X_1] = \sigma^2$.
$Z_n := \frac{\bar{X}_n - \mu}{\sqrt{V[\bar{X}_n]}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{D} Z$, where $Z \sim N(0, 1)$
$\lim_{n\to\infty} P[Z_n \le z] = \Phi(z)$, $z \in \mathbb{R}$

CLT notations
$Z_n \approx N(0, 1)$
$\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$
$\bar{X}_n - \mu \approx N\left(0, \frac{\sigma^2}{n}\right)$
$\sqrt{n}\,(\bar{X}_n - \mu) \approx N\left(0, \sigma^2\right)$
$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \approx N(0, 1)$

Continuity correction
$P\left[\bar{X}_n \le x\right] \approx \Phi\left(\frac{x + \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$
$P\left[\bar{X}_n \ge x\right] \approx 1 - \Phi\left(\frac{x - \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$

Delta method
$Y_n \approx N\left(\mu, \frac{\sigma^2}{n}\right) \implies \varphi(Y_n) \approx N\left(\varphi(\mu), (\varphi'(\mu))^2\,\frac{\sigma^2}{n}\right)$
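E.g., a simulation of $Z_n$ for Exp(1) data (a sketch assuming numpy/scipy; not part of the original cookbook):

# Sketch: CLT in action -- standardized means of Exp(1) draws approach N(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 20_000
x = rng.exponential(scale=1.0, size=(reps, n))     # mu = 1, sigma = 1

z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0      # Z_n = sqrt(n)(Xbar - mu)/sigma
print(stats.kstest(z, stats.norm.cdf))             # close to N(0, 1) for large n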
11 Statistical Inference

Let $X_1, \dots, X_n \overset{iid}{\sim} F$ if not otherwise noted.

11.1 Point Estimation

Point estimator $\hat{\theta}_n$ of $\theta$ is a rv: $\hat{\theta}_n = g(X_1, \dots, X_n)$
$\mathrm{bias}(\hat{\theta}_n) = E\big[\hat{\theta}_n\big] - \theta$
Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$
Sampling distribution: $F(\hat{\theta}_n)$
Standard error: $\mathrm{se}(\hat{\theta}_n) = \sqrt{V\big[\hat{\theta}_n\big]}$
Mean squared error: $\mathrm{mse} = E\big[(\hat{\theta}_n - \theta)^2\big] = \mathrm{bias}(\hat{\theta}_n)^2 + V\big[\hat{\theta}_n\big]$
$\lim_{n\to\infty} \mathrm{bias}(\hat{\theta}_n) = 0 \wedge \lim_{n\to\infty} \mathrm{se}(\hat{\theta}_n) = 0 \implies \hat{\theta}_n$ is consistent
Asymptotic normality: $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \xrightarrow{D} N(0, 1)$
Slutzky's Theorem often lets us replace $\mathrm{se}(\hat{\theta}_n)$ by some (weakly) consistent estimator $\hat{\mathrm{se}}_n$.

11.2 Normal-Based Confidence Interval

Suppose $\hat{\theta}_n \approx N(\theta, \hat{\mathrm{se}}^2)$. Let $z_{\alpha/2} = \Phi^{-1}(1 - (\alpha/2))$, i.e., $P[Z > z_{\alpha/2}] = \alpha/2$ and $P[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$ where $Z \sim N(0, 1)$. Then
$C_n = \hat{\theta}_n \pm z_{\alpha/2}\,\hat{\mathrm{se}}$
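E.g., computing $C_n$ for a sample mean (a sketch assuming numpy/scipy; not part of the original cookbook):

# Sketch: normal-based 1 - alpha confidence interval for a mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(loc=5.0, scale=2.0, size=100)

alpha = 0.05
theta_hat = x.mean()
se_hat = x.std(ddof=1) / np.sqrt(len(x))
z = stats.norm.ppf(1 - alpha / 2)            # z_{alpha/2}
print(theta_hat - z * se_hat, theta_hat + z * se_hat)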
b b=X n
n
1 X n )2
b2 =
(Xi X
11.3 Empirical distribution n 1 i=1
1
Pn
Empirical Distribution Function (ECDF) n i=1 (Xi b)3
b=
Pn
I(Xi x) Pb3
Fn (x) = i=1
b n
i=1 (Xi Xn )(Yi Yn )
n b = qP qP
n 2 n 2
i=1 (Xi Xn ) i=1 (Yi Yn )
(
1 Xi x
I(Xi x) =
0 Xi > x
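E.g., the ECDF together with its DKW band (a sketch assuming numpy; ecdf_band is an illustrative helper, not a library function):

# Sketch: ECDF with the DKW 1 - alpha confidence band.
import numpy as np

def ecdf_band(x, alpha=0.05):
    """Return sorted sample, ECDF values, and DKW lower/upper bands."""
    x = np.sort(x)
    n = len(x)
    F = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))
    return x, F, np.maximum(F - eps, 0), np.minimum(F + eps, 1)

rng = np.random.default_rng(4)
x, F, L, U = ecdf_band(rng.normal(size=200))
print(F[:5], L[:5], U[:5])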
11.4 Statistical Functionals

Statistical functional: $T(F)$
Plug-in estimator of $\theta = T(F)$: $\hat{\theta}_n = T(\hat{F}_n)$
Linear functional: $T(F) = \int \varphi(x)\,dF_X(x)$
Plug-in estimator for linear functional: $T(\hat{F}_n) = \int \varphi(x)\,d\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \varphi(X_i)$
Often: $T(\hat{F}_n) \approx N\left(T(F), \hat{\mathrm{se}}^2\right) \implies T(\hat{F}_n) \pm z_{\alpha/2}\,\hat{\mathrm{se}}$
$p$th quantile: $F^{-1}(p) = \inf\{x : F(x) \ge p\}$
$\hat{\mu} = \bar{X}_n$
$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$
Skewness: $\hat{\kappa} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})^3}{\hat{\sigma}^3}$
Correlation: $\hat{\rho} = \frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y}_n)^2}}$

12 Parametric Inference

Let $\mathfrak{F} = \{f(x; \theta) : \theta \in \Theta\}$ be a parametric model with parameter space $\Theta \subseteq \mathbb{R}^k$ and parameter $\theta = (\theta_1, \dots, \theta_k)$.

12.1 Method of Moments

$j$th moment: $\alpha_j(\theta) = E\left[X^j\right] = \int x^j\,dF_X(x)$
$j$th sample moment: $\hat{\alpha}_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j$
Method of Moments estimator (MoM): $\hat{\theta}_n$ solves
$\alpha_1(\theta) = \hat{\alpha}_1,\quad \alpha_2(\theta) = \hat{\alpha}_2,\quad \dots,\quad \alpha_k(\theta) = \hat{\alpha}_k$
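E.g., MoM for $\mathrm{Gamma}(\alpha, \beta)$: matching mean $\alpha\beta$ and variance $\alpha\beta^2$ (equivalent to matching the first two moments) gives a closed form (a sketch assuming numpy; not part of the original cookbook):

# Sketch: Method of Moments for Gamma(alpha, beta), scale parameterization.
import numpy as np

rng = np.random.default_rng(5)
x = rng.gamma(shape=3.0, scale=0.5, size=1_000)

m, v = x.mean(), x.var()         # sample mean and variance
alpha_hat = m**2 / v             # from alpha*beta = m and alpha*beta^2 = v
beta_hat = v / m
print(alpha_hat, beta_hat)       # ~ (3.0, 0.5)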
Properties of the MoM estimator
$\hat{\theta}_n$ exists with probability tending to 1
Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$
Asymptotic normality: $\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{D} N(0, \Sigma)$,
where $\Sigma = g\,E\big[YY^T\big]\,g^T$, $Y = (X, X^2, \dots, X^k)^T$, $g = (g_1, \dots, g_k)$, and $g_j = \frac{\partial}{\partial\theta}\,\alpha_j^{-1}(\theta)$

12.2 Maximum Likelihood

Likelihood: $L_n : \Theta \to [0, \infty)$, $L_n(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$
Log-likelihood: $\ell_n(\theta) = \log L_n(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta)$
Maximum likelihood estimator (mle): $L_n(\hat{\theta}_n) = \sup_\theta L_n(\theta)$
Score function: $s(X; \theta) = \frac{\partial}{\partial\theta} \log f(X; \theta)$
Fisher information: $I(\theta) = V[s(X; \theta)]$, $I_n(\theta) = n\,I(\theta)$
Fisher information (exponential family): $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(X; \theta)\right]$
Observed Fisher information: $I_n^{\mathrm{obs}}(\theta) = -\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(X_i; \theta)$

Properties of the mle
Equivariance: $\hat{\theta}_n$ is the mle $\implies \varphi(\hat{\theta}_n)$ is the mle of $\varphi(\theta)$
Asymptotic normality:
1. $\mathrm{se} \approx \sqrt{1/I_n(\theta)}$ and $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \xrightarrow{D} N(0, 1)$
2. $\hat{\mathrm{se}} \approx \sqrt{1/I_n(\hat{\theta}_n)}$ and $\frac{\hat{\theta}_n - \theta}{\hat{\mathrm{se}}} \xrightarrow{D} N(0, 1)$
Asymptotic optimality (or efficiency), i.e., smallest variance for large samples; if $\tilde{\theta}_n$ is any other estimator, the asymptotic relative efficiency is
$\mathrm{are}(\tilde{\theta}_n, \hat{\theta}_n) = \frac{V\big[\hat{\theta}_n\big]}{V\big[\tilde{\theta}_n\big]} \le 1$
Approximately the Bayes estimator

12.2.1 Delta Method

If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta) \ne 0$:
$\frac{\hat{\tau}_n - \tau}{\hat{\mathrm{se}}(\hat{\tau})} \xrightarrow{D} N(0, 1)$
where $\hat{\tau} = \varphi(\hat{\theta})$ is the mle of $\tau$ and $\hat{\mathrm{se}}(\hat{\tau}) = |\varphi'(\hat{\theta})|\,\hat{\mathrm{se}}(\hat{\theta}_n)$

12.3 Multiparameter Models

Let $\theta = (\theta_1, \dots, \theta_k)$ and $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_k)$ be the mle.
$H_{jj} = \frac{\partial^2 \ell_n}{\partial\theta_j^2} \qquad H_{jk} = \frac{\partial^2 \ell_n}{\partial\theta_j\,\partial\theta_k}$
Fisher information matrix:
$I_n(\theta) = -\begin{pmatrix} E[H_{11}] & \cdots & E[H_{1k}] \\ \vdots & \ddots & \vdots \\ E[H_{k1}] & \cdots & E[H_{kk}] \end{pmatrix}$
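E.g., a numerical mle obtained by minimizing $-\ell_n(\theta)$ (a sketch assuming numpy/scipy; the log-parameterization is just one way to keep both parameters positive):

# Sketch: numerical mle for Gamma(alpha, beta) via the negative log-likelihood.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
x = rng.gamma(shape=3.0, scale=0.5, size=1_000)

def nll(q):
    a, scale = np.exp(q)                     # back-transform to (alpha, beta)
    return -stats.gamma.logpdf(x, a=a, scale=scale).sum()

res = optimize.minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print(np.exp(res.x))                         # mle (alpha_hat, beta_hat) ~ (3, 0.5)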
13 Hypothesis Testing

$H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_1$

Likelihood ratio test: the approximate size-$\alpha$ LRT rejects $H_0$ when $\lambda(X) \ge \chi^2_{k-1,\alpha}$.

Pearson Chi-square test:
$T = \sum_{j=1}^{k} \frac{(X_j - E[X_j])^2}{E[X_j]}$, where $E[X_j] = np_{0j}$ under $H_0$; $T \xrightarrow{D} \chi^2_{k-1}$
p-value $= P\left[\chi^2_{k-1} > T(x)\right]$
$T$ converges in distribution to $\chi^2_{k-1}$ faster than the LRT statistic, hence preferable for small $n$.

Independence testing ($I$ rows, $J$ columns, $X$ a multinomial sample of size $n$ over the $I \cdot J$ cells):
mles unconstrained: $\hat{p}_{ij} = \frac{X_{ij}}{n}$
mles under $H_0$: $\hat{p}_{0ij} = \hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \frac{X_{i\cdot}}{n}\,\frac{X_{\cdot j}}{n}$
LRT: $\lambda = 2\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}\log\frac{X_{ij}\,n}{X_{i\cdot}\,X_{\cdot j}}$
Pearson Chi-square: $T = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(X_{ij} - E[X_{ij}])^2}{E[X_{ij}]}$
LRT and Pearson $\xrightarrow{D} \chi^2_{\nu}$, where $\nu = (I-1)(J-1)$

14 Exponential Family

Scalar parameter: $f_X(x \mid \theta) = h(x)\,\exp\left(\eta(\theta)\,T(x) - A(\theta)\right)$

15 Bayesian Inference

Definitions
$X^n = (X_1, \dots, X_n)$ and $x^n = (x_1, \dots, x_n)$
Prior density $f(\theta)$
Likelihood $f(x^n \mid \theta)$: joint density of the data; in particular, $X^n$ iid $\implies f(x^n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = L_n(\theta)$
Posterior density $f(\theta \mid x^n)$
Normalizing constant $c_n = f(x^n) = \int f(x \mid \theta)\,f(\theta)\,d\theta$
Kernel: part of a density that depends on $\theta$
Posterior mean $\bar{\theta}_n = \int \theta\,f(\theta \mid x^n)\,d\theta = \frac{\int \theta\,L_n(\theta)\,f(\theta)\,d\theta}{\int L_n(\theta)\,f(\theta)\,d\theta}$

Bayes Theorem
$f(\theta \mid x^n) = \frac{f(x^n \mid \theta)\,f(\theta)}{f(x^n)} = \frac{f(x^n \mid \theta)\,f(\theta)}{\int f(x^n \mid \theta)\,f(\theta)\,d\theta} \propto L_n(\theta)\,f(\theta)$

15.1 Credible Intervals

Posterior interval: $P[\theta \in (a, b) \mid x^n] = \int_a^b f(\theta \mid x^n)\,d\theta = 1 - \alpha$

Types of priors
Flat: $f(\theta) \propto$ constant
Proper: $\int f(\theta)\,d\theta = 1$
Improper: $\int f(\theta)\,d\theta = \infty$
Jeffreys prior (transformation-invariant): $f(\theta) \propto \sqrt{I(\theta)}$, $f(\theta) \propto \sqrt{\det(I(\theta))}$
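E.g., a conjugate Beta-Binomial posterior and an equal-tailed posterior interval (a sketch assuming scipy; not part of the original cookbook):

# Sketch: Beta-Binomial posterior and a 95% equal-tailed credible interval.
from scipy import stats

a, b = 1, 1                 # Beta(1, 1) flat prior
k, n = 27, 100              # 27 successes in 100 trials

posterior = stats.beta(a + k, b + n - k)
print(posterior.mean())                      # posterior mean of theta
print(posterior.ppf([0.025, 0.975]))         # 95% credible interval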
18 Linear Regression

Under the assumption of Normality, the least squares estimator is also the mle, but the least squares variance estimator is not the mle.

Estimated regression function: $\hat{r}(x) = \sum_{j=1}^{k} \hat{\beta}_j x_j$; variance estimate $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{\epsilon}_i^2$

Training error: $\hat{R}_{tr}(S) = \sum_{i=1}^{n} \left(\hat{Y}_i(S) - Y_i\right)^2$

$R^2(S) = 1 - \frac{\mathrm{rss}(S)}{\mathrm{tss}} = 1 - \frac{\hat{R}_{tr}(S)}{\mathrm{tss}}$, where $\mathrm{tss} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

The training error is a downward-biased estimate of the prediction risk:
$E\big[\hat{R}_{tr}(S)\big] < R(S)$, with $\mathrm{bias}(\hat{R}_{tr}(S)) = E\big[\hat{R}_{tr}(S)\big] - R(S) = -2\sum_{i=1}^{n} \mathrm{Cov}\big[\hat{Y}_i, Y_i\big]$

Adjusted $R^2$: $\bar{R}^2(S) = 1 - \frac{n-1}{n-k}\,\frac{\mathrm{rss}}{\mathrm{tss}}$

Mallows $C_p$ statistic: $\hat{R}(S) = \hat{R}_{tr}(S) + 2k\hat{\sigma}^2 = \text{lack of fit} + \text{complexity penalty}$
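E.g., training error, $R^2$, and Mallows $C_p$ for a least squares fit (an illustrative sketch assuming numpy, not the cookbook's own code):

# Sketch: least squares fit, training error, R^2, and Mallows' Cp.
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(scale=0.7, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
rss = np.sum((y - y_hat) ** 2)               # training error R_tr(S)
tss = np.sum((y - y.mean()) ** 2)
sigma2 = rss / n                             # variance estimate
print("R^2:", 1 - rss / tss)
print("Mallows Cp:", rss + 2 * k * sigma2)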
19 Non-parametric Function Estimation

19.1 Density Estimation

Estimate $f(x)$, where $P[X \in A] = \int_A f(x)\,dx$.

Integrated square error (ise):
$L(f, \hat{f}_n) = \int \left(f(x) - \hat{f}_n(x)\right)^2 dx = J(h) + \int f^2(x)\,dx$

Frequentist risk:
$R(f, \hat{f}_n) = E\left[L(f, \hat{f}_n)\right] = \int b^2(x)\,dx + \int v(x)\,dx$
where $b(x) = E\big[\hat{f}_n(x)\big] - f(x)$ and $v(x) = V\big[\hat{f}_n(x)\big]$
19.1.1 Histograms

Definitions
Number of bins $m$; binwidth $h = \frac{1}{m}$
Bin $B_j$ has $\nu_j$ observations
Define $\hat{p}_j = \nu_j/n$ and $p_j = \int_{B_j} f(u)\,du$

Histogram estimator:
$\hat{f}_n(x) = \sum_{j=1}^{m} \frac{\hat{p}_j}{h}\,I(x \in B_j)$
$E\big[\hat{f}_n(x)\big] = \frac{p_j}{h}$, $V\big[\hat{f}_n(x)\big] = \frac{p_j(1 - p_j)}{nh^2}$
$R(\hat{f}_n, f) \approx \frac{h^2}{12}\int \left(f'(u)\right)^2 du + \frac{1}{nh}$
$h^* = \frac{1}{n^{1/3}}\left(\frac{6}{\int (f'(u))^2\,du}\right)^{1/3}$
$R^*(\hat{f}_n, f) \approx \frac{C}{n^{2/3}}$, $C = \left(\frac{3}{4}\right)^{2/3}\left(\int (f'(u))^2\,du\right)^{1/3}$

Cross-validation estimate of $E[J(h)]$:
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n} \hat{f}_{(-i)}(X_i) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_{j=1}^{m} \hat{p}_j^2$

19.1.2 Kernel Density Estimator (KDE)

Kernel $K$; KDE:
$\hat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h}\,K\left(\frac{x - X_i}{h}\right)$
$R(f, \hat{f}_n) \approx \frac{1}{4}(h\sigma_K)^4 \int (f''(x))^2\,dx + \frac{1}{nh}\int K^2(x)\,dx$
$h^* = \frac{c_1^{-2/5}\,c_2^{1/5}\,c_3^{-1/5}}{n^{1/5}}$, where $c_1 = \sigma_K^2$, $c_2 = \int K^2(x)\,dx$, $c_3 = \int (f''(x))^2\,dx$
$R^*(f, \hat{f}_n) = \frac{c_4}{n^{4/5}}$, $c_4 = \frac{5}{4}\left(\sigma_K^2\right)^{2/5}\underbrace{\left(\int K^2(x)\,dx\right)^{4/5}}_{C(K)}\left(\int (f''(x))^2\,dx\right)^{1/5}$

Epanechnikov kernel:
$K(x) = \begin{cases} \frac{3}{4\sqrt{5}}\left(1 - x^2/5\right) & |x| < \sqrt{5} \\ 0 & \text{otherwise} \end{cases}$

Cross-validation estimate of $E[J(h)]$:
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n} \hat{f}_{(-i)}(X_i) \approx \frac{1}{hn^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K^*\left(\frac{X_i - X_j}{h}\right) + \frac{2}{nh}\,K(0)$
where $K^*(x) = K^{(2)}(x) - 2K(x)$ and $K^{(2)}(x) = \int K(x - y)\,K(y)\,dy$
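E.g., a Gaussian-kernel KDE (a sketch assuming numpy/scipy; note scipy picks the bandwidth by Scott's rule rather than cross-validation):

# Sketch: kernel density estimate compared with the true density.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, scale=2.0, size=500)

kde = stats.gaussian_kde(x)                  # Gaussian kernel, rule-of-thumb h
grid = np.linspace(-6, 8, 5)
print(kde(grid))                             # estimated f(x) on the grid
print(stats.norm.pdf(grid, loc=1.0, scale=2.0))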
19.2 Non-parametric Regression

Estimate $r(x)$, where $r(x) = E[Y \mid X = x]$. Consider pairs of points $(x_1, Y_1), \dots, (x_n, Y_n)$ related by
$Y_i = r(x_i) + \epsilon_i$, with $E[\epsilon_i] = 0$ and $V[\epsilon_i] = \sigma^2$
21 Time Series

Mean function: $\mu_{x_t} = E[x_t] = \int_{-\infty}^{\infty} x\,f_t(x)\,dx$

Random walk with drift $\delta$:
$x_t = \delta t + \sum_{j=1}^{t} w_j$, $E[x_t] = \delta t$

Weakly stationary:
1. $E\left[x_t^2\right] < \infty$ for all $t \in \mathbb{Z}$
2. $E[x_t] = m$ for all $t \in \mathbb{Z}$
3. $\gamma_x(s, t) = \gamma_x(s + r, t + r)$ for all $r, s, t \in \mathbb{Z}$

Autocovariance function:
$\gamma(h) = E[(x_{t+h} - \mu)(x_t - \mu)]$, $h \in \mathbb{Z}$
$\gamma(0) = E\left[(x_t - \mu)^2\right]$, $\gamma(0) \ge 0$, $\gamma(0) \ge |\gamma(h)|$

Autocorrelation function (ACF): $\rho(h) = \frac{\gamma(h)}{\gamma(0)}$

Autocovariance of a linear process $x_t = \sum_j \psi_j w_{t-j}$:
$\gamma(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\,\psi_j$

21.2 Estimation of Correlation

Sample mean: $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$
Sample autocovariance function: $\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})$
Sample autocorrelation function: $\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$
Sample cross-covariance function: $\hat{\gamma}_{xy}(h) = \frac{1}{n}\sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(y_t - \bar{y})$
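E.g., $\hat{\gamma}(h)$ and $\hat{\rho}(h)$ exactly as defined above (a sketch assuming numpy; not part of the original cookbook):

# Sketch: sample autocovariance and autocorrelation.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    gamma = np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n
                      for h in range(max_lag + 1)])
    return gamma / gamma[0]                  # rho_hat(h), h = 0..max_lag

rng = np.random.default_rng(8)
print(sample_acf(rng.normal(size=1_000), 5))  # ~ [1, 0, 0, ...] for white noise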
21.3 Non-Stationary Time Series

Classical decomposition model: $x_t = \mu_t + s_t + w_t$
$\mu_t$ = trend, $s_t$ = seasonal component, $w_t$ = random noise term

21.3.1 Detrending

Least squares:
1. Choose trend model, e.g., $\mu_t = \beta_0 + \beta_1 t + \beta_2 t^2$
2. Minimize rss to obtain trend estimate $\hat{\mu}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2$
3. Residuals = noise $w_t$

Moving average: the low-pass filter $v_t$ is a symmetric moving average $m_t$ with $a_j = \frac{1}{2k+1}$:
$v_t = \frac{1}{2k+1}\sum_{i=-k}^{k} x_{t-i}$
If $\frac{1}{2k+1}\sum_{i=-k}^{k} w_{t-i} \approx 0$, a linear trend function $\mu_t = \beta_0 + \beta_1 t$ passes without distortion.

Differencing: $\mu_t = \beta_0 + \beta_1 t \implies \nabla x_t = \beta_1$

21.4 ARIMA Models

Autoregressive polynomial: $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$, $z \in \mathbb{C}$, $\phi_p \ne 0$
Autoregressive operator: $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$
Autoregressive model of order $p$, AR($p$):
$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t \iff \phi(B)x_t = w_t$

AR(1):
$x_t = \phi^k(x_{t-k}) + \sum_{j=0}^{k-1} \phi^j(w_{t-j}) \overset{k \to \infty,\ |\phi| < 1}{=} \sum_{j=0}^{\infty} \phi^j(w_{t-j})$ (linear process)
$E[x_t] = \sum_{j=0}^{\infty} \phi^j\,E[w_{t-j}] = 0$
$\gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \frac{\sigma_w^2\,\phi^h}{1 - \phi^2}$
$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h$, and $\rho(h) = \phi\,\rho(h-1)$ for $h = 1, 2, \dots$

Moving average polynomial: $\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q$, $z \in \mathbb{C}$, $\theta_q \ne 0$
Moving average operator: $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$
Moving average model of order $q$, MA($q$):
$x_t = w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q} \iff x_t = \theta(B)w_t$
$E[x_t] = \sum_{j=0}^{q} \theta_j\,E[w_{t-j}] = 0$
$\gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \begin{cases} \sigma_w^2\sum_{j=0}^{q-h} \theta_j\,\theta_{j+h} & 0 \le h \le q \\ 0 & h > q \end{cases}$

MA(1): $x_t = w_t + \theta w_{t-1}$
$\gamma(h) = \begin{cases} (1 + \theta^2)\,\sigma_w^2 & h = 0 \\ \theta\,\sigma_w^2 & h = 1 \\ 0 & h > 1 \end{cases} \qquad \rho(h) = \begin{cases} \frac{\theta}{1 + \theta^2} & h = 1 \\ 0 & h > 1 \end{cases}$

ARMA($p, q$):
$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q} \iff \phi(B)x_t = \theta(B)w_t$

Partial autocorrelation function (PACF):
$x_i^{h-1}$: regression of $x_i$ on $\{x_{h-1}, x_{h-2}, \dots, x_1\}$
$\phi_{hh} = \mathrm{corr}(x_h - x_h^{h-1}, x_0 - x_0^{h-1})$, $h \ge 2$
E.g., $\phi_{11} = \mathrm{corr}(x_1, x_0) = \rho(1)$

ARIMA($p, d, q$): $\nabla^d x_t = (1 - B)^d x_t$ is ARMA($p, q$):
$\phi(B)(1 - B)^d x_t = \theta(B)w_t$
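E.g., simulating the AR(1) above and checking $\rho(h) = \phi^h$ (a sketch assuming numpy; not part of the original cookbook):

# Sketch: simulate a causal AR(1) and check rho(h) = phi**h empirically.
import numpy as np

rng = np.random.default_rng(9)
phi, n = 0.7, 50_000
w = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]             # x_t = phi x_{t-1} + w_t

xbar = x.mean()
gamma = [np.mean((x[h:] - xbar) * (x[:n - h] - xbar)) for h in range(4)]
print([g / gamma[0] for g in gamma])         # ~ [1, 0.7, 0.49, 0.343]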
Exponentially Weighted Moving Average (EWMA): $x_t = x_{t-1} + w_t - \lambda w_{t-1}$
$x_t = \sum_{j=1}^{\infty} (1 - \lambda)\lambda^{j-1} x_{t-j} + w_t$ when $|\lambda| < 1$
One-step forecast: $\tilde{x}_{n+1} = (1 - \lambda)x_n + \lambda\tilde{x}_n$

Seasonal ARIMA, denoted ARIMA$(p, d, q) \times (P, D, Q)_s$:
$\Phi_P(B^s)\,\phi(B)\,\nabla_s^D\,\nabla^d x_t = \delta + \Theta_Q(B^s)\,\theta(B)\,w_t$

21.4.1 Causality and Invertibility

ARMA($p, q$) is causal (future-independent) $\iff \exists\{\psi_j\} : \sum_{j=0}^{\infty} |\psi_j| < \infty$ such that
$x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j} = \psi(B)w_t$

ARMA($p, q$) is invertible $\iff \exists\{\pi_j\} : \sum_{j=0}^{\infty} |\pi_j| < \infty$ such that
$\pi(B)x_t = \sum_{j=0}^{\infty} \pi_j x_{t-j} = w_t$

Properties
ARMA($p, q$) causal $\iff$ roots of $\phi(z)$ lie outside the unit circle;
$\psi(z) = \sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)}{\phi(z)}$, $|z| \le 1$
ARMA($p, q$) invertible $\iff$ roots of $\theta(z)$ lie outside the unit circle;
$\pi(z) = \sum_{j=0}^{\infty} \pi_j z^j = \frac{\phi(z)}{\theta(z)}$, $|z| \le 1$

Behavior of the ACF and PACF for causal and invertible ARMA models:

         AR(p)                  MA(q)                  ARMA(p, q)
ACF      tails off              cuts off after lag q   tails off
PACF     cuts off after lag p   tails off              tails off

21.5 Spectral Analysis

Periodic process:
$x_t = A\cos(2\pi\omega t + \phi) = U_1\cos(2\pi\omega t) + U_2\sin(2\pi\omega t)$
Frequency index $\omega$ (cycles per unit time), period $1/\omega$
Amplitude $A$, phase $\phi$
$U_1 = A\cos\phi$ and $U_2 = -A\sin\phi$, often normally distributed rvs

Periodic mixture:
$x_t = \sum_{k=1}^{q} \left(U_{k1}\cos(2\pi\omega_k t) + U_{k2}\sin(2\pi\omega_k t)\right)$
$U_{k1}, U_{k2}$, for $k = 1, \dots, q$, are independent zero-mean rvs with variances $\sigma_k^2$:
$\gamma(h) = \sum_{k=1}^{q} \sigma_k^2\cos(2\pi\omega_k h)$, $\gamma(0) = E\left[x_t^2\right] = \sum_{k=1}^{q} \sigma_k^2$

Spectral representation of a periodic process:
$\gamma(h) = \sigma^2\cos(2\pi\omega_0 h) = \frac{\sigma^2}{2}\,e^{-2\pi i\omega_0 h} + \frac{\sigma^2}{2}\,e^{2\pi i\omega_0 h} = \int_{-1/2}^{1/2} e^{2\pi i\omega h}\,dF(\omega)$

Spectral distribution function:
$F(\omega) = \begin{cases} 0 & \omega < -\omega_0 \\ \sigma^2/2 & -\omega_0 \le \omega < \omega_0 \\ \sigma^2 & \omega \ge \omega_0 \end{cases}$
$F(-\infty) = F(-1/2) = 0$, $F(\infty) = F(1/2) = \gamma(0)$

Spectral density:
$f(\omega) = \sum_{h=-\infty}^{\infty} \gamma(h)\,e^{-2\pi i\omega h}$, $-\frac{1}{2} \le \omega \le \frac{1}{2}$
Needs $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty \implies \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\omega h} f(\omega)\,d\omega$, $h = 0, \pm 1, \dots$
$f(\omega) \ge 0$, $f(\omega) = f(-\omega)$, $f(\omega) = f(1 - \omega)$
$\gamma(0) = V[x_t] = \int_{-1/2}^{1/2} f(\omega)\,d\omega$
White noise: $f_w(\omega) = \sigma_w^2$
ARMA($p, q$), $\phi(B)x_t = \theta(B)w_t$:
$f_x(\omega) = \sigma_w^2\,\frac{|\theta(e^{-2\pi i\omega})|^2}{|\phi(e^{-2\pi i\omega})|^2}$
where $\phi(z) = 1 - \sum_{k=1}^{p} \phi_k z^k$ and $\theta(z) = 1 + \sum_{k=1}^{q} \theta_k z^k$

Discrete Fourier Transform (DFT):
$d(\omega_j) = n^{-1/2}\sum_{t=1}^{n} x_t\,e^{-2\pi i\omega_j t}$
Fourier/fundamental frequencies: $\omega_j = j/n$
Inverse DFT: $x_t = n^{-1/2}\sum_{j=0}^{n-1} d(\omega_j)\,e^{2\pi i\omega_j t}$
Periodogram: $I(j/n) = |d(j/n)|^2$
Scaled periodogram:
$P(j/n) = \frac{4}{n}\,I(j/n) = \left(\frac{2}{n}\sum_{t=1}^{n} x_t\cos(2\pi t j/n)\right)^2 + \left(\frac{2}{n}\sum_{t=1}^{n} x_t\sin(2\pi t j/n)\right)^2$
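E.g., the periodogram via the FFT (a sketch assuming numpy; np.fft matches $d(\omega_j)$ up to the $t = 0, \dots, n-1$ indexing convention):

# Sketch: periodogram I(j/n) = |d(j/n)|^2 computed with the FFT.
import numpy as np

rng = np.random.default_rng(10)
n = 256
t = np.arange(n)
x = 2 * np.cos(2 * np.pi * 0.1 * t) + rng.normal(size=n)   # signal at omega = 0.1

d = np.fft.fft(x) / np.sqrt(n)               # d(omega_j), omega_j = j/n
I = np.abs(d) ** 2                           # periodogram
print(np.argmax(I[: n // 2]) / n)            # ~ 0.1, the dominant frequency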
22 Math

22.1 Gamma Function

Ordinary: $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$
Upper incomplete: $\Gamma(s, x) = \int_x^{\infty} t^{s-1} e^{-t}\,dt$
Lower incomplete: $\gamma(s, x) = \int_0^x t^{s-1} e^{-t}\,dt$

22.2 Beta Function

Ordinary: $B(x, y) = B(y, x) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$
Incomplete: $B(x; a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$
Regularized incomplete:
$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)} \overset{a, b \in \mathbb{N}}{=} \sum_{j=a}^{a+b-1} \frac{(a+b-1)!}{j!\,(a+b-1-j)!}\,x^j(1-x)^{a+b-1-j}$
$I_0(a, b) = 0$, $I_1(a, b) = 1$, $I_x(a, b) = 1 - I_{1-x}(b, a)$

22.3 Series

Finite:
$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$
$\sum_{k=1}^{n} (2k - 1) = n^2$
$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$
$\sum_{k=1}^{n} k^3 = \left(\frac{n(n+1)}{2}\right)^2$
$\sum_{k=0}^{n} c^k = \frac{c^{n+1} - 1}{c - 1}$, $c \ne 1$

Binomial:
$\sum_{k=0}^{n} \binom{n}{k} = 2^n$
$\sum_{k=0}^{n} \binom{r+k}{k} = \binom{r+n+1}{n}$
$\sum_{k=0}^{m} \binom{k}{n} = \binom{m+1}{n+1}$
Vandermonde's Identity: $\sum_{k=0}^{r} \binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r}$
Binomial Theorem: $\sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = (a+b)^n$

Partitions:
$P_{n+k,k} = \sum_{i=1}^{k} P_{n,i}$, with $P_{n,k} = 0$ for $k > n$, $P_{n,0} = 0$ for $n \ge 1$, and $P_{0,0} = 1$
References
[1] P. G. Hoel, S. C. Port, and C. J. Stone. Introduction to Probability Theory. Brooks Cole,
1972.
[2] L. M. Leemis and J. T. McQueston. Univariate Distribution Relationships. The American Statistician, 62(1):45-53, 2008.
[Figure: Univariate distribution relationships, courtesy Leemis and McQueston [2].]