December 3, 2010

The Kalman Filter Explained


Tristan Fletcher
www.cs.ucl.ac.uk/staff/T.Fletcher/
1 Introduction
The aim of this document is to derive the filtering equations for the simplest
Linear Dynamical System case, the Kalman Filter, outline the filter's
implementation, do the same for the smoothing equations and conclude
with parameter learning in an LDS (calibrating the Kalman Filter).
The document is based closely on Bishop's [1] and Ghahramani and Hinton's [2] texts,
but is more suitable for those who wish to understand every aspect of the
mathematics required and see how it all comes together in a procedural
sense.
2 Model Specification
The simplest form of Linear Dynamical System (LDS) models a discrete time
process where a latent variable $h$ is updated every time step by a constant
linear state transition $A$ with the addition of zero-mean Gaussian noise
$\eta^h$:
$$h_t = A h_{t-1} + \eta^h_t \quad \text{where} \quad \eta^h_t \sim \mathcal{N}(0, \Sigma_H)$$
$$p(h_t \mid h_{t-1}) \sim \mathcal{N}(A h_{t-1}, \Sigma_H) \tag{2.1}$$
This latent variable is observed through a constant linear function $B$ of the
latent state, also subject to zero-mean Gaussian noise $\eta^v$:
$$v_t = B h_t + \eta^v_t \quad \text{where} \quad \eta^v_t \sim \mathcal{N}(0, \Sigma_V)$$
$$p(v_t \mid h_t) \sim \mathcal{N}(B h_t, \Sigma_V) \tag{2.2}$$
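As a minimal sketch of (2.1) and (2.2), the following Python fragment draws a sequence from a scalar LDS. The function name `simulate_lds`, its argument names and the fixed starting value `h1` are assumptions made for illustration; only the two update rules come from the model above.

```python
import random

def simulate_lds(a, b, var_h, var_v, h1, steps, seed=0):
    """Draw latent states and observations from a scalar LDS.

    a, b         : scalar stand-ins for the matrices A and B
    var_h, var_v : scalar stand-ins for Sigma_H and Sigma_V
    h1           : starting latent value (hypothetical argument)
    """
    rng = random.Random(seed)
    hs, vs = [], []
    h = h1
    for _ in range(steps):
        # Observation of the current latent state, as in (2.2).
        v = b * h + rng.gauss(0.0, var_v ** 0.5)
        hs.append(h)
        vs.append(v)
        # Transition to the next latent state, as in (2.1).
        h = a * h + rng.gauss(0.0, var_h ** 0.5)
    return hs, vs

hs, vs = simulate_lds(a=0.9, b=1.0, var_h=0.1, var_v=0.5, h1=0.0, steps=5)
```

Setting both noise variances to zero reduces the simulation to the deterministic recursion $h_t = A h_{t-1}$, $v_t = B h_t$, which is a convenient sanity check.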
We wish to infer the probability distribution of $h_t$ given the observations up
to that point in time $v_{1:t}$, i.e. $p(h_t \mid v_{1:t})$, which can be expressed recursively.
Starting with an initial distribution for our latent variable given the first
observation:
$$p(h_1 \mid v_1) \propto p(v_1 \mid h_1)\, p(h_1)$$
and the assumption that $h_1$ has a Gaussian distribution:
$$p(h_1) \sim \mathcal{N}(\mu_0, \sigma_0^2)$$
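values for $p(h_t \mid v_{1:t})$ for subsequent values of $t$ can be found by iteration:
$$\begin{aligned}
p(h_t \mid v_{1:t}) &= p(h_t \mid v_t, v_{1:t-1}) \\
&= \frac{p(h_t, v_t \mid v_{1:t-1})}{p(v_t \mid v_{1:t-1})} \\
&\propto p(h_t, v_t \mid v_{1:t-1}) \\
&= \int_{h_{t-1}} p(h_t, v_t \mid v_{1:t-1}, h_{t-1})\, p(h_{t-1} \mid v_{1:t-1}) \\
&= \int_{h_{t-1}} p(v_t \mid v_{1:t-1}, h_{t-1}, h_t)\, p(h_t \mid h_{t-1}, v_{1:t-1})\, p(h_{t-1} \mid v_{1:t-1})
\end{aligned}$$
$$h_t \perp v_{1:t-1} \mid h_{t-1} \;\Rightarrow\; \int_{h_{t-1}} p(v_t \mid v_{1:t-1}, h_{t-1}, h_t)\, p(h_t \mid h_{t-1})\, p(h_{t-1} \mid v_{1:t-1})$$
$$v_t \perp h_{t-1} \mid v_{1:t-1}, h_t \;\Rightarrow\; \int_{h_{t-1}} p(v_t \mid v_{1:t-1}, h_t)\, p(h_t \mid h_{t-1})\, p(h_{t-1} \mid v_{1:t-1})$$
$$= \int_{h_{t-1}} p(v_t \mid h_t)\, p(h_t \mid h_{t-1})\, p(h_{t-1} \mid v_{1:t-1}) \tag{2.3}$$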
The fact that the distributions described in (2.1) and (2.2) are both Gaussian,
and that the operations in (2.3) of multiplication and then integration yield
Gaussians when performed on Gaussians, means that we know $p(h_t \mid v_{1:t})$ will
itself be a Gaussian:
$$p(h_t \mid v_{1:t}) \sim \mathcal{N}(\mu_t, \sigma_t^2) \tag{2.4}$$
and the task is to derive the expressions for $\mu_t$ and $\sigma_t^2$.
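The claim that the recursion (2.3) maps a Gaussian to a Gaussian can be checked numerically in the scalar case. The sketch below evaluates one step of (2.3) by brute-force summation on a grid and compares the resulting mean and variance with the closed form that sections 3 to 5 go on to derive. All function names, the grid size `n` and the grid half-width `width` are assumptions chosen for illustration.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def grid_posterior(m_prev, p_prev, a, b, var_h, var_v, v, n=300, width=8.0):
    """One step of (2.3) on uniform grids; returns the posterior mean and variance."""
    # Grid over h_{t-1}, centred on the previous posterior N(m_prev, p_prev).
    s_prev = math.sqrt(p_prev)
    prev_grid = [m_prev + width * s_prev * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
    prev_w = [gauss_pdf(x, m_prev, p_prev) for x in prev_grid]
    # Grid over h_t, centred on the predicted mean.
    s_pred = math.sqrt(a * a * p_prev + var_h)
    cur_grid = [a * m_prev + width * s_pred * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
    weights = []
    for h in cur_grid:
        # Sum over h_{t-1} of p(h_t | h_{t-1}) p(h_{t-1} | v_{1:t-1}).
        predict = sum(gauss_pdf(h, a * x, var_h) * w for x, w in zip(prev_grid, prev_w))
        # Multiply by the observation term p(v_t | h_t).
        weights.append(gauss_pdf(v, b * h, var_v) * predict)
    total = sum(weights)  # normalisation; the constant grid spacings cancel
    mean = sum(h * w for h, w in zip(cur_grid, weights)) / total
    var = sum((h - mean) ** 2 * w for h, w in zip(cur_grid, weights)) / total
    return mean, var

def closed_form_posterior(m_prev, p_prev, a, b, var_h, var_v, v):
    """Scalar predict and update steps, as derived later in the document."""
    m_pred, p_pred = a * m_prev, a * a * p_prev + var_h
    gain = p_pred * b / (b * b * p_pred + var_v)
    return m_pred + gain * (v - b * m_pred), (1.0 - gain * b) * p_pred

numeric = grid_posterior(0.0, 1.0, 0.9, 1.0, 0.1, 0.5, 1.2)
exact = closed_form_posterior(0.0, 1.0, 0.9, 1.0, 0.1, 0.5, 1.2)
```

The two results agree to within the discretisation error of the grid, which is what (2.4) leads us to expect.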
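3 Deriving the state estimate variance
If we define $\Delta h \equiv h - E[\hat{h}]$, where $h$ denotes the actual value of the latent
variable, $\hat{h}$ is its estimated value, $E[\hat{h}]$ is the expected value of this estimate
and $F$ is the covariance of the error estimate, then:
$$\begin{aligned}
F_{t|t-1} &= E\left[\Delta h_t \Delta h_t^T\right] \\
&= E\left[(A\Delta h_{t-1} + \eta^h_t)(A\Delta h_{t-1} + \eta^h_t)^T\right] \\
&= E\left[A\Delta h_{t-1}\Delta h_{t-1}^T A^T + \eta^h_t \eta^{hT}_t + \eta^h_t \Delta h_{t-1}^T A^T + A\Delta h_{t-1}\eta^{hT}_t\right] \\
&= A E\left[\Delta h_{t-1}\Delta h_{t-1}^T\right] A^T + E\left[\eta^h_t \eta^{hT}_t\right] \\
&= A F_{t-1|t-1} A^T + \Sigma_H
\end{aligned} \tag{3.1}$$
The two cross terms vanish in expectation because the noise $\eta^h_t$ is zero-mean
and independent of $\Delta h_{t-1}$.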
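In the scalar case (3.1) reduces to $F_{t|t-1} = A^2 F_{t-1|t-1} + \Sigma_H$. As a small illustration, assume that no observations arrive, so that the prediction is simply applied repeatedly (this is an assumption of the example, not part of the derivation). For $|A| < 1$ the variance then settles at the fixed point $\Sigma_H / (1 - A^2)$. The function name `predict_variance` is hypothetical.

```python
def predict_variance(f_prev, a, var_h):
    """Scalar form of (3.1): F_{t|t-1} = A F_{t-1|t-1} A + Sigma_H."""
    return a * f_prev * a + var_h

# Apply the prediction repeatedly with no observation in between.
f = 1.0
for _ in range(200):
    f = predict_variance(f, a=0.9, var_h=0.1)

# Fixed point of F = A^2 F + Sigma_H, valid when |A| < 1.
steady_state = 0.1 / (1.0 - 0.9 ** 2)
```

This shows why the observation update of the following sections is needed: without it the uncertainty about the latent state never falls below this level.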
The subscript in $F_{t|t-1}$ denotes the fact that this is $F$'s value before an
observation is made at time $t$ (i.e. its a priori value), while $F_{t|t}$ denotes
its value after an observation is made (its posterior value). This more
informative notation allows the update equation in (2.1) to be expressed as
follows:
$$\hat{h}_{t|t-1} = A \hat{h}_{t-1|t-1} \tag{3.2}$$
Once we have an observation (and are therefore dealing with posterior values),
we can define $\epsilon_t$ as the difference between the observation we'd expect
to see given our estimate of the latent state (its a priori value) and the one
actually observed, i.e.:
$$\epsilon_t = v_t - B \hat{h}_{t|t-1} \tag{3.3}$$
Now that we have an observation, if we wish to add a correction to our a
priori estimate that is proportional to the error $\epsilon_t$, we can use a coefficient
$\kappa$:
$$\hat{h}_{t|t} = \hat{h}_{t|t-1} + \kappa \epsilon_t \tag{3.4}$$
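This allows us to express $F_{t|t}$ recursively:
$$\begin{aligned}
F_{t|t} &= \mathrm{Cov}(h_t - \hat{h}_{t|t}) \\
&= \mathrm{Cov}(h_t - (\hat{h}_{t|t-1} + \kappa \epsilon_t)) \\
&= \mathrm{Cov}(h_t - (\hat{h}_{t|t-1} + \kappa (v_t - B \hat{h}_{t|t-1}))) \\
&= \mathrm{Cov}(h_t - (\hat{h}_{t|t-1} + \kappa (B h_t + \eta^v_t - B \hat{h}_{t|t-1}))) \\
&= \mathrm{Cov}(h_t - \hat{h}_{t|t-1} - \kappa B h_t - \kappa \eta^v_t + \kappa B \hat{h}_{t|t-1}) \\
&= \mathrm{Cov}((I - \kappa B)(h_t - \hat{h}_{t|t-1}) - \kappa \eta^v_t) \\
&= \mathrm{Cov}((I - \kappa B)(h_t - \hat{h}_{t|t-1})) + \mathrm{Cov}(\kappa \eta^v_t) \\
&= (I - \kappa B)\, \mathrm{Cov}(h_t - \hat{h}_{t|t-1})\, (I - \kappa B)^T + \kappa\, \mathrm{Cov}(\eta^v_t)\, \kappa^T \\
&= (I - \kappa B) F_{t|t-1} (I - \kappa B)^T + \kappa \Sigma_V \kappa^T \\
&= (F_{t|t-1} - \kappa B F_{t|t-1})(I - \kappa B)^T + \kappa \Sigma_V \kappa^T \\
&= F_{t|t-1} - \kappa B F_{t|t-1} - F_{t|t-1} (\kappa B)^T + \kappa B F_{t|t-1} (\kappa B)^T + \kappa \Sigma_V \kappa^T \\
&= F_{t|t-1} - \kappa B F_{t|t-1} - F_{t|t-1} B^T \kappa^T + \kappa (B F_{t|t-1} B^T + \Sigma_V) \kappa^T
\end{aligned} \tag{3.5}$$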
If we define the innovation variance as $S_t = B F_{t|t-1} B^T + \Sigma_V$ then (3.5)
becomes:
$$F_{t|t} = F_{t|t-1} - \kappa B F_{t|t-1} - F_{t|t-1} B^T \kappa^T + \kappa S_t \kappa^T \tag{3.6}$$
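4 Minimizing the state estimate variance
If we wish to minimize the state estimate variance, we can use the mean square error
measure (MSE), which equals the trace of $F_{t|t}$:
$$E\left[\,\left|h_t - \hat{h}_{t|t}\right|^2\,\right] = \mathrm{Tr}(\mathrm{Cov}(h_t - \hat{h}_{t|t})) = \mathrm{Tr}(F_{t|t}) \tag{4.1}$$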
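The only coefficient we have control over is $\kappa$, so we wish to find the $\kappa$ that
gives us the minimum MSE, i.e. we need to find $\kappa$ such that:
$$\frac{\partial\, \mathrm{Tr}(F_{t|t})}{\partial \kappa} = 0$$
Substituting (3.6):
$$\frac{\partial\, \mathrm{Tr}(F_{t|t-1} - \kappa B F_{t|t-1} - F_{t|t-1} B^T \kappa^T + \kappa S_t \kappa^T)}{\partial \kappa} = 0$$
Since $F_{t|t-1}$ and $S_t$ are symmetric, this derivative is $-2 F_{t|t-1} B^T + 2 \kappa S_t$, so:
$$\kappa = F_{t|t-1} B^T S_t^{-1} \tag{4.2}$$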
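The minimizing property of (4.2) is easy to confirm numerically in the scalar case. The sketch below evaluates the scalar form of (3.6) at the gain given by (4.2) and at a few perturbed gains. The function names and the numerical values are assumptions chosen for illustration.

```python
def posterior_variance(kappa, f_prior, b, var_v):
    """Scalar form of (3.6) for an arbitrary gain kappa."""
    s = b * f_prior * b + var_v  # innovation variance S_t
    return f_prior - kappa * b * f_prior - f_prior * b * kappa + kappa * s * kappa

def optimal_gain(f_prior, b, var_v):
    """Scalar form of (4.2)."""
    return f_prior * b / (b * f_prior * b + var_v)

f_prior, b, var_v = 2.0, 1.5, 0.5
k = optimal_gain(f_prior, b, var_v)
best = posterior_variance(k, f_prior, b, var_v)
# Perturbing the gain in either direction should only increase the variance.
nearby = [posterior_variance(k + d, f_prior, b, var_v) for d in (-0.1, -0.01, 0.01, 0.1)]
```

In the scalar case (3.6) is a quadratic in $\kappa$ with positive leading coefficient $S_t$, so the stationary point found in (4.2) is the unique minimum.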
This optimum value for $\kappa$ in terms of minimizing MSE is known as the
Kalman Gain and will be denoted $K$.
If we right-multiply both sides of (4.2) by $S_t K^T$:
$$K S_t K^T = F_{t|t-1} B^T K^T \tag{4.3}$$
Substituting this into (3.6), with $\kappa = K$:
$$\begin{aligned}
F_{t|t} &= F_{t|t-1} - K B F_{t|t-1} - F_{t|t-1} B^T K^T + F_{t|t-1} B^T K^T \\
&= (I - K B) F_{t|t-1}
\end{aligned} \tag{4.4}$$
5 Filtered Latent State Estimation Procedure (The Kalman Filter)
The procedure for estimating the state of $h_t$, which when using the MSE
optimal value for $\kappa$ is called Kalman Filtering, proceeds as follows:
1. Choose initial values for $\hat{h}$ and $F$ (i.e. $\hat{h}_{0|0}$ and $F_{0|0}$).
2. Advance latent state estimate:
$$\hat{h}_{t|t-1} = A \hat{h}_{t-1|t-1}$$
3. Advance estimate covariance:
$$F_{t|t-1} = A F_{t-1|t-1} A^T + \Sigma_H$$
4. Make an observation $v_t$.
5. Calculate innovation:
$$\epsilon_t = v_t - B \hat{h}_{t|t-1}$$
6. Calculate $S_t$:
$$S_t = B F_{t|t-1} B^T + \Sigma_V$$
7. Calculate $K$:
$$K = F_{t|t-1} B^T S_t^{-1}$$
8. Update latent state estimate:
$$\hat{h}_{t|t} = \hat{h}_{t|t-1} + K \epsilon_t$$
9. Update estimate covariance (from (4.4)):
$$F_{t|t} = (I - K B) F_{t|t-1}$$
10. Cycle through stages 2 to 9 for each time step.
Note that $\hat{h}_{t|t}$ and $F_{t|t}$ correspond to $\mu_t$ and $\sigma_t^2$ from (2.4).
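The ten stages above can be sketched in a few lines for the scalar case, where every matrix becomes a number and the inverse in stage 7 becomes a division. The function name `kalman_filter`, its argument names and the example data are assumptions made for illustration; a matrix implementation would replace the scalar products with matrix products and transposes.

```python
def kalman_filter(vs, a, b, var_h, var_v, h0, f0):
    """Scalar Kalman filter; returns the lists of h_{t|t} and F_{t|t}.

    vs           : observations v_1 .. v_T
    a, b         : scalar stand-ins for A and B
    var_h, var_v : scalar stand-ins for Sigma_H and Sigma_V
    h0, f0       : initial values h_{0|0} and F_{0|0} (stage 1)
    """
    h_est, f_est = h0, f0
    means, variances = [], []
    for v in vs:                          # stage 4: take each observation in turn
        h_pred = a * h_est                # stage 2: advance the state estimate
        f_pred = a * f_est * a + var_h    # stage 3: advance the covariance
        eps = v - b * h_pred              # stage 5: innovation
        s = b * f_pred * b + var_v        # stage 6: innovation variance
        k = f_pred * b / s                # stage 7: Kalman gain
        h_est = h_pred + k * eps          # stage 8: update the state estimate
        f_est = (1.0 - k * b) * f_pred    # stage 9: update the covariance
        means.append(h_est)
        variances.append(f_est)
    return means, variances

# Noisy observations of a constant level (a = 1, no process noise).
observations = [1.1, 0.9, 1.05, 0.95, 1.0]
means, variances = kalman_filter(observations, a=1.0, b=1.0, var_h=0.0,
                                 var_v=0.25, h0=0.0, f0=10.0)
```

With no process noise and $A = 1$, each observation can only reduce the variance, so `variances` decreases monotonically in this example.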
6 Smoothed Latent State Estimation
The smoothed probability of the latent variable is the probability it had a
value at time $t$ after a sequence of $T$ observations, i.e. $p(h_t \mid v_{1:T})$. Unlike the
Kalman Filter, which you can update with each observation, one has to wait
until $T$ observations have been made and then retrospectively calculate the
probability the latent variable had a value at time $t$ where $t < T$.
Commencing at the final time step in the sequence ($t = T$) and working
backwards to the start ($t = 1$), $p(h_t \mid v_{1:T})$ can be evaluated as follows:
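$$p(h_t \mid v_{1:T}) = \int_{h_{t+1}} p(h_t \mid h_{t+1}, v_{1:T})\, p(h_{t+1} \mid v_{1:T})$$
$$h_t \perp v_{t+1:T} \mid h_{t+1} \;\Rightarrow\; p(h_t \mid h_{t+1}, v_{1:T}) = p(h_t \mid h_{t+1}, v_{1:t})$$
$$\begin{aligned}
p(h_t \mid v_{1:T}) &= \int_{h_{t+1}} p(h_t \mid h_{t+1}, v_{1:t})\, p(h_{t+1} \mid v_{1:T}) \\
&= \int_{h_{t+1}} \frac{p(h_{t+1}, h_t \mid v_{1:t})\, p(h_{t+1} \mid v_{1:T})}{p(h_{t+1} \mid v_{1:t})} \\
&\propto \int_{h_{t+1}} p(h_{t+1}, h_t \mid v_{1:t})\, p(h_{t+1} \mid v_{1:T}) \\
&= \int_{h_{t+1}} p(h_{t+1} \mid h_t, v_{1:t})\, p(h_t \mid v_{1:t})\, p(h_{t+1} \mid v_{1:T})
\end{aligned}$$
$$h_{t+1} \perp v_{1:t} \mid h_t \;\Rightarrow\; \int_{h_{t+1}} p(h_{t+1} \mid h_t)\, p(h_t \mid v_{1:t})\, p(h_{t+1} \mid v_{1:T}) \tag{6.1}$$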
As before, we know that $p(h_t \mid v_{1:T})$ will be a Gaussian and we will need to
establish its mean and variance at each $t$, i.e. in a similar manner to (2.4):
$$p(h_t \mid v_{1:T}) \sim \mathcal{N}(h^s_t, F^s_t) \tag{6.2}$$
Using the filtered values calculated in the previous section for $\hat{h}_{t|t}$ and $F_{t|t}$
for each time step, the procedure for estimating the smoothed parameters
$h^s_t$ and $F^s_t$ works backwards from the last time step in the sequence, i.e. at
$t = T$, as follows:
1. Set $h^s_T$ and $F^s_T$ to $\hat{h}_{T|T}$ and $F_{T|T}$ from steps 8 and 9 in section 5.
2. Calculate $A^s_t$:
$$A^s_t = (A F_{t|t})^T (A F_{t|t} A^T + \Sigma_H)^{-1}$$
3. Calculate $S^s_t$:
$$S^s_t = F_{t|t} - A^s_t A F_{t|t}$$
4. Calculate the smoothed latent variable estimate $h^s_t$:
$$h^s_t = A^s_t h^s_{t+1} + \hat{h}_{t|t} - A^s_t A \hat{h}_{t|t}$$
5. Calculate the smoothed estimate covariance $F^s_t$ (the averaging with the transpose only enforces symmetry):
$$F^s_t = \frac{1}{2}\left[\left(A^s_t F^s_{t+1} (A^s_t)^T + S^s_t\right) + \left(A^s_t F^s_{t+1} (A^s_t)^T + S^s_t\right)^T\right]$$
6. Calculate the smoothed cross-variance $X^s_{t+1}$ between $h_{t+1}$ and $h_t$:
$$X^s_{t+1} = F^s_{t+1} (A^s_t)^T$$
7. Cycle through stages 2 to 6 for each time step backwards through the
sequence from $t = T - 1$ to $t = 1$.
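The backward pass can be sketched for the scalar case as follows. The block repeats a compact forward filter so that it is self-contained. The function names are assumptions made for illustration, and the cross term is taken here to be the smoothed covariance between $h_{t+1}$ and $h_t$, i.e. $F^s_{t+1} A^s_t$ for scalars, which is an assumption of this sketch about how that quantity is indexed.

```python
def kalman_filter(vs, a, b, var_h, var_v, h0, f0):
    """Forward pass of section 5 (scalar case); returns filtered means and variances."""
    means, variances = [], []
    h_est, f_est = h0, f0
    for v in vs:
        h_pred, f_pred = a * h_est, a * f_est * a + var_h
        k = f_pred * b / (b * f_pred * b + var_v)
        h_est = h_pred + k * (v - b * h_pred)
        f_est = (1.0 - k * b) * f_pred
        means.append(h_est)
        variances.append(f_est)
    return means, variances

def smooth(means, variances, a, var_h):
    """Backward pass; returns smoothed means, variances and cross-covariances.

    xs[t] holds the smoothed covariance between the states at list positions
    t and t - 1; xs[0] has no predecessor and is left as None.
    """
    n = len(means)
    hs, fs = means[:], variances[:]   # stage 1: the last entries are already correct
    xs = [None] * n
    for t in range(n - 2, -1, -1):
        a_s = a * variances[t] / (a * variances[t] * a + var_h)   # stage 2
        s_s = variances[t] - a_s * a * variances[t]               # stage 3
        hs[t] = a_s * hs[t + 1] + means[t] - a_s * a * means[t]   # stage 4
        fs[t] = a_s * fs[t + 1] * a_s + s_s                       # stage 5 (symmetric for scalars)
        xs[t + 1] = fs[t + 1] * a_s                               # stage 6
    return hs, fs, xs

observations = [1.1, 0.9, 1.05, 0.95, 1.0]
means, variances = kalman_filter(observations, a=0.9, b=1.0, var_h=0.1,
                                 var_v=0.25, h0=0.0, f0=1.0)
hs, fs, xs = smooth(means, variances, a=0.9, var_h=0.1)
```

Because smoothing conditions on the whole sequence, each smoothed variance in `fs` is no larger than the corresponding filtered variance, with equality at the final time step.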
7 Expectation Maximization (Calibrating the Kalman Filter)
The procedures outlined in the previous sections are fine if we assume that
we know the values in the parameter set $\theta = \{\mu_0, \sigma_0^2, A, \Sigma_H, B, \Sigma_V\}$, but in
order to learn these values, we will need to perform the Expectation Maximization
algorithm.
The joint probability of $T$ time steps of the latent and observable variables
is:
$$p(h_{1:T}, v_{1:T}) = p(h_1) \prod_{t=2}^{T} p(h_t \mid h_{t-1}) \prod_{t=1}^{T} p(v_t \mid h_t) \tag{7.1}$$
Making the dependence on the parameters explicit, the likelihood of the
model given the parameter set $\theta$ is:
$$p(h_{1:T}, v_{1:T} \mid \theta) = p(h_1 \mid \mu_0, \sigma_0^2) \prod_{t=2}^{T} p(h_t \mid h_{t-1}, A, \Sigma_H) \prod_{t=1}^{T} p(v_t \mid h_t, B, \Sigma_V) \tag{7.2}$$
Taking logs gives us the model's log likelihood:
$$\ln p(h_{1:T}, v_{1:T} \mid \theta) = \ln p(h_1 \mid \mu_0, \sigma_0^2) + \sum_{t=2}^{T} \ln p(h_t \mid h_{t-1}, A, \Sigma_H) + \sum_{t=1}^{T} \ln p(v_t \mid h_t, B, \Sigma_V) \tag{7.3}$$
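We will deal with each of the three components of (7.3) in turn. Using $V$
to represent the set of observations up to and including time $T$ (i.e. $v_{1:T}$),
$H$ for $h_{1:T}$, $\theta^{old}$ to represent our parameter values before an iteration of the
EM loop, the superscript $n$ to represent the value of a parameter after an
iteration of the loop, $c$ to represent terms that are not dependent on $\mu_0$ or
$\sigma_0^2$, $\lambda$ to represent $(\sigma_0^2)^{-1}$ and $Q = E_{H|\theta^{old}}[\ln p(H, V \mid \theta)]$, we will first find the
expected value for $\ln p(h_1 \mid \mu_0, \sigma_0^2)$:
$$\begin{aligned}
Q &= -\frac{1}{2} \ln\left|\sigma_0^2\right| - E_{H|\theta^{old}}\left[\frac{1}{2}(h_1 - \mu_0)^T \lambda (h_1 - \mu_0)\right] + c \\
&= -\frac{1}{2} \ln\left|\sigma_0^2\right| - \frac{1}{2} E_{H|\theta^{old}}\left[h_1^T \lambda h_1 - h_1^T \lambda \mu_0 - \mu_0^T \lambda h_1 + \mu_0^T \lambda \mu_0\right] + c \\
&= \frac{1}{2}\left(\ln|\lambda| - \mathrm{Tr}\left(\lambda\, E_{H|\theta^{old}}\left[h_1 h_1^T - h_1 \mu_0^T - \mu_0 h_1^T + \mu_0 \mu_0^T\right]\right)\right) + c
\end{aligned} \tag{7.4}$$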
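In order to find the $\mu_0$ which maximizes the expected log likelihood described
in (7.4), we will differentiate it with respect to $\mu_0$ and set the derivative to zero:
$$\frac{\partial Q}{\partial \mu_0} = -\frac{1}{2}\left(2 \lambda \mu_0 - 2 \lambda E[h_1]\right) = 0$$
$$\mu_0^n = E[h_1] \tag{7.5}$$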
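Proceeding in a similar manner to establish the maximal $\lambda$:
$$\frac{\partial Q}{\partial \lambda} = \frac{1}{2}\left(\sigma_0^2 - E\left[h_1 h_1^T\right] + E[h_1] \mu_0^T + \mu_0 E\left[h_1^T\right] - \mu_0 \mu_0^T\right) = 0$$
Substituting $\mu_0 = E[h_1]$ from (7.5) gives:
$$(\sigma_0^2)^n = E\left[h_1 h_1^T\right] - E[h_1]\, E\left[h_1^T\right] \tag{7.6}$$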
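In order to optimize for $A$ and $\Sigma_H$ we will substitute for $p(h_t \mid h_{t-1}, A, \Sigma_H)$
in (7.3) giving:
$$Q = -\frac{T-1}{2} \ln|\Sigma_H| - E_{H|\theta^{old}}\left[\frac{1}{2} \sum_{t=2}^{T} (h_t - A h_{t-1})^T \Sigma_H^{-1} (h_t - A h_{t-1})\right] + c \tag{7.7}$$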
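Maximizing with respect to these parameters then gives:
$$A^n = \left(\sum_{t=2}^{T} E\left[h_t h_{t-1}^T\right]\right)\left(\sum_{t=2}^{T} E\left[h_{t-1} h_{t-1}^T\right]\right)^{-1} \tag{7.8}$$
$$\Sigma_H^n = \frac{1}{T-1} \sum_{t=2}^{T} \left(E\left[h_t h_t^T\right] - A^n E\left[h_{t-1} h_t^T\right] - E\left[h_t h_{t-1}^T\right] (A^n)^T + A^n E\left[h_{t-1} h_{t-1}^T\right] (A^n)^T\right) \tag{7.9}$$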
In order to determine values for $B$ and $\Sigma_V$ we substitute for $p(v_t \mid h_t, B, \Sigma_V)$
in (7.3) to give:
$$Q = -\frac{T}{2} \ln|\Sigma_V| - E_{H|\theta^{old}}\left[\frac{1}{2} \sum_{t=1}^{T} (v_t - B h_t)^T \Sigma_V^{-1} (v_t - B h_t)\right] + c \tag{7.10}$$
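Maximizing this with respect to $B$ and $\Sigma_V$ gives:
$$B^n = \left(\sum_{t=1}^{T} v_t E\left[h_t^T\right]\right)\left(\sum_{t=1}^{T} E\left[h_t h_t^T\right]\right)^{-1} \tag{7.11}$$
$$\Sigma_V^n = \frac{1}{T} \sum_{t=1}^{T} \left(v_t v_t^T - B^n E[h_t] v_t^T - v_t E\left[h_t^T\right] (B^n)^T + B^n E\left[h_t h_t^T\right] (B^n)^T\right) \tag{7.12}$$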
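Using the values calculated from the smoothing procedure in section 6, and
recalling that $F^s_t$ is a covariance rather than a second moment:
$$E[h_t] = h^s_t$$
$$E\left[h_t h_t^T\right] = F^s_t + h^s_t (h^s_t)^T$$
$$E\left[h_t h_{t-1}^T\right] = X^s_t + h^s_t (h^s_{t-1})^T$$
where $X^s_t = F^s_t (A^s_{t-1})^T$ is the smoothed cross-variance between $h_t$ and $h_{t-1}$.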
We can now set out the procedure for parameter learning using Expectation
Maximization:
1. Choose starting values for the parameters $\theta = \{\mu_0, \sigma_0^2, A, \Sigma_H, B, \Sigma_V\}$.
2. Using the parameter set $\theta$, calculate the filtered statistics $\hat{h}_{t|t}$ and $F_{t|t}$
for each time step as described in section 5.
3. Using the parameter set $\theta$, calculate the smoothed statistics $h^s_t$, $F^s_t$
and $X^s_t$ (the smoothed cross-variance between $h_t$ and $h_{t-1}$) for each time step as described in section 6.
4. Update $A$:
$$A^n = \left(\sum_{t=2}^{T} h^s_t (h^s_{t-1})^T + \sum_{t=2}^{T} X^s_t\right)\left(\sum_{t=2}^{T} h^s_{t-1} (h^s_{t-1})^T + \sum_{t=2}^{T} F^s_{t-1}\right)^{-1}$$
5. Update $\Sigma_H$:
$$\Sigma_H^n = [T-1]^{-1}\left(\sum_{t=2}^{T} h^s_t (h^s_t)^T + \sum_{t=2}^{T} F^s_t - A^n \left(\sum_{t=2}^{T} h^s_t (h^s_{t-1})^T + \sum_{t=2}^{T} X^s_t\right)^T\right)$$
6. Update $B$:
$$B^n = \left(\sum_{t=1}^{T} v_t (h^s_t)^T\right)\left(\sum_{t=1}^{T} h^s_t (h^s_t)^T + \sum_{t=1}^{T} F^s_t\right)^{-1}$$
7. Update $\Sigma_V$:
$$\Sigma_V^n = \left(\sum_{t=1}^{T} v_t v_t^T - B^n \left(\sum_{t=1}^{T} v_t (h^s_t)^T\right)^T\right) T^{-1}$$
8. Update $\mu_0$:
$$\mu_0^n = h^s_1$$
9. Update $\sigma_0^2$:
$$(\sigma_0^2)^n = F^s_1 + h^s_1 (h^s_1)^T - \mu_0^n (\mu_0^n)^T$$
10. Iterate steps 2 to 9 a given number of times or until the difference
between parameter values from succeeding iterations is below a
predefined threshold.
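The whole loop can be sketched for the scalar case as below. Two assumptions are made and should be read as such. First, $\mathcal{N}(\mu_0, \sigma_0^2)$ is treated as the prior on $h_1$, as in section 2, so the first filtering step uses it directly in place of a prediction from $\hat{h}_{0|0}$. Second, the second moments are formed as covariance plus outer product of means, with the cross term taken as $F^s_t A^s_{t-1}$. The function names, the synthetic data and the starting values are all hypothetical.

```python
import math
import random

def e_step(vs, mu0, var0, a, var_h, b, var_v):
    """Filter and smooth; returns smoothed statistics and the data log likelihood."""
    n = len(vs)
    m, f = [0.0] * n, [0.0] * n
    loglik = 0.0
    for t in range(n):
        # A priori moments: the prior at the first step, otherwise a prediction.
        m_pred = mu0 if t == 0 else a * m[t - 1]
        f_pred = var0 if t == 0 else a * f[t - 1] * a + var_h
        s = b * f_pred * b + var_v
        eps = vs[t] - b * m_pred
        k = f_pred * b / s
        m[t] = m_pred + k * eps
        f[t] = (1.0 - k * b) * f_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + eps * eps / s)
    hs, fs, xs = m[:], f[:], [0.0] * n
    for t in range(n - 2, -1, -1):
        a_s = a * f[t] / (a * f[t] * a + var_h)
        hs[t] = a_s * hs[t + 1] + m[t] - a_s * a * m[t]
        fs[t] = a_s * fs[t + 1] * a_s + f[t] - a_s * a * f[t]
        xs[t + 1] = fs[t + 1] * a_s   # covariance between consecutive states
    return hs, fs, xs, loglik

def m_step(vs, hs, fs, xs):
    """Parameter updates; returns (mu0, var0, a, var_h, b, var_v)."""
    n = len(vs)
    cross = sum(hs[t] * hs[t - 1] + xs[t] for t in range(1, n))   # sum of E[h_t h_{t-1}]
    prev = sum(hs[t - 1] ** 2 + fs[t - 1] for t in range(1, n))   # sum of E[h_{t-1}^2]
    cur = sum(hs[t] ** 2 + fs[t] for t in range(1, n))            # sum of E[h_t^2], from the second step
    a = cross / prev
    var_h = (cur - a * cross) / (n - 1)
    vh = sum(v * h for v, h in zip(vs, hs))                       # sum of v_t E[h_t]
    hh = sum(h * h + fv for h, fv in zip(hs, fs))                 # sum of E[h_t^2], all steps
    b = vh / hh
    var_v = (sum(v * v for v in vs) - b * vh) / n
    return hs[0], fs[0], a, var_h, b, var_v

def em(vs, params, iterations):
    """Alternate E and M steps; records the log likelihood before each update."""
    history = []
    for _ in range(iterations):
        hs, fs, xs, loglik = e_step(vs, *params)
        history.append(loglik)
        params = m_step(vs, hs, fs, xs)
    return params, history

# Synthetic observations from a scalar LDS (illustrative values only).
rng = random.Random(1)
h, data = 0.0, []
for _ in range(80):
    h = 0.8 * h + rng.gauss(0.0, 0.5)
    data.append(h + rng.gauss(0.0, 0.3))

start = (0.0, 1.0, 0.5, 1.0, 1.0, 1.0)   # mu0, var0, a, var_h, b, var_v
params, history = em(data, start, iterations=10)
```

A useful check on any implementation of this procedure is that the log likelihood recorded in `history` never decreases from one iteration to the next, which is the defining property of Expectation Maximization.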
References
[1] C. M. Bishop, Pattern Recognition and Machine Learning (Information
Science and Statistics). Springer, 2006.
[2] Z. Ghahramani and G. Hinton, Parameter estimation for linear dynamical
systems, Tech. Rep., 1996.