$$
\mathcal{F}_k = \{\, y^k,\ \gamma^k,\ \nu^{k-1} \,\}, \quad \text{TCP comm. protocol}
$$
$$
\mathcal{G}_k = \{\, y^k,\ \gamma^k \,\}, \quad \text{UDP comm. protocol}
\tag{3}
$$
where $y^k = (y_k, y_{k-1}, \ldots, y_1)$, $\gamma^k = (\gamma_k, \gamma_{k-1}, \ldots, \gamma_1)$, and $\nu^k = (\nu_k, \nu_{k-1}, \ldots, \nu_1)$.
Consider also the following cost function:
$$
J_N(u^{N-1}) = \mathbb{E}\Big[\, x_N' W_N x_N + \sum_{k=0}^{N-1} \big( x_k' W_k x_k + \nu_k\, u_k' U_k u_k \big) \,\Big|\ \mathcal{I}_N \Big].
\tag{4}
$$
Note that we are weighting the input only if it
is successfully received at the plant. In fact, if it
is not received, the plant applies zero input and
therefore there is no energy expenditure.
We now look for a control input sequence $u^{N-1}$ as a function of the admissible information set $\mathcal{I}_k$, i.e.
$$
u_k = g_k(\mathcal{I}_k) \tag{5}
$$
that minimizes the functional defined in Equation (4), i.e.
$$
J_N^* = \min_{u^{N-1}} J_N(u^{N-1}) = J_N(u^{*\,N-1}),
\tag{6}
$$
where $\mathcal{I}_k \in \{\mathcal{F}_k, \mathcal{G}_k\}$ is one of the sets defined in Equation (3). The set $\mathcal{F}$ corresponds to the information provided under the TCP communication protocol, in which successful or unsuccessful packet delivery at the receiver is acknowledged to the sender. The set $\mathcal{G}$ corresponds to the information provided under the UDP communication protocol, in which the sender receives no information about the delivery of the transmitted packet to the receiver. The UDP scheme is simpler to implement than TCP from a communication standpoint; the price to pay, however, is a less rich information set. The goal of this paper is to design optimal LQG controllers for each of these protocols for a general discrete-time linear stochastic system.
3. MATHEMATICAL BACKGROUND
Before proceeding, let us define the following variables:
$$
\hat{x}_{k|k} \doteq \mathbb{E}[x_k \mid \mathcal{I}_k], \qquad
e_{k|k} \doteq x_k - \hat{x}_{k|k}, \qquad
P_{k|k} \doteq \mathbb{E}\big[e_{k|k} e_{k|k}' \mid \mathcal{I}_k\big].
\tag{7}
$$
Derivations below will make use of the following facts:

Lemma 1. The following facts are true:
(a) $\mathbb{E}\big[(x_k - \hat{x}_k)\, \hat{x}_k' \mid \mathcal{I}_k\big] = \mathbb{E}\big[e_{k|k}\, \hat{x}_k' \mid \mathcal{I}_k\big] = 0$;
(b) $\mathbb{E}[x_k' S x_k \mid \mathcal{I}_k] = \hat{x}_k' S \hat{x}_k + \operatorname{trace}(S P_{k|k}) = \hat{x}_k' S \hat{x}_k + \mathbb{E}[e_k' S e_k \mid \mathcal{I}_k]$, $\forall S$;
(c) $\mathbb{E}\big[\, \mathbb{E}[g(x_{k+1}) \mid \mathcal{I}_{k+1}] \mid \mathcal{I}_k\big] = \mathbb{E}[g(x_{k+1}) \mid \mathcal{I}_k]$, $\forall g(\cdot)$.
Use of the following properties will prove useful when deriving the equations for the optimal LQG controller. Let us compute the following expectation:
$$
\begin{aligned}
\mathbb{E}[x_{k+1}' S x_{k+1} \mid \mathcal{I}_k]
&= \mathbb{E}\big[(A x_k + \nu_k B u_k + w_k)' S (A x_k + \nu_k B u_k + w_k) \mid \mathcal{I}_k\big] \\
&= \mathbb{E}[x_k' A' S A x_k \mid \mathcal{I}_k] + \bar{\nu}\, u_k' B' S B u_k + 2 \bar{\nu}\, u_k' B' S A\, \hat{x}_{k|k} + \operatorname{trace}(S Q),
\end{aligned}
\tag{8}
$$
where both the independence of $\nu_k$, $w_k$, $x_k$ and the zero-mean property of $w_k$ are exploited. The previous expectation holds for both information sets $\mathcal{I}_k \in \{\mathcal{F}_k, \mathcal{G}_k\}$. Also,
$$
\mathbb{E}\big[e_{k|k}' T e_{k|k} \mid \mathcal{I}_k\big] = \operatorname{trace}\big(T\, \mathbb{E}[e_{k|k} e_{k|k}' \mid \mathcal{I}_k]\big) = \operatorname{trace}(T P_{k|k}).
\tag{9}
$$
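Lemma 1(b) holds for any distribution with conditional mean $\hat{x}$ and covariance $P_{k|k}$, and it can be sanity-checked numerically. The sketch below (the mean and matrices are arbitrary illustrative values, not taken from the paper) compares a Monte Carlo estimate of $\mathbb{E}[x' S x]$ with $\hat{x}' S \hat{x} + \operatorname{trace}(S P)$, using Gaussian samples for convenience.

```python
import numpy as np

# Sanity check of Lemma 1(b): E[x' S x] = xhat' S xhat + trace(S P)
# for x with mean xhat and covariance P. All values are illustrative.
rng = np.random.default_rng(0)

xhat = np.array([1.0, -2.0])
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # error covariance (positive definite)
S = np.array([[3.0, 1.0],
              [1.0, 2.0]])           # arbitrary symmetric weight matrix

samples = rng.multivariate_normal(xhat, P, size=200_000)
mc = np.mean(np.einsum('ni,ij,nj->n', samples, S, samples))
exact = xhat @ S @ xhat + np.trace(S @ P)

print(mc, exact)    # the two values should agree to within Monte Carlo error
```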
4. TCP
First, equations for the optimal estimator are derived. They will be needed to solve the LQG controller design problem, as will be shown later.
4.1 Estimator Design, $\sigma \to +\infty$
Equations for the optimal estimator are derived using arguments similar to those used for the standard Kalman filtering equations. The innovation step is given by:
$$
\hat{x}_{k+1|k} = \mathbb{E}[x_{k+1} \mid \nu_k, \mathcal{F}_k] = A \hat{x}_{k|k} + \nu_k B u_k \tag{10}
$$
$$
e_{k+1|k} = x_{k+1} - \hat{x}_{k+1|k} = A e_{k|k} + w_k \tag{11}
$$
$$
P_{k+1|k} = \mathbb{E}\big[e_{k+1|k} e_{k+1|k}' \mid \nu_k, \mathcal{F}_k\big] = A P_{k|k} A' + Q, \tag{12}
$$
where the independence of $w_k$ and $\mathcal{F}_k$ is exploited. Since $y_{k+1}$, $\gamma_{k+1}$, $w_k$ and $\mathcal{F}_k$ are all independent of each other, and following the same approach described in (Sinopoli et al., 2004), the correction step is given by:
$$
\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \gamma_{k+1} K_{k+1} \big(y_{k+1} - C \hat{x}_{k+1|k}\big) \tag{13}
$$
$$
e_{k+1|k+1} = x_{k+1} - \hat{x}_{k+1|k+1} = (I - \gamma_{k+1} K_{k+1} C)\, e_{k+1|k} - \gamma_{k+1} K_{k+1} v_{k+1} \tag{14}
$$
$$
P_{k+1|k+1} = P_{k+1|k} - \gamma_{k+1} K_{k+1} C P_{k+1|k} \tag{15}
$$
$$
K_{k+1} = P_{k+1|k} C' \big(C P_{k+1|k} C' + R\big)^{-1}, \tag{16}
$$
after taking the limit $\sigma \to +\infty$. The initial conditions for the estimator iterative equations are $\hat{x}_{0|-1} = 0$ and $P_{0|-1} = P_0$.
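The estimator (10)–(16) is an ordinary time-varying Kalman filter in which the prediction uses the acknowledged $\nu_k$ and the correction is applied only when an observation packet arrives. A minimal sketch follows; the system matrices, arrival probabilities, and horizon are hypothetical illustrative choices, not values from the paper.

```python
import numpy as np

def tcp_estimator_step(xhat, P, u, nu_k, gamma_next, y_next, A, B, C, Q, R):
    """One innovation + correction step of Equations (10)-(16).

    nu_k: control packet arrival at time k (known under TCP via the ACK).
    gamma_next: observation packet arrival at time k+1.
    """
    # Innovation step, Eqs. (10)-(12)
    xpred = A @ xhat + nu_k * (B @ u)
    Ppred = A @ P @ A.T + Q
    # Correction step, Eqs. (13), (15), (16): the update is applied
    # only when the observation packet actually arrives.
    K = Ppred @ C.T @ np.linalg.inv(C @ Ppred @ C.T + R)
    xnew = xpred + gamma_next * (K @ (y_next - C @ xpred))
    Pnew = Ppred - gamma_next * (K @ C @ Ppred)
    return xnew, Pnew

# Illustrative closed-loop run (open-loop input, for brevity).
rng = np.random.default_rng(1)
A = np.array([[1.2, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
Q = 0.01 * np.eye(2); R = 0.1 * np.eye(2); P0 = np.eye(2)

x = np.array([1.0, -1.0]); xhat = np.zeros(2); P = P0.copy()
for k in range(50):
    u = np.zeros(1)
    nu_k = rng.random() < 0.8        # control arrival, probability nu_bar
    gamma = rng.random() < 0.8       # observation arrival, probability gamma_bar
    w = rng.multivariate_normal(np.zeros(2), Q)
    v = rng.multivariate_normal(np.zeros(2), R)
    x = A @ x + nu_k * (B @ u) + w
    y = C @ x + v
    xhat, P = tcp_estimator_step(xhat, P, u, nu_k, gamma, y, A, B, C, Q, R)

print(np.linalg.norm(x - xhat), np.trace(P))
```

Note that $P_{k|k}$ is a random quantity here: it depends on the realized arrival sequence $\{\gamma_k\}$, which is the point developed below for the cost bounds.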
4.2 Controller design
Derivation of the optimal feedback control law and the corresponding value for the objective function will follow the dynamic programming approach based on the cost-to-go iterative procedure. Define the optimal value function $V_k(x_k)$ as follows:
$$
V_N(x_N) \doteq \mathbb{E}[x_N' W_N x_N \mid \mathcal{F}_N]
$$
$$
V_k(x_k) \doteq \min_{u_k} \mathbb{E}\big[x_k' W_k x_k + \nu_k\, u_k' U_k u_k + V_{k+1}(x_{k+1}) \mid \mathcal{F}_k\big].
$$
Using dynamic programming theory (Kumar and Varaiya, 1986; Bertsekas and Tsitsiklis, 1996), one can show that $J_N^* = V_0(x_0)$. We claim that the value function $V_k(x_k)$ can be written as:
$$
V_k(x_k) = \mathbb{E}[x_k' S_k x_k \mid \mathcal{F}_k] + c_k, \qquad k = 0, \ldots, N, \tag{17}
$$
where the matrix $S_k$ and the scalar $c_k$ are to be determined and are independent of the information set $\mathcal{F}$. The proof follows an induction argument. The claim is certainly true for $k = N$ with the choice of parameters $S_N = W_N$ and $c_N = 0$. Suppose now that the claim is true for $k+1$, i.e. $V_{k+1}(x_{k+1}) = \mathbb{E}[x_{k+1}' S_{k+1} x_{k+1} \mid \mathcal{F}_{k+1}] + c_{k+1}$.
The value function at time step $k$ is the following:
$$
\begin{aligned}
V_k(x_k) &= \min_{u_k} \mathbb{E}\big[x_k' W_k x_k + \nu_k u_k' U_k u_k + V_{k+1}(x_{k+1}) \mid \mathcal{I}_k\big] \\
&= \min_{u_k} \mathbb{E}\big[x_k' W_k x_k + \nu_k u_k' U_k u_k + \mathbb{E}[x_{k+1}' S_{k+1} x_{k+1} + c_{k+1} \mid \mathcal{F}_{k+1}] \mid \mathcal{I}_k\big] \\
&= \min_{u_k} \mathbb{E}\big[x_k' W_k x_k + \nu_k u_k' U_k u_k + x_{k+1}' S_{k+1} x_{k+1} + c_{k+1} \mid \mathcal{I}_k\big] \\
&= \mathbb{E}\big[x_k' W_k x_k + x_k' A' S_{k+1} A x_k \mid \mathcal{I}_k\big] + \operatorname{trace}(S_{k+1} Q) + \mathbb{E}[c_{k+1} \mid \mathcal{I}_k] \\
&\quad + \bar{\nu} \min_{u_k} \big( u_k' (U_k + B' S_{k+1} B) u_k + 2\, u_k' B' S_{k+1} A\, \hat{x}_{k|k} \big),
\end{aligned}
\tag{18}
$$
where we used Lemma 1(c) in the third line, and Equation (8) in the last two lines. The value function is a quadratic function of the input, therefore the minimizer can simply be obtained by solving $\partial V_k / \partial u_k = 0$, which gives:
$$
u_k^* = -\big(B' S_{k+1} B + U_k\big)^{-1} B' S_{k+1} A\, \hat{x}_{k|k} = L_k\, \hat{x}_{k|k}.
\tag{19}
$$
The optimal feedback is thus a simple linear function of the estimated state. If we substitute the minimizer back into Equation (18) and use Equation (17), we get:
$$
\begin{aligned}
V_k(x_k) &= \mathbb{E}\big[x_k' W_k x_k + x_k' A' S_{k+1} A x_k \mid \mathcal{I}_k\big] + \operatorname{trace}(S_{k+1} Q) + \mathbb{E}[c_{k+1} \mid \mathcal{I}_k] \\
&\quad - \bar{\nu}\, \hat{x}_{k|k}' A' S_{k+1} B \big(U_k + B' S_{k+1} B\big)^{-1} B' S_{k+1} A\, \hat{x}_{k|k}
\end{aligned}
\tag{20}
$$
$$
\begin{aligned}
\mathbb{E}[x_k' S_k x_k \mid \mathcal{I}_k] + c_k
&= \mathbb{E}\Big[x_k' \big(W_k + A' S_{k+1} A - \bar{\nu}\, A' S_{k+1} B (U_k + B' S_{k+1} B)^{-1} B' S_{k+1} A\big) x_k \,\Big|\, \mathcal{I}_k\Big] \\
&\quad + \operatorname{trace}(S_{k+1} Q) + \mathbb{E}[c_{k+1} \mid \mathcal{I}_k] \\
&\quad + \bar{\nu}\, \operatorname{trace}\big(A' S_{k+1} B (U_k + B' S_{k+1} B)^{-1} B' S_{k+1} A\, P_{k|k}\big),
\end{aligned}
\tag{21}
$$
where we used Lemma 1(b). For the previous equation to hold for all $x_k$, we need to have:
$$
S_k = A' S_{k+1} A + W_k - \bar{\nu}\, A' S_{k+1} B \big(B' S_{k+1} B + U_k\big)^{-1} B' S_{k+1} A \tag{22}
$$
$$
\begin{aligned}
c_k &= \bar{\nu}\, \operatorname{trace}\big(A' S_{k+1} B (U_k + B' S_{k+1} B)^{-1} B' S_{k+1} A\, P_{k|k}\big) + \operatorname{trace}(S_{k+1} Q) + \mathbb{E}[c_{k+1} \mid \mathcal{I}_k] \\
&= \operatorname{trace}\big((A' S_{k+1} A + W_k - S_k)\, P_{k|k}\big) + \operatorname{trace}(S_{k+1} Q) + \mathbb{E}[c_{k+1} \mid \mathcal{I}_k].
\end{aligned}
\tag{23}
$$
Therefore, the cost function for the optimal LQG controller under TCP is given by:
$$
\begin{aligned}
J_N^* = V_0(x_0) &= \mathbb{E}[x_0' S_0 x_0] + \sum_{k=0}^{N-1} \Big( \operatorname{trace}\big((A' S_{k+1} A + W_k - S_k)\, \mathbb{E}_\gamma[P_{k|k}]\big) + \operatorname{trace}(S_{k+1} Q) \Big) \\
&= \bar{x}_0' S_0 \bar{x}_0 + \operatorname{trace}(S_0 P_0) + \sum_{k=0}^{N-1} \Big( \operatorname{trace}\big((A' S_{k+1} A + W_k - S_k)\, \mathbb{E}_\gamma[P_{k|k}]\big) + \operatorname{trace}(S_{k+1} Q) \Big).
\end{aligned}
\tag{24}
$$
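The backward recursion (22) and the gain (19) can be implemented in a few lines; a sketch follows, assuming constant per-stage weights and illustrative values for $A$, $B$, and $\bar{\nu}$ (all hypothetical, not from the paper).

```python
import numpy as np

def tcp_lqg_gains(A, B, W, U, WN, nu_bar, N):
    """Backward recursion of Eq. (22) and the gains L_k of Eq. (19).

    Assumes constant weights W_k = W and U_k = U; returns S_0..S_N
    and L_0..L_{N-1}.
    """
    S = [None] * (N + 1)
    L = [None] * N
    S[N] = WN
    for k in range(N - 1, -1, -1):
        Snext = S[k + 1]
        M = B.T @ Snext @ B + U
        gain = np.linalg.solve(M, B.T @ Snext @ A)   # (B'SB+U)^{-1} B'SA
        L[k] = -gain
        # Eq. (22): the arrival probability nu_bar scales the gain term only.
        S[k] = A.T @ Snext @ A + W - nu_bar * (A.T @ Snext @ B @ gain)
    return S, L

A = np.array([[1.1, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
W = np.eye(2); U = np.array([[1.0]]); WN = np.eye(2)
S, L = tcp_lqg_gains(A, B, W, U, WN, nu_bar=0.7, N=30)
print(S[0], L[0])
```

Lowering $\bar{\nu}$ only increases the cost matrix, consistent with the intuition that losing more control packets cannot help.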
The matrices $\{P_{k|k}\}_{k=0}^{N}$ are stochastic, since they are functions of the arrival sequence $\{\gamma_k\}$. The exact expected value of these matrices cannot be computed analytically, since they are nonlinear functions of the arrival sequence $\gamma_k$, as shown in (Sinopoli et al., 2004). However, they can be bounded by computable deterministic quantities. In fact, let us consider the following equations:
$$
\widehat{P}_{k+1|k} = A \widehat{P}_{k|k-1} A' + Q - \bar{\gamma}\, A \widehat{P}_{k|k-1} C' \big(C \widehat{P}_{k|k-1} C' + R\big)^{-1} C \widehat{P}_{k|k-1} A' \tag{25}
$$
$$
\widehat{P}_{k|k} = \widehat{P}_{k|k-1} - \bar{\gamma}\, \widehat{P}_{k|k-1} C' \big(C \widehat{P}_{k|k-1} C' + R\big)^{-1} C \widehat{P}_{k|k-1} \tag{26}
$$
$$
\widetilde{P}_{k+1|k} = (1 - \bar{\gamma})\, A \widetilde{P}_{k|k-1} A' + Q \tag{27}
$$
$$
\widetilde{P}_{k|k} = (1 - \bar{\gamma})\, \widetilde{P}_{k|k-1} \tag{28}
$$
initialized to $\widehat{P}_{0|-1} = \widetilde{P}_{0|-1} = P_0$. Using arguments similar to those in (Sinopoli et al., 2004), it is possible to show that the matrices $P_{k|k}$ are concave and monotonic functions of $P_{k|k-1}$. Therefore, the following bounds hold:
$$
\widetilde{P}_{k|k} \le \mathbb{E}_\gamma[P_{k|k}] \le \widehat{P}_{k|k}, \tag{29}
$$
and we have:
$$
J_N^{\min} \le J_N^* \le J_N^{\max} \tag{30}
$$
$$
J_N^{\max} = \bar{x}_0' S_0 \bar{x}_0 + \operatorname{trace}(S_0 P_0) + \sum_{k=0}^{N-1} \Big( \operatorname{trace}\big((A' S_{k+1} A + W_k - S_k)\, \widehat{P}_{k|k}\big) + \operatorname{trace}(S_{k+1} Q) \Big) \tag{31}
$$
$$
J_N^{\min} = \bar{x}_0' S_0 \bar{x}_0 + \operatorname{trace}(S_0 P_0) + \sum_{k=0}^{N-1} \Big( \operatorname{trace}\big((A' S_{k+1} A + W_k - S_k)\, \widetilde{P}_{k|k}\big) + \operatorname{trace}(S_{k+1} Q) \Big). \tag{32}
$$
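The bounds (29) can be illustrated numerically by iterating (25)–(28) alongside a Monte Carlo estimate of $\mathbb{E}_\gamma[P_{k|k}]$ over random arrival sequences; the sketch below uses a scalar system for simplicity, and all parameter values are hypothetical.

```python
import numpy as np

def bound_recursions(a, c, q, r, gbar, p0, N):
    """Scalar versions of Eqs. (25)-(28): deterministic upper (MARE-type)
    and lower bound recursions for the filtered covariance."""
    p_hi = p_lo = p0                # \hat P_{0|-1} = \tilde P_{0|-1} = P_0
    hi_filt, lo_filt = [], []
    for _ in range(N):
        # Eq. (26)/(28): filtered-covariance bounds at time k
        hi_filt.append(p_hi - gbar * p_hi * c * c * p_hi / (c * p_hi * c + r))
        lo_filt.append((1 - gbar) * p_lo)
        # Eq. (25)/(27): propagate the prediction-covariance bounds
        p_hi = a * p_hi * a + q - gbar * (a * p_hi * c) ** 2 / (c * p_hi * c + r)
        p_lo = (1 - gbar) * a * p_lo * a + q
    return np.array(hi_filt), np.array(lo_filt)

def mc_expected_pkk(a, c, q, r, gbar, p0, N, runs=2000, seed=2):
    """Monte Carlo estimate of E_gamma[P_{k|k}] over random arrivals."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(N)
    for _ in range(runs):
        p = p0
        for k in range(N):
            g = rng.random() < gbar
            pf = p - g * p * c * c * p / (c * p * c + r)   # Eqs. (15)-(16)
            acc[k] += pf
            p = a * pf * a + q                             # Eq. (12)
    return acc / runs

hi, lo = bound_recursions(1.3, 1.0, 0.5, 0.5, 0.7, 1.0, 20)
mean = mc_expected_pkk(1.3, 1.0, 0.5, 0.5, 0.7, 1.0, 20)
print(lo[-1], mean[-1], hi[-1])    # lower bound < E[P_{k|k}] < upper bound
```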
4.3 Finite and Infinite Horizon LQG control
The previous equations were derived for the finite horizon LQG. The infinite horizon LQG can be obtained by taking the limit $N \to +\infty$ of the previous equations. However, the matrices $\{P_{k|k}\}$ depend on the specific realization of the observation sequence $\{\gamma_k\}$; therefore the minimal cost $J_N^*$ is a stochastic function and does not have a limit. Differently from standard LQG controller design, where the controller always stabilizes the original system, in the case of control packet loss stability can be lost if the arrival probability $\bar{\nu}$ is below a certain threshold. In particular, the equation for the cost matrix $S_k$ is the solution of a Modified Algebraic Riccati Equation (MARE), which was already introduced and studied in (Sinopoli et al., 2004). In particular, Equation (22) is the dual of the estimator equation presented in (Sinopoli et al., 2004). Therefore, the same conclusions can be drawn, and the previous results can be summarized in the following theorems:
Theorem 2. (Finite Horizon LQG under TCP). Consider the system (1)–(2) and consider the problem of minimizing the cost function (4) with policy $u_k = f(\mathcal{F}_k)$, where $\mathcal{F}_k$ is the information available under TCP communication, given in Equation (3). Then, the optimal control is a linear function of the estimated system state given by Equation (19), where the matrix $S_k$ can be computed iteratively using Equation (22). The separation principle still holds under TCP communication, since the optimal estimator is independent of the control input $u_k$. The optimal state estimator is given by Equations (10)–(16), and the minimal achievable cost is given by Equation (24).
Theorem 3. (Infinite Horizon LQG under TCP). Consider the same system as defined in the previous theorem, with the following additional hypotheses: $W_N = W_k = W$ and $U_k = U$. Moreover, let $(A, B)$ and $(A, Q^{1/2})$ be stabilizable, and let $(A, C)$ and $(A, W^{1/2})$ be detectable. Let us consider the limiting case $N \to +\infty$. There exist critical arrival probabilities $\nu_{\min}$ and $\gamma_{\min}$ which satisfy the following property:
$$
1 - \frac{1}{\lambda_{\max}(A)^2} \le \nu_{\min} \le 1, \tag{33}
$$
$$
1 - \frac{1}{\lambda_{\max}(A)^2} \le \gamma_{\min} \le 1, \tag{34}
$$
where $\lambda_{\max}(A)$ is the eigenvalue of the matrix $A$ with largest absolute value, such that for all $\bar{\nu} > \nu_{\min}$ and $\bar{\gamma} > \gamma_{\min}$ we have:
$$
L_k = L_\infty = -\big(B' S_\infty B + U\big)^{-1} B' S_\infty A \tag{35}
$$
$$
\frac{1}{N} J_N^{\min} \le \frac{1}{N} J_N^* \le \frac{1}{N} J_N^{\max}, \tag{36}
$$
where the mean cost bounds $J_N^{\min}$, $J_N^{\max}$ are given by:
$$
J_\infty^{\max} = \lim_{N \to +\infty} \frac{1}{N} J_N^{\max} = \operatorname{trace}\Big( \big(A' S_\infty A + W - S_\infty\big) \big( \widehat{P}_\infty - \bar{\gamma}\, \widehat{P}_\infty C' (C \widehat{P}_\infty C' + R)^{-1} C \widehat{P}_\infty \big) \Big) + \operatorname{trace}(S_\infty Q)
$$
$$
J_\infty^{\min} = \lim_{N \to +\infty} \frac{1}{N} J_N^{\min} = (1 - \bar{\gamma})\, \operatorname{trace}\big( (A' S_\infty A + W - S_\infty)\, \widetilde{P}_\infty \big) + \operatorname{trace}(S_\infty Q)
$$
and the matrices $S_\infty$, $\widehat{P}_\infty$, $\widetilde{P}_\infty$ are the solutions of:
$$
S_\infty = A' S_\infty A + W - \bar{\nu}\, A' S_\infty B \big(B' S_\infty B + U\big)^{-1} B' S_\infty A
$$
$$
\widehat{P}_\infty = A \widehat{P}_\infty A' + Q - \bar{\gamma}\, A \widehat{P}_\infty C' \big(C \widehat{P}_\infty C' + R\big)^{-1} C \widehat{P}_\infty A'
$$
$$
\widetilde{P}_\infty = (1 - \bar{\gamma})\, A \widetilde{P}_\infty A' + Q.
$$
Moreover, the assumptions above are necessary and sufficient conditions for boundedness of the cost function under LQG feedback. The critical probabilities $\nu_{\min}$ and $\gamma_{\min}$ can be computed via the solution of the following LMI optimization problems:
$$
\gamma_{\min} = \operatorname{argmin}_{\bar{\gamma}} \ \Psi_{\bar{\gamma}}(Y, Z) > 0, \quad 0 \le Y \le I,
$$
$$
\Psi_{\bar{\gamma}}(Y, Z) =
\begin{bmatrix}
Y & \sqrt{\bar{\gamma}}\,(Y A + Z C) & \sqrt{1-\bar{\gamma}}\, Y A \\
\sqrt{\bar{\gamma}}\,(A' Y + C' Z') & Y & 0 \\
\sqrt{1-\bar{\gamma}}\, A' Y & 0 & Y
\end{bmatrix}
$$
$$
\nu_{\min} = \operatorname{argmin}_{\bar{\nu}} \ \Psi_{\bar{\nu}}(Y, Z) > 0, \quad 0 \le Y \le I,
$$
$$
\Psi_{\bar{\nu}}(Y, Z) =
\begin{bmatrix}
Y & \sqrt{\bar{\nu}}\,(Y A' + Z B') & \sqrt{1-\bar{\nu}}\, Y A' \\
\sqrt{\bar{\nu}}\,(A Y + B Z') & Y & 0 \\
\sqrt{1-\bar{\nu}}\, A Y & 0 & Y
\end{bmatrix}.
$$
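For a scalar system the critical probability can also be located by brute force: iterate the MARE and bisect on the arrival probability. For a scalar system with invertible $C$ the threshold is exactly $1 - 1/\lambda_{\max}(A)^2$. The sketch below uses hypothetical values $a = 2$ and $q = r = 1$.

```python
def mare_bounded(a, q, r, gamma, iters=10000, cap=1e8):
    """Iterate the scalar MARE  p <- a^2 p + q - gamma a^2 p^2 / (p + r)
    and report whether the iterates stay bounded."""
    p = q
    for _ in range(iters):
        p = a * a * p + q - gamma * (a * p) ** 2 / (p + r)
        if p > cap:
            return False
    return True

def critical_gamma(a, q=1.0, r=1.0, steps=30):
    """Bisect on the arrival probability for the boundedness threshold."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mare_bounded(a, q, r, mid):
            hi = mid
        else:
            lo = mid
    return hi

gc = critical_gamma(2.0)
print(gc)   # theory predicts 1 - 1/a^2 = 0.75 for a = 2
```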
5. UDP
In this section, equations for the optimal estimator and controller design for the case of the UDP communication protocol are derived. The UDP case corresponds to the information set $\mathcal{G}_k$, as defined in Equation (3). Some of the derivations are analogous to those of the previous section and are therefore skipped.
5.1 Estimator Design, $\sigma \to +\infty$
We derive the equations for the optimal estimator using arguments similar to those for the standard Kalman filtering equations. The innovation step is given by:
$$
\hat{x}_{k+1|k} = \mathbb{E}[x_{k+1} \mid \mathcal{G}_k] = \mathbb{E}[A x_k + \nu_k B u_k + w_k \mid \mathcal{G}_k] = A\, \mathbb{E}[x_k \mid \mathcal{G}_k] + \mathbb{E}[\nu_k]\, B u_k = A \hat{x}_{k|k} + \bar{\nu} B u_k \tag{37}
$$
$$
e_{k+1|k} = x_{k+1} - \hat{x}_{k+1|k} \tag{38}
$$
$$
P_{k+1|k} = \mathbb{E}\big[e_{k+1|k} e_{k+1|k}' \mid \mathcal{G}_k\big] = A P_{k|k} A' + Q + \bar{\nu}(1 - \bar{\nu})\, B u_k u_k' B', \tag{39}
$$
where we used the independence and zero-mean property of $w_k$ and $(\nu_k - \bar{\nu})$, and their independence of $\mathcal{G}_k$. Note how, under the UDP communication protocol, differently from TCP communication, the error covariance $P_{k+1|k}$ depends explicitly on the control input $u_k$. This is the main difference with respect to TCP.
The correction step is the same as in the TCP case:
$$
\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \gamma_{k+1} K_{k+1} \big(y_{k+1} - C \hat{x}_{k+1|k}\big)
$$
$$
P_{k+1|k+1} = P_{k+1|k} - \gamma_{k+1} K_{k+1} C P_{k+1|k}, \tag{40}
$$
$$
K_{k+1} = P_{k+1|k} C' \big(C P_{k+1|k} C' + R\big)^{-1}, \tag{41}
$$
where again we took the limit $\sigma \to +\infty$. The initial conditions for the estimator iterative equations are:
$$
\hat{x}_{0|-1} = 0 \tag{42}
$$
$$
P_{0|-1} = P_0. \tag{43}
$$
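Equation (39) exhibits the key structural difference from TCP: the prediction covariance inflates with the control effort, because under UDP the controller never learns whether $u_k$ was applied. A toy numerical sketch (all values hypothetical):

```python
import numpy as np

def udp_prediction_cov(P, u, A, B, Q, nu_bar):
    """Eq. (39): UDP prediction covariance, which grows with the control
    effort since the packet-arrival uncertainty nu_k(1-nu_k) multiplies u_k."""
    return A @ P @ A.T + Q + nu_bar * (1 - nu_bar) * (B @ u) @ (B @ u).T

A = np.array([[1.0]]); B = np.array([[1.0]]); Q = np.array([[0.1]])
P = np.array([[0.5]])

for u_val in (0.0, 1.0, 5.0):
    u = np.array([[u_val]])
    print(u_val, udp_prediction_cov(P, u, A, B, Q, nu_bar=0.6))
```

With these illustrative numbers the covariance grows quadratically in the applied input, which is exactly the coupling between control and estimation that breaks the separation principle below.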
5.2 Controller design: General case
In this section, we show that the optimal LQG controller is, in general, not a linear function of the state estimate, since estimator and controller design cannot be separated any longer. To show this, we consider a simple scalar system and proceed using the dynamic programming approach. Let us consider an unstable scalar system with $A = 1$, $B = 1$, $C = 1$, $W_N = W_k = 1$, $U_k = 0$, $R = 1$, $Q = 0$. Similarly to the TCP case, the value function $V_k(x_k)$ for $k = N$ is given by $V_N(x_N) = \mathbb{E}[x_N' W_N x_N \mid \mathcal{G}_N] = \mathbb{E}[x_N^2 \mid \mathcal{G}_N]$. Let us consider the value function for $k = N-1$:
$$
\begin{aligned}
V_{N-1}(x_{N-1}) &= \min_{u_{N-1}} \mathbb{E}\big[x_{N-1}^2 + V_N(x_N) \mid \mathcal{G}_{N-1}\big] \\
&= \min_{u_{N-1}} \mathbb{E}\big[x_{N-1}^2 + x_N^2 \mid \mathcal{G}_{N-1}\big] \\
&= \min_{u_{N-1}} \Big( \mathbb{E}\big[x_{N-1}^2 + x_{N-1}^2 \mid \mathcal{G}_{N-1}\big] + \bar{\nu}\, u_{N-1}^2 + 2 \bar{\nu}\, u_{N-1}\, \hat{x}_{N-1|N-1} \Big) && \text{(44)} \\
&= 2\, \mathbb{E}\big[x_{N-1}^2 \mid \mathcal{G}_{N-1}\big] - \bar{\nu}\, \hat{x}_{N-1|N-1}^2 && \text{(45)} \\
&= (2 - \bar{\nu})\, \mathbb{E}\big[x_{N-1}^2 \mid \mathcal{G}_{N-1}\big] + \bar{\nu}\, P_{N-1|N-1}, && \text{(46)}
\end{aligned}
$$
where we used Equation (8) in line (44), substituted the minimizer $u_{N-1}^* = -\hat{x}_{N-1|N-1}$ in line (45), and used Lemma 1(b) in line (46).
Let us continue by considering the value function for $k = N-2$:
$$
\begin{aligned}
V_{N-2}(x_{N-2}) &= \min_{u_{N-2}} \mathbb{E}\big[x_{N-2}^2 + V_{N-1}(x_{N-1}) \mid \mathcal{G}_{N-2}\big] \\
&= \mathbb{E}\big[(3 - \bar{\nu})\, x_{N-2}^2 \mid \mathcal{G}_{N-2}\big] + \bar{\nu}(1 - \bar{\gamma})\, P_{N-2|N-2} \\
&\quad + \bar{\nu} \min_{u_{N-2}} \bigg( (2 - \bar{\nu})\, u_{N-2}^2 + 2 (2 - \bar{\nu})\, u_{N-2}\, \hat{x}_{N-2|N-2} + \bar{\nu}(1 - \bar{\gamma})(1 - \bar{\nu})\, u_{N-2}^2 \\
&\qquad\qquad\quad + \bar{\gamma}\, \frac{P_{N-2|N-2} + \bar{\nu}(1 - \bar{\nu})\, u_{N-2}^2}{P_{N-2|N-2} + \bar{\nu}(1 - \bar{\nu})\, u_{N-2}^2 + 1} \bigg).
\end{aligned}
\tag{47}
$$
The first three terms inside the brackets are convex quadratic functions of the control input $u_{N-2}$; the last term, however, is not. Therefore, the minimizer $u_{N-2}^*$ is, in general, a nonlinear function of the information set $\mathcal{G}_k$. We can summarize this result in the following theorem:
Theorem 4. Let us consider the stochastic system defined in Equations (1)–(2) with horizon $N \ge 2$. Then, the optimal control feedback $u_k^* = g_k^*(\mathcal{G}_k)$ that minimizes the functional (4) under UDP is, in general, a nonlinear function of the information set $\mathcal{G}_k$.
PROOF. The proof is given by the previous counterexample, where even a simple scalar system does not admit a linear solution. The nonlinearity of the input feedback arises from the fact that the correction error covariance matrix $P_{k+1|k+1}$ is a nonlinear function of the innovation error covariance $P_{k+1|k}$. The only case in which $P_{k+1|k+1}$ is a linear function of $P_{k+1|k}$ occurs when $R = 0$ and $C$ is invertible, i.e. when it is possible to measure the state $x_k$ without noise. This special case is considered in the next section.
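The non-quadratic dependence in Equation (47) can be inspected directly. The sketch below evaluates the bracketed cost of the scalar counterexample on a grid (the chosen values of $\bar{\nu}$, $\bar{\gamma}$, $P_{N-2|N-2}$, and $\hat{x}_{N-2|N-2}$ are illustrative) and checks that its second difference is not constant, i.e. that the cost-to-go is not a quadratic function of $u_{N-2}$.

```python
import numpy as np

def bracket_cost(u, xhat, P, nu, g):
    """The term minimized over u_{N-2} in Eq. (47), for the scalar
    counterexample A = B = C = 1, U = 0, R = 1, Q = 0.
    nu = nu_bar, g = gamma_bar."""
    s = P + nu * (1 - nu) * u ** 2          # P_{N-1|N-2}
    return ((2 - nu) * u ** 2
            + 2 * (2 - nu) * u * xhat
            + nu * (1 - g) * (1 - nu) * u ** 2
            + g * s / (s + 1.0))

u = np.linspace(-3.0, 3.0, 601)
f = bracket_cost(u, xhat=1.0, P=1.0, nu=0.5, g=0.9)
d2 = np.diff(f, 2)                  # second differences on a uniform grid
print(d2.min(), d2.max())           # not constant => cost is not quadratic
```

For a quadratic function the second differences would be exactly constant; the spread observed here is due to the last (fractional) term of Equation (47).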
5.3 Controller design: R=0 and C invertible
Without loss of generality, we can assume $C = I$; otherwise the linear transformation $y = Cx$ would give an equivalent state-space representation of the system for which $C = I$. Let us now consider the case when there is no measurement noise, so that we can measure the state $x_k$ exactly whenever an observation packet is delivered. In this case the estimator equations simplify as follows:
$$
K_{k+1} = I
$$
$$
P_{k+1|k+1} = (1 - \gamma_{k+1})\, P_{k+1|k} = (1 - \gamma_{k+1}) \big( A P_{k|k} A' + Q + \bar{\nu}(1 - \bar{\nu})\, B u_k u_k' B' \big)
$$
$$
\mathbb{E}[P_{k+1|k+1} \mid \mathcal{G}_k] = (1 - \bar{\gamma}) \big( A P_{k|k} A' + Q + \bar{\nu}(1 - \bar{\nu})\, B u_k u_k' B' \big),
$$
where we used the independence of $\gamma_{k+1}$ and $P_{k|k}$ with respect to $\mathcal{G}_k$ in the last line. Similarly to the TCP case, we claim that the value function $V_k(x_k)$ can be written as follows:
$$
V_k(x_k) = \mathbb{E}[x_k' S_k x_k \mid \mathcal{G}_k] + \operatorname{trace}(T_k P_{k|k}) + \operatorname{trace}(D_k Q)
\tag{48}
$$
for $k = 0, \ldots, N$. This is clearly true for $k = N$, since $V_N(x_N) = \mathbb{E}[x_N' W_N x_N \mid \mathcal{G}_N]$. Therefore:
$$
S_N = W_N, \qquad T_N = 0, \qquad D_N = 0. \tag{49}
$$
Let us suppose that the claim (48) is true for $k+1$, and let us show by induction that it holds for $k$:
$$
\begin{aligned}
V_k(x_k) &= \min_{u_k} \mathbb{E}\big[x_k' W_k x_k + \nu_k u_k' U_k u_k + V_{k+1}(x_{k+1}) \mid \mathcal{G}_k\big] \\
&= \min_{u_k} \Big( \mathbb{E}[x_k' W_k x_k \mid \mathcal{G}_k] + \bar{\nu}\, u_k' U_k u_k + \mathbb{E}\big[x_{k+1}' S_{k+1} x_{k+1} + \operatorname{trace}(T_{k+1} P_{k+1|k+1}) + \operatorname{trace}(D_{k+1} Q) \mid \mathcal{G}_k\big] \Big) \\
&= \mathbb{E}\big[x_k' (W_k + A' S_{k+1} A) x_k \mid \mathcal{G}_k\big] + (1 - \bar{\gamma})\, \operatorname{trace}\big(T_{k+1} (A P_{k|k} A' + Q)\big) + \operatorname{trace}(D_{k+1} Q) \\
&\quad + \min_{u_k} \Big( \bar{\nu}\, u_k' U_k u_k + \bar{\nu}\, u_k' B' S_{k+1} B u_k + 2 \bar{\nu}\, u_k' B' S_{k+1} A\, \hat{x}_{k|k} + (1 - \bar{\gamma}) \bar{\nu} (1 - \bar{\nu})\, \operatorname{trace}\big(T_{k+1} B u_k u_k' B'\big) \Big) \\
&= \mathbb{E}\big[x_k' (W_k + A' S_{k+1} A) x_k \mid \mathcal{G}_k\big] + (1 - \bar{\gamma})\, \operatorname{trace}\big(A' T_{k+1} A\, P_{k|k}\big) + \operatorname{trace}\big((D_{k+1} + (1 - \bar{\gamma}) T_{k+1}) Q\big) \\
&\quad + \bar{\nu} \min_{u_k} \Big( u_k' \big(U_k + B' (S_{k+1} + (1 - \bar{\gamma})(1 - \bar{\nu}) T_{k+1}) B\big) u_k + 2\, u_k' B' S_{k+1} A\, \hat{x}_{k|k} \Big).
\end{aligned}
\tag{50}
$$
The minimizer can be easily computed, since the quantity inside the brackets is a convex quadratic function:
$$
u_k^* = -\big(U_k + B' (S_{k+1} + (1 - \bar{\gamma})(1 - \bar{\nu}) T_{k+1}) B\big)^{-1} B' S_{k+1} A\, \hat{x}_{k|k} = L_k\, \hat{x}_{k|k}, \tag{51}
$$
which is a linear function of the estimated state $\hat{x}_{k|k}$. Substituting back into the value function, we get:
$$
\begin{aligned}
V_k(x_k) &= \mathbb{E}\big[x_k' (W_k + A' S_{k+1} A) x_k \mid \mathcal{G}_k\big] + (1 - \bar{\gamma})\, \operatorname{trace}\big(A' T_{k+1} A\, P_{k|k}\big) \\
&\quad + \operatorname{trace}\big((D_{k+1} + (1 - \bar{\gamma}) T_{k+1}) Q\big) + \bar{\nu}\, \hat{x}_{k|k}' A' S_{k+1} B L_k\, \hat{x}_{k|k} \\
&= \mathbb{E}\big[x_k' \big(W_k + A' S_{k+1} A + \bar{\nu}\, A' S_{k+1} B L_k\big) x_k \mid \mathcal{G}_k\big] \\
&\quad + \operatorname{trace}\Big(\big((1 - \bar{\gamma})\, A' T_{k+1} A - \bar{\nu}\, A' S_{k+1} B L_k\big) P_{k|k}\Big) + \operatorname{trace}\big((D_{k+1} + (1 - \bar{\gamma}) T_{k+1}) Q\big),
\end{aligned}
$$
where we used Lemma 1(b). Therefore, $V_k(x_k) = \mathbb{E}[x_k' S_k x_k \mid \mathcal{G}_k] + \operatorname{trace}(T_k P_{k|k}) + \operatorname{trace}(D_k Q)$, as claimed.
This implies that:
$$
S_k = A' S_{k+1} A + W_k - \bar{\nu}\, A' S_{k+1} B \big(U_k + B' (S_{k+1} + (1 - \bar{\gamma})(1 - \bar{\nu}) T_{k+1}) B\big)^{-1} B' S_{k+1} A \tag{52}
$$
$$
T_k = (1 - \bar{\gamma})\, A' T_{k+1} A + A' S_{k+1} A + W_k - S_k \tag{53}
$$
$$
D_k = D_{k+1} + (1 - \bar{\gamma})\, T_{k+1}.
$$
The optimal minimal cost for the finite horizon, $J_N^* = V_0(x_0)$, is then given by:
$$
J_N^* = \bar{x}_0' S_0 \bar{x}_0 + \operatorname{trace}(S_0 P_0) + (1 - \bar{\gamma}) \sum_{k=1}^{N} \operatorname{trace}(T_k Q). \tag{54}
$$
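The coupled recursions (52)–(53), together with the $D_k$ recursion, are straightforward to iterate backward; a sketch with hypothetical scalar parameters follows.

```python
import numpy as np

def udp_lqg_r0(A, B, W, U, WN, nu, g, N):
    """Coupled backward recursions (52)-(53) plus D_k, for UDP with
    R = 0 and C = I. Returns S_0, T_0, D_0 and the gains L_k of Eq. (51).
    nu = nu_bar, g = gamma_bar."""
    S, T, D = WN, np.zeros_like(WN), np.zeros_like(WN)
    L = [None] * N
    for k in range(N - 1, -1, -1):
        # Note how T_{k+1} enters the gain: controller and estimator
        # quality are coupled, unlike the TCP case.
        M = U + B.T @ (S + (1 - g) * (1 - nu) * T) @ B
        gain = np.linalg.solve(M, B.T @ S @ A)
        L[k] = -gain
        S_new = A.T @ S @ A + W - nu * (A.T @ S @ B @ gain)        # Eq. (52)
        T_new = (1 - g) * (A.T @ T @ A) + A.T @ S @ A + W - S_new  # Eq. (53)
        D_new = D + (1 - g) * T
        S, T, D = S_new, T_new, D_new
    return S, T, D, L

A = np.array([[1.1]]); B = np.array([[1.0]])
W = np.eye(1); U = np.eye(1); WN = np.eye(1)
S0, T0, D0, L = udp_lqg_r0(A, B, W, U, WN, nu=0.8, g=0.8, N=25)
print(S0, T0, L[0])
```

With $\bar{\gamma} = 1$ the $T$-dependence of the gain disappears and the recursion reduces to the TCP recursion (22), as expected.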
For the infinite horizon optimal controller, a necessary and sufficient condition for the average minimal cost $J_\infty^* = \lim_{N \to +\infty} \frac{1}{N} J_N^*$ to be finite is that the coupled iterative Equations (52) and (53) converge to finite values $S_\infty$ and $T_\infty$.