
The Hamilton-Jacobi-Bellman Equation

Contents

1 The Hamilton-Jacobi-Bellman Equation
  1.1 The optimal control problem
  1.2 Deriving the HJB equation
  1.3 Solution of the HJB equation by examples
  1.4 Exercises

1 The Hamilton-Jacobi-Bellman Equation

1.1 The optimal control problem

In the previous description of the dynamic programming method, we approximated continuous systems by discrete ones. That approach leads to a recurrence relation that is well suited for digital implementation. In this chapter we consider another approach for continuous systems, one that leads to a nonlinear partial differential equation: the Hamilton-Jacobi-Bellman (HJB) equation. We shall show how the optimal performance measure, if it satisfies the Hamilton-Jacobi-Bellman equation, determines an optimal control.
Consider the following continuous optimal control problem.
The process described by the state equation
\dot{x}(t) = f(x(t), u(t), t)            (1.1)

is to be controlled to minimize the performance measure


J = h(x(t_f), t_f) + \int_{t_0}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau            (1.2)

where g and h are scalar functions, t_0 and t_f are fixed, and \tau is a variable of integration.
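For a given (not necessarily optimal) control, the performance measure (1.2) can be evaluated numerically by integrating the state equation and the running cost together. The sketch below does this with a forward Euler scheme; the particular plant, cost terms, control law, and numerical values are assumptions chosen only for illustration.

```python
def evaluate_cost(f, g, h, u, x0, t0, tf, n_steps=1000):
    """Approximate J = h(x(tf), tf) + integral of g(x, u, t) from t0 to tf
    for a given control law u(t, x), using forward Euler integration."""
    dt = (tf - t0) / n_steps
    x, t, running_cost = x0, t0, 0.0
    for _ in range(n_steps):
        u_t = u(t, x)
        running_cost += g(x, u_t, t) * dt   # accumulate the integral of g
        x += f(x, u_t, t) * dt              # Euler step of the state equation
        t += dt
    return h(x, tf) + running_cost

# Illustrative (assumed) data: scalar plant dx/dt = x + u, quadratic costs,
# and an arbitrary constant control u = -1.
J = evaluate_cost(f=lambda x, u, t: x + u,
                  g=lambda x, u, t: 0.25 * u**2,
                  h=lambda x, t: 0.25 * x**2,
                  u=lambda t, x: -1.0,
                  x0=1.0, t0=0.0, tf=1.0)
print(J)
```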

1.2 Deriving the HJB equation

The HJB equation will be obtained using the principle of optimality and the
results obtained for dynamic programming.

The continuous optimal control problem will be included in a larger class of problems by considering the performance measure:

J\bigl(x(t), t, u(\tau),\; t \le \tau \le t_f\bigr) = h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau            (1.3)
where t can be any value less than or equal to t_f and x(t) can be any admissible state value. Notice that the performance measure depends on the numerical values of x(t) and t and on the control history in the interval [t, t_f] (Kirk, 2004).
Let us now attempt to determine the controls that minimize (1.3) for all
admissible x(t) and all t \le t_f. The minimum cost function is then:


J^*(x(t), t) = \min_{u(\tau)} \left\{ h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau \right\}            (1.4)

We shall split the time interval [t, t_f] into [t, t + \Delta t] and [t + \Delta t, t_f], and are specifically interested in the case where \Delta t \to 0. By subdividing the interval we obtain:

J^*(x(t), t) = \min_{u(\tau)} \left\{ h(x(t_f), t_f) + \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + \int_{t+\Delta t}^{t_f} g(x, u, \tau)\, d\tau \right\}            (1.5)

The principle of optimality requires that:

J^*(x(t), t) = \min_{u(\tau)} \left\{ \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + J^*(x(t + \Delta t), t + \Delta t) \right\}            (1.6)

where J^*(x(t + \Delta t), t + \Delta t) is the minimum cost of the process for the time interval t + \Delta t \le \tau \le t_f with initial state x(t + \Delta t).
Assuming that J^* has bounded second derivatives in both arguments, we can expand J^*(x(t + \Delta t), t + \Delta t) in a Taylor series about the point (x(t), t) (truncated after the first-order terms) to obtain:

J^*(x(t), t) = \min_{u(\tau)} \left\{ \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + J^*(x(t), t) + \frac{\partial J^*}{\partial t}(x(t), t)\, \Delta t + \left[ \frac{\partial J^*}{\partial x}(x(t), t) \right]^T \bigl[ x(t + \Delta t) - x(t) \bigr] \right\}            (1.7)

For small \Delta t,

x(t + \Delta t) - x(t) \approx \dot{x}(t)\, \Delta t

\int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau \approx g(x, u, t)\, \Delta t

Because J^*(x(t), t) is independent of u, this term cancels from the right- and left-hand sides of (1.7). \dot{x} is then replaced using the state equation (1.1):

"

J (x(t), t)
J (x(t), t)
0 = min g(x, u, t)t +
t +
u( )
t
x

#T

f(x, u, t)t

(1.8)

Dividing the above equation by \Delta t yields:

0 = \frac{\partial J^*(x(t), t)}{\partial t} + \min_{u(t)} \left\{ g(x, u, t) + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t) \right\}            (1.9)

Define the Hamiltonian as:

H = g(x, u, t) + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t)            (1.10)

Then we write the partial differential equation (1.9) as:


0 = \frac{\partial J^*(x(t), t)}{\partial t} + \min_{u(t)} H            (1.11)

To find the boundary value for this partial differential equation, set t = t_f; from (1.4) we have:

J^*(x(t_f), t_f) = h(x(t_f), t_f)            (1.12)

We have obtained the Hamilton-Jacobi-Bellman (HJB) equation (1.11), subject to the boundary condition (1.12). It provides the solution to optimal control problems for general nonlinear dynamical systems. However, an analytical solution of the HJB equation is difficult to obtain in most cases. A solution can sometimes be found analytically by guessing a form of the minimum cost function. In general, the HJB equation must be solved by numerical techniques, and a numerical solution involves some sort of discrete approximation to the exact optimization relationship (1.11) (Kirk, 2004).
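As an illustration of such a discrete approximation, the sketch below marches the HJB relationship backward in time on a state grid for the scalar problem treated in Example 1.1 below (dx/dt = x + u, g = u^2/4, h = x^2/4). The grid ranges, step counts, and control discretization are assumptions chosen for the sketch; it is a minimal dynamic-programming approximation, not a production solver.

```python
import numpy as np

# Assumed scalar example (same data as Example 1.1): dx/dt = x + u,
# g = u^2/4, terminal cost h = x^2/4, horizon T = 1.
T, nt, nx = 1.0, 200, 201
dt = T / nt
xs = np.linspace(-2.0, 2.0, nx)           # state grid
us = np.linspace(-6.0, 6.0, 121)          # candidate control values

J = 0.25 * xs**2                          # boundary condition J*(x, T) = h(x, T)
for _ in range(nt):                       # march backward from t = T to t = 0
    # Discrete analogue of (1.11): minimize g*dt + J*(x + f*dt) over the controls.
    x_next = xs[:, None] + (xs[:, None] + us[None, :]) * dt
    stage = 0.25 * us[None, :]**2 * dt
    J_next = np.interp(x_next, xs, J)     # interpolate J* at the successor states
    J = np.min(stage + J_next, axis=1)

# J now approximates J*(x, 0); the analytic value (Example 1.1) is
# J*(x, 0) = 0.5 * p(0) * x^2 with p(0) = 1 / (1 + exp(-2*T)).
i = np.argmin(np.abs(xs - 1.0))
print(J[i], 0.5 / (1.0 + np.exp(-2.0 * T)))
```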

1.3 Solution of the HJB equation by examples

Example 1.1 A first-order system is described by the differential equation

\dot{x}(t) = x(t) + u(t)

It is desired to find the control law that minimizes the cost function

J = \frac{1}{4} x^2(T) + \int_0^T \frac{1}{4} u^2(t)\, dt

The final time T is specified, and the admissible state and control values are not constrained by any boundaries. Here

g(x(t), u(t), t) = \frac{1}{4} u^2(t)

f(x(t), u(t), t) = x(t) + u(t)
The Hamiltonian is:

H\left(x(t), u(t), \frac{\partial J^*}{\partial x}\right) = \frac{1}{4} u^2(t) + \frac{\partial J^*(x(t), t)}{\partial x}\, \bigl(x(t) + u(t)\bigr)

and since the control is unconstrained, a necessary condition that the optimal control must satisfy is:

\frac{\partial H}{\partial u} = \frac{1}{2} u(t) + \frac{\partial J^*}{\partial x} = 0
The control indeed minimizes the Hamiltonian because

\frac{\partial^2 H}{\partial u^2} = \frac{1}{2} > 0
The optimal control:

u^*(t) = -2\, \frac{\partial J^*}{\partial x}
which, when substituted into the HJB equation

0 = \frac{\partial J^*}{\partial t} + \min_{u} H

gives:

0 = \frac{\partial J^*}{\partial t} + \frac{1}{4} \left( -2\, \frac{\partial J^*}{\partial x} \right)^2 + \frac{\partial J^*}{\partial x}\, x(t) - 2 \left( \frac{\partial J^*}{\partial x} \right)^2

0 = \frac{\partial J^*}{\partial t} - \left( \frac{\partial J^*}{\partial x} \right)^2 + \frac{\partial J^*}{\partial x}\, x(t)            (1.13)

The boundary value for J^* is:

J^*(x(T), T) = \frac{1}{4} x^2(T)            (1.14)

One way to solve the HJB equation is to guess a form for the solution
and see if it can be made to satisfy the differential equation and the boundary
conditions. Since J^*(x(T), T) is quadratic in x(T), guess

J^*(x(t), t) = \frac{1}{2} p(t)\, x^2(t)            (1.15)

where p(t) represents an unknown scalar function of t that is to be determined.
Notice that

\frac{\partial J^*}{\partial x} = p(t)\, x(t)            (1.16)

which together with the expression determined for u^*(t) implies that:

u^*(t) = -2\, p(t)\, x(t)

Thus, if p(t) can be found such that (1.13) and (1.14) are satisfied, the optimal control is a linear feedback of the state; indeed, this was the motivation for selecting the form (1.15).
Substituting (1.15) and

\frac{\partial J^*}{\partial t} = \frac{1}{2} \dot{p}(t)\, x^2(t)            (1.17)

into (1.13) gives:

0 = \frac{1}{2} \dot{p}(t)\, x^2(t) - p^2(t)\, x^2(t) + p(t)\, x^2(t)

Since this equation must be satisfied for all x(t), p(t) may be calculated from the differential equation:

\frac{1}{2} \dot{p}(t) - p^2(t) + p(t) = 0            (1.18)

with the final condition p(T) = 1/2. Since p(t) is a scalar function of t, the solution can be obtained using the transformation z(t) = 1/p(t), with the result:

p(t) = \frac{1}{1 + e^{2(t-T)}}            (1.19)
Obs. The solution of (1.18) is obtained as follows. Let

z(t) = \frac{1}{p(t)}; \qquad z(T) = \frac{1}{p(T)} = 2

Then \dot{p} = -\dot{z}/z^2, and (1.18) becomes

-\frac{\dot{z}}{2 z^2} - \frac{1}{z^2} + \frac{1}{z} = 0 \;\Rightarrow\; \dot{z} - 2 z + 2 = 0

The solution of the homogeneous part of this equation, \dot{z} - 2 z = 0, is z(t) = C_1 e^{2t}, and the general solution is:

z(t) = C_1 e^{2t} + C_2

The constants are calculated by substituting the general solution into the equation and using the final condition:

2 C_1 e^{2t} - 2 \bigl( C_1 e^{2t} + C_2 \bigr) + 2 = 0 \;\Rightarrow\; -2 C_2 + 2 = 0 \;\Rightarrow\; C_2 = 1

z(t) = C_1 e^{2t} + 1; \qquad z(T) = C_1 e^{2T} + 1 = 2 \;\Rightarrow\; C_1 = e^{-2T}

The general solution then yields:

z(t) = e^{2(t-T)} + 1 \;\Rightarrow\; p(t) = \frac{1}{e^{2(t-T)} + 1}
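As a quick numerical check of (1.19), one can integrate (1.18) backward from the final condition p(T) = 1/2 and compare against the closed-form expression. The sketch below uses scipy; the horizon T = 1 is an arbitrary value chosen for the check.

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0  # arbitrary horizon chosen for the check

# From (1.18): (1/2) dp/dt - p^2 + p = 0  =>  dp/dt = 2 p^2 - 2 p.
sol = solve_ivp(lambda t, p: 2 * p**2 - 2 * p,
                t_span=(T, 0.0), y0=[0.5],        # integrate backward from p(T) = 1/2
                dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, T, 11)
p_numeric = sol.sol(t)[0]
p_closed_form = 1.0 / (1.0 + np.exp(2.0 * (t - T)))   # equation (1.19)
print(np.max(np.abs(p_numeric - p_closed_form)))       # should be near machine precision
```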

The optimal control law is then:

u^*(t) = -2\, \frac{\partial J^*(x, t)}{\partial x} = -2\, p(t)\, x(t)

u^*(t) = \frac{-2}{1 + e^{2(t-T)}}\, x(t)

The controller is time-varying; its block diagram is shown in Figure 1.1.

Figure 1.1: Optimal control for Example 1.1. The process dx(t)/dt = x(t) + u(t) in closed loop with the time-varying feedback gain -2/(1 + e^{2(t-T)}).
Notice that as T \to \infty, the linear time-varying feedback approaches the constant feedback p(t) = 1, and the controlled system

\dot{x}(t) = x(t) - 2 x(t) = -x(t)

is stable.
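The closed-loop behaviour can also be simulated directly. The sketch below integrates the plant of Example 1.1 under the time-varying feedback u*(t) = -2 x(t) / (1 + e^{2(t-T)}) and compares the accumulated cost with the predicted minimum J*(x0, 0) = (1/2) p(0) x0^2; the horizon, initial state, and step count are illustrative assumptions.

```python
import numpy as np

T, x0, n = 2.0, 1.0, 2000        # assumed horizon, initial state, Euler steps
dt = T / n
t, x, cost = 0.0, x0, 0.0
for _ in range(n):
    u = -2.0 * x / (1.0 + np.exp(2.0 * (t - T)))   # optimal feedback from Example 1.1
    cost += 0.25 * u**2 * dt                        # running cost (1/4) u^2
    x += (x + u) * dt                               # plant dx/dt = x + u (Euler step)
    t += dt
total_cost = cost + 0.25 * x**2                     # add terminal cost (1/4) x^2(T)

predicted = 0.5 * x0**2 / (1.0 + np.exp(-2.0 * T))  # J*(x0, 0) = (1/2) p(0) x0^2
print(total_cost, predicted)                         # the two values should nearly agree
```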

Example 1.2 (Weber, 2000). A man is considering his lifetime plan of investment and expenditure. He has an initial level of savings x(0) and no income other than that which he obtains from investment at a fixed interest rate \alpha. His total capital is therefore governed by the equation:

\dot{x}(t) = \alpha x(t) - u(t)

where \alpha > 0 and u(t) is his rate of expenditure. He wishes to maximize

J = \int_0^T e^{-a t} \sqrt{u(t)}\, dt

for a given T. Find his optimal policy u^*(t).


The problem can be transformed into a minimization problem by changing the sign of the performance measure:

J^*(x(t), t) = \min_{u} \int_t^T -e^{-a\tau} \sqrt{u(\tau)}\, d\tau

Let a = 1 and \alpha = 4.
We solve this problem using the Hamilton-Jacobi-Bellman equation:

0 = \frac{\partial J^*}{\partial t} + \min_{u} H \;\Leftrightarrow\; J^*_t + \min_{u} H = 0
Then:

f(x, u, t) = 4 x(t) - u(t); \qquad g(x, u, t) = -e^{-t} \sqrt{u(t)}

and the Hamiltonian is:

H = g(x, u, t) + \frac{\partial J^*}{\partial x}\, f(x, u, t) = -e^{-t} \sqrt{u} + J^*_x \bigl( 4x - u \bigr)

The boundary condition is J^*(x(T), T) = 0, since h(x(T), T) = 0.


We calculate u that minimizes the hamiltonian from the necessary condition:
H
1 et
= 0 = Jx
u
2
u
u =

1 1
1 2t 2
Jx
2 = e
2t

4 e Jx
4

Substituting u^* into the Hamiltonian we obtain:

H^* = \frac{1}{4} \cdot \frac{e^{-2t} + 16\, x\, (J^*_x)^2}{J^*_x}

Suppose we try a solution of the form:

J^*(x(t), t) = p(t) \sqrt{x(t)}


Then:

J^*_t = \frac{\partial J^*}{\partial t} = \dot{p}(t)\, \sqrt{x(t)}

J^*_x = \frac{\partial J^*}{\partial x} = \frac{1}{2}\, \frac{p(t)}{\sqrt{x(t)}}

By substituting the above, the HJB equation becomes:

\left( \dot{p} + \frac{1}{2} \cdot \frac{e^{-2t} + 4 p^2}{p} \right) \sqrt{x} = 0

It has to be satisfied for any x(t); thus we can obtain p(t) from:

\dot{p} + \frac{1}{2} \cdot \frac{e^{-2t} + 4 p^2}{p} = 0

subject to the final condition p(T) = 0.


The solution of the above equation is:

p(t) = -\frac{1}{2} \sqrt{4 C e^{-4t} - 2 e^{-2t}}

where the negative root is taken because J^* is the minimum of a non-positive quantity, so J^*_x \le 0. The constant C is calculated from the final condition p(T) = 0 and has the value:

C = \frac{1}{2}\, e^{2T}

The unknown function p(t) is then:

p(t) = -\frac{1}{2} \sqrt{2 e^{-4t + 2T} - 2 e^{-2t}}

The optimal control law is then given by:

u^*(t) = \frac{1}{4}\, e^{-2t} (J^*_x)^{-2}, \quad \text{where} \quad J^*_x = \frac{1}{2}\, \frac{p(t)}{\sqrt{x(t)}}

thus:

u^*(t) = \frac{e^{-2t}\, x(t)}{p^2(t)} = \frac{4 x(t)}{2 e^{2(T-t)} - 2} = \frac{2 x(t)}{e^{2(T-t)} - 1}
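A numerical sanity check of this result: simulate the capital equation under the policy above and compare the achieved discounted utility with the value predicted by the HJB solution, -J*(x(0), 0) = -p(0) sqrt(x(0)). The horizon, initial savings, and step count below are illustrative assumptions, and the Euler sum slightly underestimates the spending spike near t = T.

```python
import numpy as np

T, x0, n = 1.0, 100.0, 20000       # assumed horizon, initial savings, Euler steps
dt = T / n
t, x, utility = 0.0, x0, 0.0
for _ in range(n):
    u = 2.0 * x / (np.exp(2.0 * (T - t)) - 1.0)   # expenditure policy derived above
    utility += np.exp(-t) * np.sqrt(u) * dt        # discounted utility e^{-t} sqrt(u)
    x += (4.0 * x - u) * dt                        # capital equation dx/dt = 4x - u
    t += dt

# Value predicted by the HJB solution: -p(0) sqrt(x0), p(0) = -(1/2) sqrt(2 e^{2T} - 2).
predicted = 0.5 * np.sqrt(2.0 * np.exp(2.0 * T) - 2.0) * np.sqrt(x0)
print(utility, predicted)
```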

1.4 Exercises

Exercise 1.1 Consider a general class of dynamical systems modeled by the differential equation:

\dot{x}(t) = a x(t) + b u(t)

with the associated cost index:

J = \frac{1}{2}\, f\, x^2(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \bigl( q\, x^2(t) + r\, u^2(t) \bigr)\, dt

where the initial time t_0 = 0, the final time t_f < \infty is fixed, and the final state x(t_f) is free. Find a control u^*(t) that minimizes J.
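If Exercise 1.1 is attacked, as in Example 1.1, with the quadratic guess J*(x, t) = (1/2) p(t) x^2, the HJB equation reduces to a scalar Riccati differential equation for p(t) that can be checked numerically. The sketch below integrates that equation backward for illustrative parameter values (a, b, q, r, f and the horizon are all assumptions, not part of the exercise statement); it is meant as a verification aid after the analytical work, not as the exercise solution.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (assumptions).
a, b, q, r, f_T, tf = -1.0, 1.0, 1.0, 2.0, 0.5, 2.0

# With J*(x, t) = (1/2) p(t) x^2 the HJB equation reduces to the scalar Riccati
# equation  dp/dt = -q - 2*a*p + (b**2 / r) * p**2  with p(tf) = f.
riccati = lambda t, p: -q - 2.0 * a * p + (b**2 / r) * p**2
sol = solve_ivp(riccati, t_span=(tf, 0.0), y0=[f_T], dense_output=True, rtol=1e-9)

t = np.linspace(0.0, tf, 5)
p = sol.sol(t)[0]
gain = -(b / r) * p                 # corresponding feedback u*(t) = -(b/r) p(t) x(t)
print(np.column_stack([t, p, gain]))
```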
Exercise 1.2 Consider a first-order system described by the state equation

\dot{x}(t) = x(t) + u(t)

Determine the optimal control u^*(t) that minimizes the performance measure:

J = \frac{1}{2}\, x^2(1) + \int_0^1 u^2(t)\, dt

Hint. Consider as a candidate solution for the HJB equation J^*(x(t), t) = x^2(t)\, p(t).
Exercise 1.3 Consider a first-order system described by the state equation

\dot{x}(t) = x(t) - u(t)

Determine the optimal control u^*(t) that minimizes the performance measure:

J = x^2(1) + \int_0^1 \bigl( x(t) - u(t) \bigr)^2\, dt

Hint. Consider as a candidate solution for the HJB equation J^*(x(t), t) = x^2(t)\, p(t).


Bibliography

Kirk, D. E. (2004). Optimal Control Theory. An Introduction. Dover Publications, Inc.


Weber, R. (2000). Optimization and Control. Online at www.statslab.cam.ac.uk.
