MAE 546 Lecture 23

Stochastic Optimal Control
Robert Stengel
Optimal Control and Estimation, MAE 546
Princeton University, 2012

- Nonlinear systems with random inputs and perfect measurements
- Nonlinear systems with random inputs and imperfect measurements
- Certainty equivalence and separation
- Stochastic neighboring-optimal control
- Linear-quadratic-Gaussian (LQG) control

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$ z(t) = x(t) $$

$$ \dot{x}(t) = f[x(t), u(t), w(t), t] $$

$$ E[x(0)] = \bar{x}(0); \quad E\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\} = 0 $$

$$ E[w(t)] = 0; \quad E[w(t)\,w^T(\tau)] = W(t)\,\delta(t - \tau) $$
Copyright 2012 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE546.html http://www.princeton.edu/~stengel/OptConEst.html
Assume that random disturbance effects are small and additive:

$$ \dot{x}(t) = f[x(t), u(t), t] + L(t)\,w(t) $$
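The following is a minimal simulation sketch of this small, additive disturbance model. It is for illustration only and assumes a scalar state, u = 0, a linear drift f = a x, and the hypothetical values a = -1, L = 1, W = 0.5. In an Euler-Maruyama step of length Δt, white noise of spectral density W is represented by a Gaussian sample with variance W/Δt. The increment L w Δt therefore has variance L²WΔt. For a stable drift, the long-run sample variance should approach the stationary value L²W/(2|a|).

```python
import math
import random

def simulate_additive_noise(a, L, W, dt, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama integration of xdot = a*x + L*w, with w white noise of spectral density W.

    The discretized noise sample w_k has variance W/dt, so L*w_k*dt has variance L**2 * W * dt.
    """
    rng = random.Random(seed)
    sigma_w = math.sqrt(W / dt)          # std. dev. of the discretized white-noise sample
    x = x0
    xs = []
    for _ in range(n_steps):
        w = rng.gauss(0.0, sigma_w)
        x = x + (a * x + L * w) * dt     # drift plus additive disturbance
        xs.append(x)
    return xs

# Hypothetical parameters (not from the lecture): stable scalar drift a < 0
a, L, W, dt = -1.0, 1.0, 0.5, 0.01
xs = simulate_additive_noise(a, L, W, dt, n_steps=200_000)
tail = xs[1_000:]                        # discard the start-up transient
mean = sum(tail) / len(tail)
var = sum((x - mean) ** 2 for x in tail) / len(tail)
var_theory = L**2 * W / (-2.0 * a)       # stationary variance of the continuous-time process
print(f"sample mean {mean:.3f}, sample variance {var:.3f}, theory {var_theory:.3f}")
```

The time step only has to be small compared with 1/|a|. The Euler discretization biases the stationary variance by a factor of about 1/(1 + |a|Δt/2).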
Cost Must Be an Expected Value

- A deterministic cost function cannot be minimized because
  - the disturbance effect on the state cannot be predicted
  - state and control have become random variables

$$ \min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt $$

- However, the expected value of a deterministic cost function can be minimized

$$ \min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt \right\} $$

Stochastic Euler-Lagrange Equations?

- There is no single optimal trajectory
- Expected values of the Euler-Lagrange necessary conditions may not be well defined

$$ 1)\quad E[\lambda(t_f)] = E\left\{ \left[ \frac{\partial \phi[x(t_f)]}{\partial x} \right]^T \right\} $$

$$ 2)\quad E[\dot{\lambda}(t)] = -E\left\{ \left[ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right]^T \right\} $$

$$ 3)\quad E\left\{ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right\} = 0 $$
Stochastic Value Function for a Nonlinear System

- However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved
- Base the optimization on the Principle of Optimality
- Optimal expected value function at $t_1$:

$$ V^*(t_1) = E\left\{ \phi[x^*(t_f)] - \int_{t_f}^{t_1} L[x^*(\tau), u^*(\tau)]\,d\tau \right\} = \min_{u} E\left\{ \phi[x^*(t_f)] - \int_{t_f}^{t_1} L[x^*(\tau), u(\tau)]\,d\tau \right\} $$
Rate of Change of the Value Function

- Total time-derivative of $V^*$:

$$ \left. \frac{dV^*}{dt} \right|_{t=t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \} $$

- $x(t)$ and $u(t)$ can be known precisely; therefore

$$ \left. \frac{dV^*}{dt} \right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)] $$
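Incremental Change in the Value Function

- Apply the chain rule to the total derivative

$$ \frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\,\dot{x} \right\} $$

- Incremental change in the value function, $\Delta V^*$; expand to second degree

$$ \Delta V^* = \frac{dV^*}{dt}\Delta t = E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \frac{1}{2}\dot{x}^T \frac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots \right\} $$
$$ = E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)]\Delta t + \frac{1}{2}[f(\cdot) + Lw(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)]\Delta t^2 + \cdots \right\} $$

- Cancel $\Delta t$

$$ \frac{dV^*}{dt} \approx E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \frac{1}{2}\mathrm{Tr}\left( [f(\cdot) + Lw(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)] \right)\Delta t \right\} $$
$$ = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \frac{1}{2}\mathrm{Tr}\left( \frac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right)\Delta t \right\} $$

Introduction of the Trace

- Trace of a matrix product

$$ \mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA) $$
$$ \mathrm{Tr}(x^T Q x) = \mathrm{Tr}(x x^T Q) = \mathrm{Tr}(Q x x^T) $$
$$ \dim[\mathrm{Tr}(\cdot)] = 1 \times 1 $$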
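The trace identities can be checked numerically. This sketch uses plain Python lists and arbitrary illustrative matrices. It confirms that cyclic permutations leave the trace unchanged even though ABC, CAB and BCA have different dimensions. It also confirms that the scalar quadratic form xᵀQx equals Tr(xxᵀQ) = Tr(Qxxᵀ). In the derivation, x plays the role of f(·) + Lw(·) and Q the role of ∂²V*/∂x².

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def transpose(A):
    return [list(row) for row in zip(*A)]

# Illustrative matrices; shapes chosen so that every product below is defined
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]                    # 3 x 2
B = [[2.0, -1.0, 0.0, 1.0], [1.0, 4.0, -2.0, 0.5]]           # 2 x 4
C = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0], [-1.0, 2.0, 1.0]]  # 4 x 3

t_abc = trace(matmul(matmul(A, B), C))   # trace of a 3 x 3 product
t_cab = trace(matmul(matmul(C, A), B))   # trace of a 4 x 4 product
t_bca = trace(matmul(matmul(B, C), A))   # trace of a 2 x 2 product

# Quadratic form as a trace: x^T Q x is 1 x 1, so it equals its own trace
x = [[1.0], [-2.0]]                      # column vector
Q = [[2.0, 0.5], [0.5, 1.0]]
quad = matmul(matmul(transpose(x), Q), x)[0][0]
xxT = matmul(x, transpose(x))
print(t_abc, t_cab, t_bca)
print(quad, trace(matmul(xxT, Q)), trace(matmul(Q, xxT)))
```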
Toward the Stochastic HJB Equation

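- Because $x(t)$ and $u(t)$ can be measured, the terms that do not involve the disturbance can be taken out of the expectation

$$ \frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \frac{1}{2}\mathrm{Tr}\left( \frac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right)\Delta t \right\} $$
$$ = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + E\left\{ \frac{\partial V^*}{\partial x} L w(\cdot) + \frac{1}{2}\mathrm{Tr}\left( \frac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right)\Delta t \right\} $$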
Toward the Stochastic HJB Equation

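- The disturbance is assumed to be zero-mean white noise

$$ E[w(t)] = 0; \quad E[w(t)\,w^T(\tau)] = W(t)\,\delta(t - \tau) $$

- The uncertain disturbance input can only increase the value function rate of change. The cross terms vanish because $E[w] = 0$. Over an interval $\Delta t$ the discretized white noise satisfies $E[w w^T] \approx W/\Delta t$, so only the $L W L^T$ term survives the limit.

$$ \frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + \lim_{\Delta t \to 0} \frac{1}{2}\mathrm{Tr}\left\{ \frac{\partial^2 V^*}{\partial x^2}\left[ f(\cdot) f^T(\cdot) + L\,E[w(\cdot) w^T(\cdot)]\,L^T \right]\Delta t \right\} $$
$$ = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\,L(t) W(t) L^T(t) \right] $$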
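The following scalar Monte Carlo sketch shows why the trace term survives as Δt → 0. It assumes the hypothetical value function V(x) = ½ s x² with no explicit time dependence, a drift f = a x, and illustrative numbers. Antithetic noise pairs (w, -w) cancel the zero-mean first-order term. The averaged increment divided by Δt then approaches (∂V/∂x) f + ½ (∂²V/∂x²) L W L, not the drift term alone.

```python
import math
import random

def expected_dV_rate(s, a, L, W, x, dt, n_pairs=20_000, seed=1):
    """Monte Carlo estimate of E[Delta V]/dt for V(x) = 0.5*s*x**2 over one Euler step.

    Antithetic pairs (w, -w) cancel the zero-mean first-order noise term exactly,
    leaving the second-order (trace) contribution visible with little sampling noise.
    """
    rng = random.Random(seed)
    sigma_w = math.sqrt(W / dt)             # discretized white noise: E[w^2] = W/dt
    V = lambda z: 0.5 * s * z * z
    total = 0.0
    for _ in range(n_pairs):
        w = rng.gauss(0.0, sigma_w)
        for wk in (w, -w):
            x_next = x + (a * x + L * wk) * dt
            total += V(x_next) - V(x)
    return total / (2 * n_pairs) / dt

s, a, L, W, x, dt = 2.0, -1.0, 1.0, 0.5, 1.0, 1e-3   # illustrative values
drift_term = (s * x) * (a * x)              # (dV/dx) f
trace_term = 0.5 * s * L * W * L            # (1/2) Tr[V_xx L W L^T], scalar case
estimate = expected_dV_rate(s, a, L, W, x, dt)
print(f"Monte Carlo {estimate:.4f}; drift only {drift_term:.4f}; "
      f"drift + trace {drift_term + trace_term:.4f}")
```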
Stochastic Principle of Optimality

(Perfect Measurements)
$$ \frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\,L(t) W(t) L^T(t) \right] $$

- Substitute for the total derivative, $dV^*/dt = -L(x^*, u^*)$
- Solve for the partial derivative, $\partial V^*/\partial t$
- Stochastic HJB equation:

$$ \frac{\partial V^*}{\partial t}(t) = -\min_{u} E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\,L(t) W(t) L^T(t) \right] \right\} $$

- Boundary (terminal) condition: $V^*(t_f) = E[\phi(t_f)]$
Observations of Stochastic Principle of Optimality

$$ \frac{\partial V^*}{\partial t}(t) = -\min_{u} E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\,L(t) W(t) L^T(t) \right] \right\} $$

- Control has no effect on the disturbance input
- The criterion for optimality is the same as for the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, $V^*(0)$
The Information Set, I

- Sigma algebra (Wikipedia definitions)
  - The collection of sets over which a measure is defined
  - The collection of events that can be assigned probabilities
  - A measurable space
Information Sets and Expected Cost
- Information available at current time, $t_1$
  - All measurements from the initial time, $t_o$
  - All control commands from the initial time

$$ I[t_o, t_1] = \{ z[t_o, t_1], u[t_o, t_1] \} $$

- Plus available model structure, parameters, and statistics

$$ I[t_o, t_1] = \{ z[t_o, t_1], u[t_o, t_1], f(\cdot), Q, R, \ldots \} $$
A Derived Information Set, I_D

- Measurements may be directly useful, e.g.,
  - Displays
  - Simple feedback control
- ... or they may require processing, e.g.,
  - Transformation
  - Estimation
- Example of a derived information set
  - History of mean and covariance from a state estimator

$$ I_D[t_o, t_1] = \{ \hat{x}[t_o, t_1], P[t_o, t_1], u[t_o, t_1] \} $$

Additional Derived Information Sets

- Markov derived information set
  - Most current mean and covariance from a state estimator

$$ I_{MD}(t_1) = \{ \hat{x}(t_1), P(t_1), u(t_1) \} $$

- Multiple model derived information set
  - Parallel estimates of current mean, covariance, and hypothesis probability mass function

$$ I_{MM}(t_1) = \{ [\hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A)], [\hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B)], \ldots \} $$
Required and Available Information Sets for Optimal Control

- Optimal control requires propagation of information back from the final time
- Hence, it requires the entire information set, extending from $t_o$ to $t_f$

$$ I[t_o, t_f] $$

- Separate the information set into knowable and predictable parts

$$ I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f] $$

- Knowable information has been received; predictable information is to come

Expected Values of State and Control

- Expected values of the state and control are conditioned on the information set

$$ E[x(t) \mid I_D] = \hat{x}(t) $$
$$ E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \} = P(t) $$

- ... where the conditional expected values are obtained from a Kalman-Bucy filter
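The following is a hedged scalar sketch of where P(t) comes from. The measurement model is not specified in this section, so the sketch assumes z = h x + n, with white measurement noise of spectral density N and a scalar plant ẋ = f x + l w. Under those assumptions, the Kalman-Bucy covariance obeys Ṗ = 2fP + l²W - (h²/N)P², and the gain is K = P h/N. The code integrates only the covariance equation, which does not depend on the measurement record. It then compares the result with the positive root of the algebraic Riccati equation. All numerical values are hypothetical.

```python
import math

def kalman_bucy_covariance(f, l, W, h, N, P0, dt, t_final):
    """RK4 integration of the scalar Kalman-Bucy covariance equation.

    Pdot = 2 f P + l^2 W - (h^2 / N) P^2
    """
    def pdot(P):
        return 2.0 * f * P + l * l * W - (h * h / N) * P * P

    P = P0
    for _ in range(int(round(t_final / dt))):
        k1 = pdot(P)
        k2 = pdot(P + 0.5 * dt * k1)
        k3 = pdot(P + 0.5 * dt * k2)
        k4 = pdot(P + dt * k3)
        P += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return P

f, l, W, h, N = -0.5, 1.0, 0.2, 1.0, 0.1      # hypothetical scalar model
P_end = kalman_bucy_covariance(f, l, W, h, N, P0=1.0, dt=1e-3, t_final=20.0)
# Positive root of 0 = 2 f P + l^2 W - (h^2 / N) P^2
P_ss = N * (f + math.sqrt(f * f + h * h * l * l * W / N)) / (h * h)
K_ss = P_ss * h / N                             # steady-state Kalman-Bucy gain
print(f"P(t_final) = {P_end:.6f}, algebraic root = {P_ss:.6f}, gain = {K_ss:.4f}")
```

Because the covariance does not depend on the measurements, P(t) can be computed in advance. Only the conditional mean x̂(t) requires the measurement record.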
Dependence of the Stochastic Cost Function on the Information Set

$$ J = \frac{1}{2} E\left\{ E\left[ \mathrm{Tr}\,S(t_f) x(t_f) x^T(t_f) \mid I_D \right] + \int_0^{t_f} E\left[ \mathrm{Tr}\,Q x(t) x^T(t) \right] dt + \int_0^{t_f} E\left[ \mathrm{Tr}\,R u(t) u^T(t) \right] dt \right\} $$
Certainty-Equivalent and Stochastic Incremental Costs

- Expand the state covariance

$$ P(t) = E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \} $$
$$ = E\{ x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D \} $$

$$ E[\hat{x}(t) x^T(t) \mid I_D] = E[x(t) \hat{x}^T(t) \mid I_D] = \hat{x}(t)\hat{x}^T(t) $$

$$ P(t) = E[x(t)x^T(t) \mid I_D] - \hat{x}(t)\hat{x}^T(t) \quad \text{or} \quad E[x(t)x^T(t) \mid I_D] = P(t) + \hat{x}(t)\hat{x}^T(t) $$

- Substituting in the cost function:

$$ J = \frac{1}{2} E\left\{ \mathrm{Tr}\,S(t_f)[P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f)] + \int_0^{t_f} \mathrm{Tr}\,Q[P(t) + \hat{x}(t)\hat{x}^T(t)]\,dt + \int_0^{t_f} \mathrm{Tr}\,R u(t)u^T(t)\,dt \right\} \triangleq J_{CE} + J_S $$

- The cost function has two parts
  - Certainty-equivalent cost

$$ J_{CE} = \frac{1}{2} E\left\{ \mathrm{Tr}\,S(t_f)\hat{x}(t_f)\hat{x}^T(t_f) + \int_0^{t_f} \mathrm{Tr}\,Q\hat{x}(t)\hat{x}^T(t)\,dt + \int_0^{t_f} \mathrm{Tr}\,R u(t)u^T(t)\,dt \right\} $$

  - Stochastic increment cost

$$ J_S = \frac{1}{2} E\left\{ \mathrm{Tr}\,S(t_f)P(t_f) + \int_0^{t_f} \mathrm{Tr}\,Q P(t)\,dt \right\} $$
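The split can be illustrated for the terminal term alone. This sketch assumes a scalar state with a hypothetical Gaussian conditional distribution x ~ N(x̂, P) and an illustrative weight s. It samples that distribution and checks that ½ s E[x²] matches ½ s x̂² (the J_CE part) plus ½ s P (the J_S part). The integral terms split the same way at each instant.

```python
import random

def split_terminal_cost(s, x_hat, P, n=100_000, seed=2):
    """Compare (1/2) s E[x^2] with its certainty-equivalent and stochastic parts.

    For x ~ N(x_hat, P): E[x^2] = P + x_hat^2, so
    (1/2) s E[x^2] = (1/2) s x_hat^2  (J_CE part)  +  (1/2) s P  (J_S part).
    """
    rng = random.Random(seed)
    samples = [rng.gauss(x_hat, P ** 0.5) for _ in range(n)]
    direct = 0.5 * s * sum(x * x for x in samples) / n     # Monte Carlo estimate
    j_ce = 0.5 * s * x_hat ** 2
    j_s = 0.5 * s * P
    return direct, j_ce, j_s

direct, j_ce, j_s = split_terminal_cost(s=4.0, x_hat=1.5, P=0.25)   # illustrative values
print(f"direct {direct:.4f} vs J_CE + J_S = {j_ce:.4f} + {j_s:.4f} = {j_ce + j_s:.4f}")
```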
Expected Cost of the Trajectory

- Optimized cost function

$$ V^*(t_o) \triangleq J^*(t_f) = E\left\{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau \right\} $$
Expected Cost of the Trajectory

- For planning or post-trajectory analysis, one can assume that the entire information set is available
- For real-time control, $t_1 < t_f$, and future information can only be predicted
  - If the separation property applies (TBD), the future conditioning effect can be predicted
  - If not, the future conditioning effect can only be approximated
- Law of total expectation

$$ E(\cdot) = E(\cdot \mid I[t_o, t_1])\,\Pr\{I[t_o, t_1]\} + E(\cdot \mid I[t_1, t_f])\,\Pr\{I[t_1, t_f]\} = E[E(\cdot \mid I)] $$

- Because the past is established at $t_1$, $\Pr\{I[t_o, t_1]\} = 1$:

$$ E(J^*) = E(J^* \mid I[t_o, t_1])\,[1] + E(J^* \mid I[t_1, t_f])\,\Pr\{I[t_1, t_f]\} = E(J^* \mid I[t_o, t_1]) + E(J^* \mid I[t_1, t_f])\,\Pr\{I[t_1, t_f]\} $$
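The law of total expectation can be checked on a small discrete example. The example uses the standard form over a partition of information outcomes. The joint probabilities below are hypothetical, and exact fractions avoid rounding noise. Averaging the conditional expectations E[J | I = i], weighted by Pr(I = i), reproduces E[J].

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities Pr(I = i, J = j) over two information outcomes i
# and a few possible cost values j
joint = {
    ("i1", 1): Fr(1, 10), ("i1", 3): Fr(2, 10),
    ("i2", 2): Fr(3, 10), ("i2", 6): Fr(4, 10),
}

def expectation(pmf):
    """E[J] computed directly from the joint pmf."""
    return sum(p * j for (_, j), p in pmf.items())

def conditional(pmf, i):
    """Return E[J | I = i] and Pr(I = i)."""
    p_i = sum(p for (ii, _), p in pmf.items() if ii == i)
    e_i = sum(p * j for (ii, j), p in pmf.items() if ii == i) / p_i
    return e_i, p_i

outcomes = sorted({i for (i, _) in joint})
E_direct = expectation(joint)
E_tower = sum(e * p for e, p in (conditional(joint, i) for i in outcomes))  # sum_i E[J|i] Pr(i)
print(E_direct, E_tower)   # both 37/10
```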
Separation Property and Certainty Equivalence

Separation Property

- The optimal control law and the optimal estimation law can be derived separately
- Their derivations are strictly independent

Certainty Equivalence Property

- Separation property plus, ...
- The stochastic optimal control law and the deterministic optimal control law are the same
- The optimal estimation law can be derived separately
- Linear-quadratic-Gaussian control is certainty-equivalent

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition
Immune Response Example

- Optimal open-loop drug therapy (control)
- Assumptions
  - Initial condition known without error
  - No disturbance

[Figure: Open-Loop Optimal Control for Lethal Initial Condition]
Immune Response Example with Optimal Feedback Control

[Figure: Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition]

- Optimal closed-loop therapy
  - Assumptions
    - Small error in initial condition
    - Small disturbance
    - Perfect measurement of state
- Stochastic optimal closed-loop therapy
  - Assumptions
    - Small error in initial condition
    - Small disturbance
    - Imperfect measurement
    - Certainty equivalence applies to perturbation control
Immune Response with Stochastic Optimal Feedback Control

(Random Disturbance and Measurement Error Not Simulated)

[Figure panels: Low-Bandwidth Estimator (|W| < |N|); High-Bandwidth Estimator (|W| > |N|)]
Immune Response to Random Disturbance with Stochastic Neighboring-Optimal Control

- Disturbance due to
  - Re-infection
  - Sequestered pockets of pathogen
- Noisy measurements
- Closed-loop therapy is robust ... but not robust enough:
  - Organ death occurs in one case
- Probability of satisfactory therapy can be maximized by stochastic redesign of the controller

[Figure annotations: "Initial control too sluggish to prevent divergence"; "Quick initial control prevents divergence"]
Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Stochastic Linear-Quadratic Optimal Control

- Quadratic value function

$$ V(t_o) = E\left\{ \phi[x(t_f)] - \int_{t_f}^{t_o} L[x(\tau), u(\tau)]\,d\tau \right\} = \frac{1}{2} E\left\{ x^T(t_f) S(t_f) x(t_f) - \int_{t_f}^{t_o} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\} $$

- Linear dynamic constraint

$$ \dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t) $$
Components of the LQ Value Function

- Quadratic value function has two parts

$$ V(t) = \frac{1}{2} x^T(t) S(t) x(t) + v(t) $$

- Certainty-equivalent value function

$$ V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t) $$

- Stochastic value function increment

$$ v(t) = \frac{1}{2} \int_t^{t_f} \mathrm{Tr}\left[ S(\tau) L(\tau) W(\tau) L^T(\tau) \right] d\tau $$

Value Function Gradient and Hessian

- Certainty-equivalent value function

$$ V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t) $$

- Gradient with respect to the state

$$ \frac{\partial V}{\partial x}(t) = x^T(t) S(t) $$

- Hessian with respect to the state

$$ \frac{\partial^2 V}{\partial x^2}(t) = S(t) $$
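The gradient and Hessian formulas can be confirmed with central finite differences. This sketch uses an illustrative symmetric S at a single instant. Symmetry matters: for a non-symmetric S, the gradient of ½xᵀSx would be ½xᵀ(S + Sᵀ). Central differences are exact for a quadratic apart from rounding, so the agreement should be very tight.

```python
def v_ce(S, x):
    """Certainty-equivalent value function V_CE = (1/2) x^T S x (lists of floats)."""
    n = len(x)
    return 0.5 * sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))

def fd_gradient(func, x, h=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g

def fd_hessian(func, x, h=1e-4):
    """Central-difference Hessian of a scalar function of a vector."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                z = list(x)
                z[i] += di
                z[j] += dj
                return func(z)
            H[i][j] = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

S = [[3.0, 1.0], [1.0, 2.0]]        # illustrative symmetric S(t) at one instant
x = [0.5, -1.0]
V = lambda z: v_ce(S, z)
grad_analytic = [sum(x[i] * S[i][j] for i in range(2)) for j in range(2)]   # row vector x^T S
print(fd_gradient(V, x), grad_analytic)
print(fd_hessian(V, x), S)
```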
Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation

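- Certainty-equivalent plus stochastic terms

$$ \frac{\partial V^*}{\partial t} = -\min_{u} E\left\{ \frac{1}{2}\left[ x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right] + x^{*T} S (F x^* + G u) + \frac{1}{2}\mathrm{Tr}\left( S L W L^T \right) \right\} $$
$$ = -\min_{u} \left\{ \frac{1}{2}\left[ x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right] + x^{*T} S (F x^* + G u) + \frac{1}{2}\mathrm{Tr}\left( S L W L^T \right) \right\} $$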
Optimal Control Law

- Differentiate the right side of the HJB equation with respect to u and set it equal to zero

$$ \frac{\partial (\partial V / \partial t)}{\partial u} = 0 = x^T M + u^T R + x^T S G $$

- Solve for u, obtaining the feedback control law

$$ u(t) = -R^{-1}(t)\left[ G^T(t) S(t) + M^T(t) \right] x(t) \triangleq -C(t)\,x(t) $$

- Terminal condition

$$ V(t_f) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f) $$
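The control law can be checked numerically at a single instant. This sketch assumes hypothetical values of S, G, M and R for a two-state, single-input system, so R is a positive scalar. The computed u = -C x should make the derivative xᵀM + uᵀR + xᵀSG of the HJB right side vanish. Because R > 0, that stationary point is a minimum.

```python
def lq_gain(R, G, S, M):
    """C = R^{-1} (G^T S + M^T) for a single control input (R is a scalar).

    G and M are n x 1 columns given as flat lists; S is n x n.
    Returns C as a flat list (the 1 x n gain row).
    """
    n = len(G)
    GtS = [sum(G[k] * S[k][j] for k in range(n)) for j in range(n)]   # row G^T S
    return [(GtS[j] + M[j]) / R for j in range(n)]

# Illustrative (hypothetical) values at one instant t
S = [[2.0, 0.5], [0.5, 1.0]]
G = [0.0, 1.0]
M = [0.1, 0.0]
R = 0.5
x = [1.0, -2.0]

C = lq_gain(R, G, S, M)
u = -sum(C[j] * x[j] for j in range(2))                 # u = -C x
# Stationarity of the HJB right side: x^T M + u^T R + x^T S G = 0
xS = [sum(x[i] * S[i][j] for i in range(2)) for j in range(2)]
residual = sum(x[i] * M[i] for i in range(2)) + u * R + sum(xS[j] * G[j] for j in range(2))
print(C, u, residual)
```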
LQ Optimal Control Law

$$ u(t) = -R^{-1}(t)\left[ G^T(t) S(t) + M^T(t) \right] x(t) \triangleq -C(t)\,x(t) $$

- Zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law

Matrix Riccati Equation

- Substitute the optimal control law in the HJB equation

$$ \frac{1}{2} x^T \dot{S} x + \dot{v} = \frac{1}{2} x^T \left[ -Q + M R^{-1} M^T - \left( F - G R^{-1} M^T \right)^T S - S\left( F - G R^{-1} M^T \right) + S G R^{-1} G^T S \right] x - \frac{1}{2}\mathrm{Tr}\left( S L W L^T \right) $$

- The matrix Riccati equation provides S(t)

$$ \dot{S}(t) = \left[ -Q(t) + M(t) R^{-1}(t) M^T(t) \right] - \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right]^T S(t) - S(t)\left[ F(t) - G(t) R^{-1}(t) M^T(t) \right] + S(t) G(t) R^{-1}(t) G^T(t) S(t), \quad S(t_f) = \phi_{xx}(t_f) $$

- The stochastic value function increases cost due to the disturbance

$$ \dot{v} = -\frac{1}{2}\mathrm{Tr}\left( S L W L^T \right), \quad v(t_f) = 0 $$

- However, its calculation does not feed back into the Riccati equation, so S(t), and hence the control gain, is unaffected by W
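The following is a scalar sketch of the backward sweep, assuming M = 0 and hypothetical values f = g = l = r = 1, q = 3 and S(t_f) = 0. It integrates the Riccati equation and v̇ = -½ S l² W together in reversed time with RK4. It runs once with W = 0.4 and once with W = 0. S(t), and hence C = R⁻¹GᵀS, should come out identical in both runs and approach the algebraic Riccati root. Only v(t) changes.

```python
import math

def backward_sweep(f, g, q, r, l, W, s_f, t_f, dt):
    """Integrate the scalar LQ Riccati equation and the stochastic increment backward in time.

    Sdot = -q - 2 f S + (g^2 / r) S^2,  S(t_f) = s_f   (M = 0)
    vdot = -(1/2) S l^2 W,              v(t_f) = 0
    RK4 is applied in reversed time tau = t_f - t; returns S(0) and v(0).
    """
    def rhs(S):
        # derivatives with respect to tau (the sign of d/dt is flipped)
        return (q + 2 * f * S - (g * g / r) * S * S, 0.5 * S * l * l * W)

    S, v = s_f, 0.0
    for _ in range(int(round(t_f / dt))):
        k1 = rhs(S)
        k2 = rhs(S + 0.5 * dt * k1[0])
        k3 = rhs(S + 0.5 * dt * k2[0])
        k4 = rhs(S + dt * k3[0])
        S += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return S, v

f, g, q, r, l = 1.0, 1.0, 3.0, 1.0, 1.0   # hypothetical plant (open-loop unstable) and weights
S_noisy, v_noisy = backward_sweep(f, g, q, r, l, W=0.4, s_f=0.0, t_f=10.0, dt=1e-3)
S_quiet, v_quiet = backward_sweep(f, g, q, r, l, W=0.0, s_f=0.0, t_f=10.0, dt=1e-3)
S_ss = r * (f + math.sqrt(f * f + g * g * q / r)) / (g * g)   # positive algebraic Riccati root
C_gain = g * S_noisy / r                                      # feedback gain C = R^{-1} G^T S
print(f"S(0): noisy {S_noisy:.6f}, quiet {S_quiet:.6f}, steady state {S_ss:.6f}")
print(f"gain C(0) = {C_gain:.6f}; v(0) = {v_noisy:.6f} (W = 0.4), {v_quiet} (W = 0)")
```

For these values, S(τ) = 3(1 - e^{-4τ})/(1 + 3e^{-4τ}) in reversed time. The stochastic increment therefore has the closed form v(0) = 0.2(30 - ln 4), which the integration should reproduce.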
Evaluation of the Total Cost

(Imperfect Measurements)

- Stochastic quadratic cost function, neglecting cross terms

$$ J = \frac{1}{2}\mathrm{Tr}\left\{ E\left[ x^T(t_f) S(t_f) x(t_f) \right] + E\int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\} $$
$$ = \frac{1}{2}\mathrm{Tr}\left\{ S(t_f) E\left[ x(t_f) x^T(t_f) \right] + \int_{t_o}^{t_f} \left( Q(t) E\left[ x(t) x^T(t) \right] + R(t) E\left[ u(t) u^T(t) \right] \right) dt \right\} $$

or

$$ J = \frac{1}{2}\mathrm{Tr}\left\{ S(t_f) P(t_f) + \int_{t_o}^{t_f} \left[ Q(t) P(t) + R(t) U(t) \right] dt \right\} $$

where

$$ P(t) \triangleq E\left[ x(t) x^T(t) \right], \quad U(t) \triangleq E\left[ u(t) u^T(t) \right] $$

Optimal Control Covariance

- Optimal control vector

$$ u(t) = -C(t)\,x(t) $$

- Optimal control covariance

$$ U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t) $$
Revise Cost to Reflect State and Adjoint Covariance Dynamics

- Integration by parts

$$ \left. S(t) P(t) \right|_{t_o}^{t_f} = \int_{t_o}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt $$
$$ S(t_f) P(t_f) = S(t_o) P(t_o) + \int_{t_o}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt $$

- Rewrite the cost function to incorporate the initial cost

$$ J = \frac{1}{2}\mathrm{Tr}\left\{ S(t_o) P(t_o) + \int_{t_o}^{t_f} \left[ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt \right\} $$

Evolution of State and Adjoint Covariance Matrices

(No Control)

$$ u(t) = 0; \quad U(t) = 0 $$

- State covariance response to random disturbance

$$ \dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t), \quad P(t_o) \text{ given} $$

- Adjoint covariance response to terminal cost

$$ \dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t), \quad S(t_f) \text{ given} $$
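The integration-by-parts rewrite can be checked numerically for the no-control case. This sketch assumes a scalar system with hypothetical values of f, l, W, q, P(t_o) and S(t_f). Both covariance equations are linear, so the code uses their closed-form solutions. It then evaluates the cost once with the terminal term, ½[S(t_f)P(t_f) + ∫qP dt], and once with the initial term, ½[S(t_o)P(t_o) + ∫S l²W dt]. Substituting the two equations shows that ṠP + SṖ = -qP + S l²W, so the two values should agree to quadrature accuracy.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

# Hypothetical scalar model with u = 0: Pdot = 2 f P + l^2 W,  Sdot = -2 f S - q
f, l, W, q = -0.7, 1.0, 0.3, 2.0
t0, tf = 0.0, 3.0
P0, Sf = 0.5, 1.5

def P(t):
    """Closed-form forward solution from P(t0) = P0."""
    e = math.exp(2 * f * (t - t0))
    return e * P0 + l * l * W * (e - 1) / (2 * f)

def S(t):
    """Closed-form backward solution from S(tf) = Sf."""
    e = math.exp(2 * f * (tf - t))
    return e * Sf + q * (e - 1) / (2 * f)

# Cost written with the terminal term ...
J_terminal_form = 0.5 * (S(tf) * P(tf) + simpson(lambda t: q * P(t), t0, tf))
# ... and rewritten with the initial term after integration by parts
J_initial_form = 0.5 * (S(t0) * P(t0) + simpson(lambda t: S(t) * l * W * l, t0, tf))
print(J_terminal_form, J_initial_form)
```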
Evolution of State and Adjoint Covariance Matrices

(Optimal Control)

- State covariance response to random disturbance

$$ \dot{P}(t) = \left[ F(t) - G(t) C(t) \right] P(t) + P(t) \left[ F(t) - G(t) C(t) \right]^T + L(t) W(t) L^T(t) $$

- Adjoint covariance response to terminal cost (independent of P(t))

$$ \dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t) $$

Total Cost With and Without Control

- With no control (the cost depends on S(t))

$$ J_{\text{no control}} = \frac{1}{2}\mathrm{Tr}\left[ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t)\,dt \right] $$

- With optimal control, the equation for the cost is the same

$$ J_{\text{optimal control}} = \frac{1}{2}\mathrm{Tr}\left[ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t)\,dt \right] $$

- ... but the evolution of S(t), and hence S(t_o), is different in each case
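The claim that one cost expression serves both cases can be checked for a scalar problem. This sketch assumes hypothetical values: an open-loop unstable plant f = 0.5 with g = q = r = l = 1, W = 0.2, P(t_o) = 0.5, S(t_f) = 1. It takes U = C P Cᵀ as on the Optimal Control Covariance slide, and uses RK4 plus Simpson's rule. For each case, the cost is evaluated in the terminal form ½Tr[S(t_f)P(t_f) + ∫(QP + RU)dt] and in the initial form ½Tr[S(t_o)P(t_o) + ∫SLWLᵀdt]. The two values should agree, and the optimal-control cost should be lower.

```python
def simpson_grid(values, step):
    """Composite Simpson's rule on equally spaced samples (needs an even number of intervals)."""
    n = len(values) - 1
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    return step / 3 * (values[0] + values[-1] + 4 * sum(values[1:n:2]) + 2 * sum(values[2:n:2]))

def lq_costs(f, g, q, r, l, W, P0, Sf, t_f, n, control=True):
    """Expected cost of a scalar problem computed in terminal form and in initial form.

    control=True : Sdot = -2 f S - q + (g^2/r) S^2, c = g S / r  (u = -c x)
    control=False: Sdot = -2 f S - q,               c = 0        (u = 0)
    Pdot = 2 (f - g c) P + l^2 W.  S is integrated backward with RK4 on step h = t_f/(2n);
    P is integrated forward with RK4 on step 2h, reading S at each stage time from that grid.
    """
    h = t_f / (2 * n)
    quad = g * g / r if control else 0.0

    def s_dot(S):
        return -2 * f * S - q + quad * S * S

    def gain(S):
        return g * S / r if control else 0.0

    def p_dot(P, S):
        return 2 * (f - g * gain(S)) * P + l * l * W

    S_grid = [0.0] * (2 * n + 1)          # S_grid[k] = S(k h)
    S_grid[-1] = S = Sf
    for k in range(2 * n, 0, -1):         # RK4 with step -h
        a1 = s_dot(S)
        a2 = s_dot(S - 0.5 * h * a1)
        a3 = s_dot(S - 0.5 * h * a2)
        a4 = s_dot(S - h * a3)
        S -= h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        S_grid[k - 1] = S

    dt = 2 * h
    P_grid = [P0]
    P = P0
    for m in range(n):                    # RK4 with step dt; midpoint S is on the fine grid
        S0, Sm, S1 = S_grid[2 * m], S_grid[2 * m + 1], S_grid[2 * m + 2]
        b1 = p_dot(P, S0)
        b2 = p_dot(P + 0.5 * dt * b1, Sm)
        b3 = p_dot(P + 0.5 * dt * b2, Sm)
        b4 = p_dot(P + dt * b3, S1)
        P += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
        P_grid.append(P)

    S_coarse = S_grid[::2]                # S at the same times as P_grid
    running = [(q + r * gain(S) ** 2) * P for S, P in zip(S_coarse, P_grid)]   # Q P + R U
    J_terminal = 0.5 * (Sf * P_grid[-1] + simpson_grid(running, dt))
    J_initial = 0.5 * (S_grid[0] * P0 + simpson_grid([S * l * W * l for S in S_coarse], dt))
    return J_terminal, J_initial

params = dict(f=0.5, g=1.0, q=1.0, r=1.0, l=1.0, W=0.2, P0=0.5, Sf=1.0, t_f=4.0, n=400)
J_none = lq_costs(**params, control=False)
J_opt = lq_costs(**params, control=True)
print("u = 0:           terminal form %.6f, initial form %.6f" % J_none)
print("optimal control: terminal form %.6f, initial form %.6f" % J_opt)
```

The Riccati term (g²/r)S² is the only change to the S equation, yet it changes S(t_o) from about 108 to about 1.6 here. That change is the sense in which the evolutions of S(t) differ between the two cases.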
Next Time: Linear-Quadratic-Gaussian Regulators
Supplemental Material
Dual Control
(Fel'dbaum, 1965)

- Nonlinear system
  - Uncertain system parameters to be estimated
  - Parameter estimation can be aided by test inputs
- Approach: minimize the value function with three increments
  - Nominal control
  - Cautious control
  - Probing control

$$ \min_u V^* = \min_u \left( V^*_{\text{nominal}} + V^*_{\text{cautious}} + V^*_{\text{probing}} \right) $$

- Estimation and control calculations are coupled and necessarily recursive

Adaptive Critic Controller

- Nonlinear control law, c, takes the general form

$$ u(t) = c[x(t), a, y^*(t)] $$

  where x(t) is the state, a the parameters of the operating point, and y*(t) the command input

- On-line adaptive critic controller
  - Nonlinear control law (action network)
  - Criticizes non-optimal performance via critic network
  - Adapts control gains to improve performance
  - Adapts cost model to improve estimate
Algebraic Initialization of Neural Networks

(Ferrari and Stengel)

- Initially, c[x, a, y*] is unknown
- Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points
  - Scheduling variable, a

$$ u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a) \int \Delta y(t)\,dt \approx c[x(t), a, y^*(t)] $$

Replace Gain Matrices by Neural Networks

- Replace control gain matrices by sigmoidal neural networks

$$ u(t) = NN_F[y^*(t), a(t)] + NN_B[x(t), a(t)] + NN_I\left[ \int \Delta y(t)\,dt, a(t) \right] = c[x(t), a, y^*(t)] $$
Initial Neural Control Law

- Algebraic training of neural networks produces an exact fit of linear control gains and trim conditions at n operating points
- Interpolation and gain scheduling via neural networks
- One node per operating point in each neural network
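The exact-fit idea can be sketched for one scheduling variable and one scalar gain. This is not Ferrari and Stengel's algebraic training procedure. It is only an illustration of the same principle, with one sigmoid node per operating point. The input weights and biases are fixed by a hypothetical choice: a slope of 4 centered on each point. The output weights then solve a square linear system, so the network reproduces the design gains exactly at the n points and interpolates between them. The design points and gains are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A v = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (M[i][n] - sum(M[i][j] * v[j] for j in range(i + 1, n))) / M[i][i]
    return v

def fit_gain_network(a_points, gains, input_weight=4.0):
    """One sigmoid node per operating point; output weights solved so the fit is exact.

    Hidden node j computes sigmoid(input_weight * (a - a_j)). The input weights and biases
    are fixed by this hypothetical choice, leaving a linear system for the output weights.
    """
    A = [[sigmoid(input_weight * (a_i - a_j)) for a_j in a_points] for a_i in a_points]
    v = solve_linear(A, gains)

    def network(a):
        return sum(v_j * sigmoid(input_weight * (a - a_j)) for v_j, a_j in zip(v, a_points))
    return network

a_points = [0.0, 1.0, 2.0, 3.0]      # scheduling variable at design points (illustrative)
gains = [1.5, 2.1, 2.4, 3.0]        # one scalar gain per operating point (illustrative)
nn = fit_gain_network(a_points, gains)
print([round(nn(a), 6) for a in a_points], round(nn(1.5), 4))
```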
On-line Optimization of Adaptive Critic Neural Network Controller
- Critic adapts neural network weights to improve performance using approximate dynamic programming

Heuristic Dynamic Programming Adaptive Critic

- Dual Heuristic Programming adaptive critic for the receding-horizon optimization problem
- Critic and action (i.e., control) networks adapted concurrently
- LQ-PI cost function applied to the nonlinear problem
- Modified resilient backpropagation for neural network training
Action Network On-line Training

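- Train the action network, at time t, holding the critic parameters fixed

[Figure: action network training loop with inputs x_a(t) and a(t); blocks NNA, Aircraft Model, Transition Matrices, State Prediction, Utility Function Derivatives, NNC, Optimality Condition, Target Generation, NNA Target]

$$ V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})] $$

$$ \frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0 $$

$$ \frac{\partial V[x_a(t)]}{\partial x_a(t)} = NNC[x_a(t), a(t)] $$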
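The action-network target comes from the optimality condition ∂L/∂u + (∂V/∂x)(∂x/∂u) = 0, with the critic supplying ∂V/∂x at the predicted next state. As a hedged stand-in for the aircraft model and NNC, this sketch assumes a hypothetical scalar model x_next = a x + b u, a utility L = ½(q x² + r u²), and a linear critic ∂V/∂x = s x. It solves the condition for u by bisection, so the same routine would also work for a nonlinear critic whose residual is monotonic in u. The resulting u would be the training target for NNA.

```python
def action_target(x, a, b, r, critic, u_lo=-10.0, u_hi=10.0, tol=1e-12):
    """Solve the one-step optimality condition for the control u by bisection.

    Hypothetical scalar model: x_next = a x + b u, utility L = 0.5 (q x^2 + r u^2),
    and critic(z) returns the cost gradient dV/dx at state z (the role NNC plays).
    Condition: dL/du + critic(x_next) * dx_next/du = r u + critic(a x + b u) * b = 0.
    Assumes the residual is increasing in u and changes sign on [u_lo, u_hi].
    """
    def residual(u):
        return r * u + critic(a * x + b * u) * b

    lo, hi = u_lo, u_hi
    if not residual(lo) < 0.0 < residual(hi):
        raise ValueError("bracket does not straddle a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    u = 0.5 * (lo + hi)
    return u, residual(u)

s_critic = 2.0                          # illustrative linear critic: dV/dx = s x
critic = lambda z: s_critic * z
u_target, res = action_target(x=1.0, a=1.1, b=0.5, r=0.2, critic=critic)
print(u_target, res)
```

For this linear critic the condition has the closed-form root u = -b s a x/(r + b² s) = -1.1/0.7. The bisection result can be compared against it.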
Critic Network On-line Training

- Train the critic network, at time t, holding the action parameters fixed

[Figure: critic network training loop with inputs x_a(t) and a(t); blocks NNA, Aircraft Model, Transition Matrices, State Prediction, Utility Function Derivatives, NNC, NNC(old), Target Cost Gradient, Target Generation, NNC Target]
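Alternating the two training steps can be illustrated on a problem where the answer is known. This sketch is not the lecture's neural-network implementation. It assumes a hypothetical scalar linear-quadratic problem, with a linear critic λ(x) = s x standing in for NNC and a linear action u = -k x standing in for NNA. The action step holds the critic fixed and enforces r u + b λ(x_next) = 0. The critic step holds the action fixed and refits s to the Dual Heuristic Programming target ∂U/∂x + (∂U/∂u)(∂u/∂x) + λ(x_next)(∂x_next/∂x + (∂x_next/∂u)(∂u/∂x)). For these linear forms, the alternation should converge to the positive root of the scalar discrete algebraic Riccati equation.

```python
import math

def dhp_scalar(a, b, q, r, s0=0.0, iters=200):
    """Alternate action and critic updates for a scalar linear-quadratic stand-in problem.

    Hypothetical setting: x_next = a x + b u, utility U = 0.5 (q x^2 + r u^2),
    linear critic lam(x) = s x (the role of NNC), linear action u = -k x (the role of NNA).
    """
    s = s0
    k = 0.0
    for _ in range(iters):
        # Action step, critic held fixed: r u + b lam(x_next) = 0 with u = -k x
        k = b * s * a / (r + b * b * s)
        # Critic step, action held fixed: the DHP target
        #   dU/dx + dU/du du/dx + lam(x_next) (dx_next/dx + dx_next/du du/dx)
        # equals (q + r k^2 + s (a - b k)^2) x, so the linear critic is refit to that slope
        s = q + r * k * k + s * (a - b * k) ** 2
    return s, k

a, b, q, r = 1.1, 0.5, 1.0, 0.2           # illustrative values (open-loop unstable)
s_dhp, k_dhp = dhp_scalar(a, b, q, r)
# Scalar discrete algebraic Riccati equation rearranged as b^2 s^2 + beta s - q r = 0
beta = r - q * b * b - a * a * r
s_dare = (-beta + math.sqrt(beta * beta + 4 * b * b * q * r)) / (2 * b * b)
print(s_dhp, s_dare, k_dhp)
```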
