
Stochastic Optimal Control

Robert Stengel
Optimal Control and Estimation, MAE 546
Princeton University, 2013

- Nonlinear systems with random inputs and perfect measurements
- Nonlinear systems with random inputs and imperfect measurements
- Certainty equivalence and separation
- Stochastic neighboring-optimal control
- Linear-quadratic-Gaussian (LQG) control

Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$\dot{x}(t) = f[x(t), u(t), w(t), t], \qquad z(t) = x(t)$$

$$E[x(0)] = \bar{x}(0), \qquad E\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\} = P(0)$$

The disturbance is zero-mean white noise:

$$E[w(t)] = 0, \qquad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

Assume that random disturbance effects are small and additive:

$$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$$

Cost Must Be an Expected Value

The deterministic cost function

$$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt$$

cannot be minimized because
- the disturbance effect on the state cannot be predicted
- the state and control are random variables

However, the expected value of a deterministic cost function can be minimized:

$$\min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt \right\}$$

Stochastic Euler-Lagrange Equations?

- There is no single optimal trajectory
- Expected values of the Euler-Lagrange necessary conditions may not be well defined:

$$1)\;\; E[\lambda(t_f)] = E\left[ \frac{\partial \phi[x(t_f)]}{\partial x} \right]^T$$

$$2)\;\; E[\dot{\lambda}(t)] = -E\left[ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right]^T$$

$$3)\;\; E\left[ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right] = 0$$

Stochastic Value Function for a Nonlinear System

- However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved
- Base the optimization on the Principle of Optimality
- Optimal expected value function at t_1:

$$V^*(t_1) = E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\} = \min_{u} E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\, d\tau \right\}$$

Rate of Change of the Value Function

Total time derivative of V*:

$$\left. \frac{dV^*}{dt} \right|_{t=t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \}$$

With perfect measurements, x(t) and u(t) can be known precisely; therefore

$$\left. \frac{dV^*}{dt} \right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$$

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$$\frac{dV^*}{dt} = E\left[ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\,\dot{x} \right]$$

Expand the incremental change in the value function, \(\Delta V^*\), to second degree:

$$\Delta V^* = \frac{dV^*}{dt}\Delta t = E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\Delta x + \frac{1}{2}\Delta x^T \frac{\partial^2 V^*}{\partial x^2}\Delta x + \ldots \right\}$$

$$= E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right]\Delta t + \frac{1}{2}\left[ f(\cdot) + Lw(\cdot) \right]^T \frac{\partial^2 V^*}{\partial x^2}\left[ f(\cdot) + Lw(\cdot) \right]\Delta t^2 + \ldots \right\}$$

Introduction of the Trace

The trace of a matrix product is a scalar, and it is invariant under cyclic permutation:

$$\mathrm{Tr}(x^T Q x) = \mathrm{Tr}(x x^T Q) = \mathrm{Tr}(Q x x^T), \qquad \dim[\mathrm{Tr}(\cdot)] = 1 \times 1$$

$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$

Applying the trace identity to the quadratic term and canceling \(\Delta t\):

$$\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f(\cdot) + Lw(\cdot) \right)\left( f(\cdot) + Lw(\cdot) \right)^T \right]\Delta t \right\}$$
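The cyclic-permutation identity is easy to verify numerically. A quick check in Python (added here for illustration; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))  # shapes chosen so all three products are square

t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
print(np.allclose([t1, t2, t3], t1))  # True: trace is invariant under cyclic permutation
```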

Toward the Stochastic HJB Equation

Because x(t) and u(t) can be measured, they can be taken outside the expectation:

$$\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f + Lw \right)\left( f + Lw \right)^T \right]\Delta t \right\}$$

$$= \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + E\left\{ \frac{\partial V^*}{\partial x} Lw(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f + Lw \right)\left( f + Lw \right)^T \right]\Delta t \right\}$$

The disturbance is assumed to be zero-mean white noise:

$$E[w(t)] = 0, \qquad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

The uncertain disturbance input can only increase the rate of change of the value function:

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + \lim_{\Delta t \to 0} \frac{1}{2}\mathrm{Tr}\left\{ \frac{\partial^2 V^*}{\partial x^2}\left[ E\left( f(\cdot)f^T(\cdot) \right)\Delta t + L\, E\left( w(\cdot)w^T(\cdot) \right) L^T \right]\Delta t \right\}$$

$$= \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\, f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right]$$
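For simulation, note that continuous-time white noise with spectral density W(t) cannot be sampled directly; in a discrete-time approximation with step Δt, the disturbance sample must have covariance W/Δt so that the state increment has covariance L W L^T Δt. A minimal Euler-Maruyama sketch in Python, with illustrative matrices (F, L, W here are placeholders, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps = 0.01, 1000
F = np.array([[0.0, 1.0], [-1.0, -0.5]])  # illustrative stable dynamics
L = np.array([[0.0], [1.0]])              # disturbance input matrix
W = np.array([[0.2]])                     # white-noise spectral density

x = np.zeros(2)
for _ in range(n_steps):
    w = rng.multivariate_normal(np.zeros(1), W / dt)  # sample covariance W/dt
    x = x + (F @ x + L @ w) * dt                      # Euler-Maruyama step
```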

Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\, f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right]$$

- Substitute for the total derivative: dV*/dt = -L(x*, u*)
- Solve for the partial derivative, ∂V*/∂t
- Stochastic HJB equation:

$$\frac{\partial V^*(t)}{\partial t} = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\, f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right] \right\}$$

Boundary (terminal) condition:

$$V^*(t_f) = E[\phi(t_f)]$$

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

For the stochastic HJB equation above:

- Control has no effect on the disturbance input
- The criterion for optimality is the same as in the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0)

The Information Set, I

- Sigma algebra (Wikipedia definitions):
  - The collection of sets over which a measure is defined
  - The collection of events that can be assigned probabilities
  - A measurable space

Information Sets and Expected Cost

- Information available at the current time, t_1:
  - All measurements from the initial time, t_o
  - All control commands from the initial time

$$I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1] \}$$

- Plus the available model structure, parameters, and statistics:

$$I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1],\; f(\cdot),\; Q,\; R,\; \ldots \}$$

A Derived Information Set, I_D

- Measurements may be directly useful, e.g.,
  - Displays
  - Simple feedback control
- ... or they may require processing, e.g.,
  - Transformation
  - Estimation
- Example of a derived information set: the history of the mean and covariance from a state estimator:

$$I_D[t_o, t_1] = \{ \hat{x}[t_o, t_1],\; P[t_o, t_1],\; u[t_o, t_1] \}$$

Additional Derived Information Sets

- Markov derived information set: the most current mean and covariance from a state estimator:

$$I_{MD}(t_1) = \{ \hat{x}(t_1),\; P(t_1),\; u(t_1) \}$$

- Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function:

$$I_{MM}(t_1) = \left\{ \left[ \hat{x}_A(t_1),\; P_A(t_1),\; u(t_1),\; \Pr(H_A) \right],\; \left[ \hat{x}_B(t_1),\; P_B(t_1),\; u(t_1),\; \Pr(H_B) \right],\; \ldots \right\}$$
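To make the distinction among these sets concrete, here is a minimal Python sketch of the Markov and multiple-model information sets as data structures (field names are illustrative, not from the slides):

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class MarkovInfoSet:        # I_MD(t1): only the most current estimate is retained
    x_hat: np.ndarray       # conditional mean at t1
    P: np.ndarray           # conditional covariance at t1
    u: np.ndarray           # current control

@dataclass
class Hypothesis:           # one entry of I_MM(t1)
    x_hat: np.ndarray
    P: np.ndarray
    prob: float             # Pr(H_i), hypothesis probability mass

@dataclass
class MultipleModelInfoSet: # I_MM(t1): parallel estimates, one per model hypothesis
    u: np.ndarray
    hypotheses: List[Hypothesis]  # probabilities should sum to 1
```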

Required and Available Information Sets for Optimal Control

- Optimal control requires propagation of information back from the final time
- Hence, it requires the entire information set, extending from t_o to t_f: I[t_o, t_f]

Expected Values of State and Control

- Expected values of the state and control are conditioned on the information set
- Separate the information set into knowable and predictable parts:

$$I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]$$

- Knowable information has been received; predictable information is yet to come

$$E[x(t) \mid I_D] = \hat{x}(t), \qquad E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \} = P(t)$$

... where the conditional expected values are estimates from an optimal filter.

Dependence of the Stochastic Cost Function on the Information Set

$$J = \frac{1}{2} E\left\{ E\left[ \mathrm{Tr}\left( S(t_f)x(t_f)x^T(t_f) \right) \mid I_D \right] + \int_{t_o}^{t_f} E\,\mathrm{Tr}\left[ Q\,x(t)x^T(t) \right] dt + \int_{t_o}^{t_f} E\,\mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\}$$

Expand the state covariance:

$$P(t) = E\left\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \right\} = E\left\{ x(t)x^T(t) - x(t)\hat{x}^T(t) - \hat{x}(t)x^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D \right\}$$

Because

$$E\left[ x(t)\hat{x}^T(t) \mid I_D \right] = E\left[ \hat{x}(t)x^T(t) \mid I_D \right] = \hat{x}(t)\hat{x}^T(t)$$

it follows that

$$P(t) = E\left[ x(t)x^T(t) \mid I_D \right] - \hat{x}(t)\hat{x}^T(t), \quad \text{or} \quad E\left[ x(t)x^T(t) \mid I_D \right] = P(t) + \hat{x}(t)\hat{x}^T(t)$$

Certainty-Equivalent and Stochastic Incremental Costs

$$J = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\left( P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f) \right) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q(t)\left( P(t) + \hat{x}(t)\hat{x}^T(t) \right) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\} \triangleq J_{CE} + J_S$$

The cost function has two parts, the certainty-equivalent cost and the stochastic increment:

$$J_{CE} = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\hat{x}(t_f)\hat{x}^T(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\,\hat{x}(t)\hat{x}^T(t) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\}$$

$$J_S = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)P(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\,P(t) \right] dt \right\}$$

... where the conditional expected values are obtained from an optimal filter.
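The decomposition E[x x^T | I_D] = P + x̂ x̂^T is easy to confirm by Monte Carlo; a short Python check with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
x_hat = np.array([1.0, -2.0])               # conditional mean
P = np.array([[0.5, 0.1], [0.1, 0.3]])      # conditional covariance
x = rng.multivariate_normal(x_hat, P, size=200_000)

second_moment = x.T @ x / len(x)            # sample estimate of E[x x^T]
print(np.allclose(second_moment, P + np.outer(x_hat, x_hat), atol=1e-2))  # True
```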

Expected Cost of the Trajectory

Optimized cost function:

$$V^*(t_o) \triangleq J^* = E\left\{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\}$$

- For planning or post-trajectory analysis, one can assume that the entire information set is available
- For real-time control, t_1 ≤ t_f, and the future information set can only be predicted

Law of total expectation:

$$E(\cdot) = E(\cdot \mid I[t_o, t_1])\Pr\{ I[t_o, t_1] \} + E(\cdot \mid I[t_1, t_f])\Pr\{ I[t_1, t_f] \} = E\left[ E(\cdot \mid I) \right]$$

Because the past is established at t_1, Pr{I[t_o, t_1]} = 1:

$$E(J^*) = E(J^* \mid I[t_o, t_1]) + E(J^* \mid I[t_1, t_f])\Pr\{ I[t_1, t_f] \}$$
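A small numerical illustration of the law of total expectation (purely illustrative; the binary partition here plays the role of conditioning on the information set):

```python
import numpy as np

rng = np.random.default_rng(3)
I = rng.random(100_000) < 0.3           # binary "information event", Pr(I) ~ 0.3
J = np.where(I, rng.normal(2.0, 1.0, I.size), rng.normal(5.0, 1.0, I.size))

total = J.mean()
by_parts = J[I].mean() * I.mean() + J[~I].mean() * (~I).mean()
print(np.isclose(total, by_parts))      # True: E(J) = sum of E(J|part) * Pr(part)
```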

Separation Property and Certainty Equivalence

- Separation Property
  - The optimal control law and the optimal estimation law can be derived separately
  - Their derivations are strictly independent
- Certainty Equivalence Property
  - The separation property holds, plus ...
  - The stochastic optimal control law and the deterministic optimal control law are the same
  - The optimal estimation law can be derived separately

Stochastic Linear-Quadratic Optimal Control

- Linear-quadratic-Gaussian (LQG) control is certainty-equivalent

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

$$V(t_o) = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\, d\tau \right\}$$

$$= \frac{1}{2} E\left\{ x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}$$

Linear dynamic constraint:

$$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$$

Components of the LQ Value Function

The quadratic value function has two parts:

$$V(t) = \frac{1}{2} x^T(t)S(t)x(t) + v(t)$$

Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2} x^T(t)S(t)x(t)$$

Stochastic value function increment:

$$v(t) = \frac{1}{2} \int_t^{t_f} \mathrm{Tr}\left[ S(\tau)L(\tau)W(\tau)L^T(\tau) \right] d\tau$$
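For constant S, L, and W the increment reduces to v(t) = (t_f - t) Tr(S L W L^T)/2; in general it can be evaluated by quadrature over a stored S(τ) history. A minimal sketch (the trapezoidal rule and argument layout are implementation choices, not from the slides):

```python
import numpy as np

def stochastic_increment(S_traj, L, W, times):
    """v(times[0]) = 1/2 * integral of Tr(S(tau) L W L^T) from times[0] to t_f."""
    integrand = np.array([np.trace(S @ L @ W @ L.T) for S in S_traj])
    return 0.5 * np.trapz(integrand, times)
```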

Value Function Gradient and Hessian

Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2} x^T(t)S(t)x(t)$$

Gradient with respect to the state:

$$\frac{\partial V(t)}{\partial x} = x^T(t)S(t)$$

Hessian with respect to the state:

$$\frac{\partial^2 V(t)}{\partial x^2} = S(t)$$

Terminal condition:

$$V(t_f) = \frac{1}{2} x^T(t_f)S(t_f)x(t_f)$$

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

$$\frac{\partial V^*}{\partial t} = -\min_u E\left\{ \frac{1}{2}\left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + x^{*T}S\left( Fx^* + Gu \right) + \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right) \right\}$$

$$= -\min_u \left\{ \frac{1}{2}\left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + x^{*T}S\left( Fx^* + Gu \right) + \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right) \right\}$$

Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

$$\frac{\partial (\partial V/\partial t)}{\partial u} = 0 = x^TM + u^TR + x^TSG$$

Solve for u, obtaining the LQ feedback control law:

$$u(t) = -R^{-1}(t)\left[ G^T(t)S(t) + M^T(t) \right]x(t) \triangleq -C(t)\,x(t)$$

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation

Substitute the optimal control law, u(t) = -R^{-1}(t)[G^T(t)S(t) + M^T(t)]x(t), into the HJB equation:

$$\frac{1}{2}x^T\dot{S}x + \dot{v} = \frac{1}{2}x^T\left[ -\left( Q - MR^{-1}M^T \right) - \left( F - GR^{-1}M^T \right)^TS - S\left( F - GR^{-1}M^T \right) + SGR^{-1}G^TS \right]x - \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right)$$

Matching the quadratic terms yields the matrix Riccati equation for S(t); matching the remaining terms yields the equation for the stochastic increment v(t), given below.

Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

$$J = \frac{1}{2}\mathrm{Tr}\left( E\left[ x^T(t_f)S(t_f)x(t_f) \right] + E\left\{ \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\} \right)$$

$$= \frac{1}{2}\mathrm{Tr}\left( S(t_f)E\left[ x(t_f)x^T(t_f) \right] + \int_{t_o}^{t_f} \left\{ Q(t)E\left[ x(t)x^T(t) \right] + R(t)E\left[ u(t)u^T(t) \right] \right\} dt \right)$$

or

$$J = \frac{1}{2}\mathrm{Tr}\left( S(t_f)P(t_f) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) \right] dt \right)$$

where

$$P(t) \triangleq E\left[ x(t)x^T(t) \right], \qquad U(t) \triangleq E\left[ u(t)u^T(t) \right]$$

- The matrix Riccati equation provides S(t):

$$\dot{S}(t) = -\left[ Q(t) - M(t)R^{-1}(t)M^T(t) \right] - \left[ F(t) - G(t)R^{-1}(t)M^T(t) \right]^TS(t) - S(t)\left[ F(t) - G(t)R^{-1}(t)M^T(t) \right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f) = \phi_{xx}(t_f)$$

- The stochastic value function increases the cost due to the disturbance:

$$\dot{v}(t) = -\frac{1}{2}\mathrm{Tr}\left[ S(t)L(t)W(t)L^T(t) \right], \qquad v(t_f) = 0$$

- However, its calculation is independent of the Riccati equation
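A minimal Python sketch of the backward sweep implied by these equations: integrate S(t) from S(t_f), accumulate v(t), and store the gains C(t) = R^{-1}(G^T S + M^T). Fixed-step Euler integration and constant system matrices are simplifications for illustration, not from the slides:

```python
import numpy as np

def lq_backward_sweep(F, G, L, Q, R, M, W, S_f, t_grid):
    """Backward integration of the matrix Riccati equation (with cross weighting M)
    and the stochastic increment v_dot = -1/2 Tr(S L W L^T), v(t_f) = 0."""
    Ri = np.linalg.inv(R)
    Fm = F - G @ Ri @ M.T               # cross-term-adjusted dynamics
    Qm = Q - M @ Ri @ M.T               # cross-term-adjusted state weighting
    S, v = S_f.copy(), 0.0
    S_traj, v_traj = [S_f.copy()], [0.0]
    for k in range(len(t_grid) - 1, 0, -1):
        dt = t_grid[k] - t_grid[k - 1]
        S_dot = -Qm - Fm.T @ S - S @ Fm + S @ G @ Ri @ G.T @ S
        v_dot = -0.5 * np.trace(S @ L @ W @ L.T)
        S, v = S - S_dot * dt, v - v_dot * dt     # step backward in time
        S_traj.insert(0, S.copy())
        v_traj.insert(0, v)
    C_traj = [Ri @ (G.T @ Sk + M.T) for Sk in S_traj]  # feedback gain history
    return S_traj, v_traj, C_traj
```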

Optimal Control Covariance

Optimal control vector:

$$u(t) = -C(t)\hat{x}(t)$$

Optimal control covariance:

$$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)\,P(t)\,S(t)G(t)R^{-1}(t)$$

Revise Cost to Reflect State and Adjoint Covariance Dynamics

Integration by parts:

$$\left. S(t)P(t) \right|_{t_o}^{t_f} = \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$$

$$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$$

Rewrite the cost function to incorporate the initial cost:

$$J = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt \right)$$

Evolution of State and Adjoint Covariance Matrices (No Control)

$$u(t) = 0; \qquad U(t) = 0$$

State covariance response to random disturbance:

$$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \qquad P(t_o)\ \text{given}$$

Adjoint covariance response to terminal cost:

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \qquad S(t_f)\ \text{given}$$

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to random disturbance (dependent on S(t) through the gain C(t)):

$$\dot{P}(t) = \left[ F(t) - G(t)C(t) \right]P(t) + P(t)\left[ F(t) - G(t)C(t) \right]^T + L(t)W(t)L^T(t)$$

Adjoint covariance response to terminal cost (independent of P(t)):

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f)\ \text{given}$$
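The two propagations differ only in the state matrix used. A hedged companion sketch to the Riccati sweep above, again with Euler integration for brevity (pass zero gains for the no-control case):

```python
import numpy as np

def covariance_forward_sweep(F, G, L, W, P0, C_traj, t_grid):
    """P_dot = (F - G C) P + P (F - G C)^T + L W L^T, P(t_o) = P0."""
    P, P_traj = P0.copy(), [P0.copy()]
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        Fcl = F - G @ C_traj[k]                       # closed-loop dynamics
        P = P + (Fcl @ P + P @ Fcl.T + L @ W @ L.T) * dt
        P_traj.append(P.copy())
    return P_traj

def total_cost(S_traj, L, W, P0, t_grid):
    """J = 1/2 Tr( S(t_o) P(t_o) + integral of S(t) L W L^T dt )."""
    integrand = np.array([np.trace(S @ L @ W @ L.T) for S in S_traj])
    return 0.5 * (np.trace(S_traj[0] @ P0) + np.trapz(integrand, t_grid))
```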

Total Cost With and Without Control

With no control:

$$J_{no\ control} = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\, dt \right)$$

With optimal control, the equation for the cost is the same:

$$J_{optimal\ control} = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\, dt \right)$$

... but the evolutions of S(t) and S(t_o) are different in each case.

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition

Immune Response Example

- Optimal open-loop drug therapy (control)
- Assumptions:
  - Initial condition known without error
  - No disturbance

[Figure: Open-Loop Optimal Control for Lethal Initial Condition]

Immune Response Example with Optimal Feedback Control

- Optimal closed-loop therapy
- Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Perfect measurement of the state

[Figure: Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition]

Immune Response with Full-State Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)

- Stochastic optimal closed-loop therapy
- Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Imperfect measurement
  - Certainty equivalence applies to the perturbation control

Stochastic-Optimal Control (u1) with Two Measurements (x1, x3) (w/ Ghigliazza, 2004)

- W = I_4, N = I_2/20
- Low-bandwidth estimator (|W| < |N|): initial control too sluggish to prevent divergence
- High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

Immune Response to Random Disturbance with Two-Measurement Stochastic Neighboring-Optimal Control

- Disturbance due to:
  - Re-infection
  - Sequestered pockets of pathogen
- Noisy measurements
- Closed-loop therapy is robust ...
- ... but not robust enough: organ death occurs in one case
- Probability of satisfactory therapy can be maximized by stochastic redesign of the controller

Dual Control (Feldbaum, 1965)

- Nonlinear system
  - Uncertain system parameters to be estimated
  - Parameter estimation can be aided by test inputs
- Approach: minimize the value function with three increments
  - Nominal control
  - Cautious control
  - Probing control

$$\min_u V^* = \min_u \left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)$$

- Estimation and control calculations are coupled and necessarily recursive

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

- The nonlinear control law, c, takes the general form

$$u(t) = c[x(t), a, y^*(t)]$$

where x(t) is the state, a contains the parameters of the operating point (the scheduling variable), and y*(t) is the command input.

- Initially, c[x, a, y*] is unknown
- Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points:

$$u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a)\int \Delta y(t)\, dt \triangleq c\left[ x(t), a, y^*(t) \right]$$

Adaptive Critic Controller

- On-line adaptive critic controller:
  - Nonlinear control law ("action" network)
  - Criticizes non-optimal performance via a "critic" network
  - Adapts the control gains to improve performance
  - Adapts the cost model to improve its estimate

Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$$u(t) = NN_F\left[ y^*(t), a(t) \right] + NN_B\left[ x(t), a(t) \right] + NN_I\left[ \int \Delta y(t)\, dt,\ a(t) \right] \triangleq c\left[ x(t), a, y^*(t) \right]$$

Initial Neural Control Law

- Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at the n operating points, as sketched below
- Interpolation and gain scheduling via neural networks
- One node per operating point in each neural network
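A minimal sketch of algebraic training in the spirit described above: with one sigmoidal node per operating point, output weights that exactly reproduce a scheduled scalar gain at the n design points follow from solving one linear system (the fixed input weights and the scalar-gain setting are simplifying assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def algebraic_init(a_points, gains, input_w=5.0):
    """Fit NN(a) = v . sigmoid(w*a + b) so that NN(a_i) = gain_i exactly."""
    a = np.asarray(a_points, dtype=float)    # n values of the scheduling variable
    g = np.asarray(gains, dtype=float)       # n gains to reproduce
    w = np.full(a.size, input_w)             # fixed input weights, one node per point
    b = -w * a                               # center each sigmoid on one design point
    Phi = sigmoid(np.outer(a, w) + b)        # n x n hidden-layer output matrix
    v = np.linalg.solve(Phi, g)              # exact fit at the n operating points
    return w, b, v

def nn_gain(a_query, w, b, v):
    return v @ sigmoid(w * a_query + b)      # interpolated gain between design points
```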

On-line Optimization of the Adaptive Critic Neural Network Controller

Heuristic Dynamic Programming Adaptive Critic

- Dual heuristic programming adaptive critic for the receding-horizon optimization problem
- Critic and action (i.e., control) networks adapted concurrently
- LQ-PI cost function applied to the nonlinear problem
- Modified resilient backpropagation for neural network training

Recurrence relation and optimality condition:

$$V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]$$

$$\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0$$

- The critic adapts neural network weights to improve performance using approximate dynamic programming; the critic network approximates the cost gradient:

$$\frac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C\left[ x_a(t), a(t) \right]$$
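As a rough, heavily simplified illustration of the concurrent adaptation (a linear model, quadratic utility, linear critic/actor parameterizations, and gradient-step updates are all assumptions of this sketch, not the Ferrari-Stengel implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discrete-time transition matrix (assumed known)
B = np.array([[0.005], [0.1]])           # control-effect matrix
Q, R = np.eye(2), np.eye(1)              # quadratic utility L = (x'Qx + u'Ru)/2
Wc = np.zeros((2, 2))                    # critic weights: lambda(x) = dV/dx ~ Wc x
Wa = np.zeros((1, 2))                    # actor weights: u(x) = -Wa x
lr = 0.05

for _ in range(5000):
    x = rng.standard_normal(2)           # sample a training state
    u = -Wa @ x
    x_next = A @ x + B @ u               # model-based one-step prediction
    lam_next = Wc @ x_next               # critic evaluated at the predicted state
    # Critic target from V(x_k) = L(x_k, u_k) + V(x_k+1), holding u fixed:
    lam_target = Q @ x + A.T @ lam_next
    Wc -= lr * np.outer(Wc @ x - lam_target, x)
    # Actor target from the optimality condition dL/du + B' lambda_next = 0:
    u_target = -np.linalg.solve(R, B.T @ lam_next)
    Wa += lr * np.outer(-Wa @ x - u_target, x)
```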

Action Network On-line Training

Train the action network, at time t, holding the critic parameters fixed.

Critic Network On-line Training

Train the critic network, at time t, holding the action parameters fixed.

[Block diagrams: x_a(t) and a(t) feed the action network NN_A; an aircraft model supplies transition matrices and state prediction; utility function derivatives, the optimality condition, and the previous critic NN_C(old) generate the NN_A target and the NN_C target (target cost gradient, target generation).]
