
Stochastic Optimal Control

Robert Stengel
Optimal Control and Estimation, MAE 546
Princeton University, 2013

- Nonlinear systems with random inputs and perfect measurements
- Nonlinear systems with random inputs and imperfect measurements
- Certainty equivalence and separation
- Stochastic neighboring-optimal control
- Linear-quadratic-Gaussian (LQG) control

Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$\dot{x}(t) = f[x(t), u(t), w(t), t], \qquad z(t) = x(t)$$

$$E[x(0)] = \bar{x}(0), \qquad E\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\} = P(0)$$

The disturbance is zero-mean white noise:

$$E[w(t)] = 0, \qquad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

Assume that random disturbance effects are small and additive:

$$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$$

Cost Must Be an Expected Value

The deterministic cost function

$$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt$$

cannot be minimized because
- the disturbance effect on the state cannot be predicted
- the state and control are random variables

However, the expected value of a deterministic cost function can be minimized:

$$\min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt \right\}$$

Stochastic Euler-Lagrange Equations?

- There is no single optimal trajectory
- Expected values of the Euler-Lagrange necessary conditions may not be well defined:

$$1)\;\; E[\lambda(t_f)] = E\left[ \frac{\partial \phi[x(t_f)]}{\partial x} \right]^T$$

$$2)\;\; E[\dot{\lambda}(t)] = -E\left[ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right]^T$$

$$3)\;\; E\left[ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right] = 0$$

Stochastic Value Function for a Nonlinear System

- However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved
- Base the optimization on the Principle of Optimality
- Optimal expected value function at t_1:

$$V^*(t_1) = E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\} = \min_{u} E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\, d\tau \right\}$$

Rate of Change of the Value Function

Total time derivative of V*:

$$\left. \frac{dV^*}{dt} \right|_{t=t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \}$$

With perfect measurements, x(t) and u(t) can be known precisely; therefore

$$\left. \frac{dV^*}{dt} \right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$$

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$$\frac{dV^*}{dt} = E\left[ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\,\dot{x} \right]$$

Expand the incremental change in the value function, \(\Delta V^*\), to second degree:

$$\Delta V^* = \frac{dV^*}{dt}\Delta t = E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\Delta x + \frac{1}{2}\Delta x^T \frac{\partial^2 V^*}{\partial x^2}\Delta x + \ldots \right\}$$

$$= E\left\{ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right]\Delta t + \frac{1}{2}\left[ f(\cdot) + Lw(\cdot) \right]^T \frac{\partial^2 V^*}{\partial x^2}\left[ f(\cdot) + Lw(\cdot) \right]\Delta t^2 + \ldots \right\}$$

Introduction of the Trace

The trace of a matrix product is a scalar, and it is invariant under cyclic permutation:

$$\mathrm{Tr}(x^T Q x) = \mathrm{Tr}(x x^T Q) = \mathrm{Tr}(Q x x^T), \qquad \dim[\mathrm{Tr}(\cdot)] = 1 \times 1$$

$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$

Applying the trace identity to the quadratic term and canceling \(\Delta t\):

$$\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f(\cdot) + Lw(\cdot) \right)\left( f(\cdot) + Lw(\cdot) \right)^T \right]\Delta t \right\}$$
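The cyclic-permutation identity is easy to verify numerically. A quick check in Python (added here for illustration; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))  # shapes chosen so all three products are square

t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
print(np.allclose([t1, t2, t3], t1))  # True: trace is invariant under cyclic permutation
```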

Toward the Stochastic HJB Equation

Because x(t) and u(t) can be measured, they can be taken outside the expectation:

$$\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\left[ f(\cdot) + Lw(\cdot) \right] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f + Lw \right)\left( f + Lw \right)^T \right]\Delta t \right\}$$

$$= \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + E\left\{ \frac{\partial V^*}{\partial x} Lw(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}\left( f + Lw \right)\left( f + Lw \right)^T \right]\Delta t \right\}$$

The disturbance is assumed to be zero-mean white noise:

$$E[w(t)] = 0, \qquad E[w(t)w^T(\tau)] = W(t)\,\delta(t - \tau)$$

The uncertain disturbance input can only increase the rate of change of the value function:

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + \lim_{\Delta t \to 0} \frac{1}{2}\mathrm{Tr}\left\{ \frac{\partial^2 V^*}{\partial x^2}\left[ E\left( f(\cdot)f^T(\cdot) \right)\Delta t + L\, E\left( w(\cdot)w^T(\cdot) \right) L^T \right]\Delta t \right\}$$

$$= \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\, f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right]$$
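For simulation, note that continuous-time white noise with spectral density W(t) cannot be sampled directly; in a discrete-time approximation with step Δt, the disturbance sample must have covariance W/Δt so that the state increment has covariance L W L^T Δt. A minimal Euler-Maruyama sketch in Python, with illustrative matrices (F, L, W here are placeholders, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps = 0.01, 1000
F = np.array([[0.0, 1.0], [-1.0, -0.5]])  # illustrative stable dynamics
L = np.array([[0.0], [1.0]])              # disturbance input matrix
W = np.array([[0.2]])                     # white-noise spectral density

x = np.zeros(2)
for _ in range(n_steps):
    w = rng.multivariate_normal(np.zeros(1), W / dt)  # sample covariance W/dt
    x = x + (F @ x + L @ w) * dt                      # Euler-Maruyama step
```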

Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\, f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right]$$

- Substitute for the total derivative: dV*/dt = -L(x*, u*)
- Solve for the partial derivative, ∂V*/∂t
- Stochastic HJB equation:

$$\frac{\partial V^*(t)}{\partial t} = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\, f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t)W(t)L^T(t) \right] \right\}$$

Boundary (terminal) condition:

$$V^*(t_f) = E[\phi(t_f)]$$

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

For the stochastic HJB equation above:

- Control has no effect on the disturbance input
- The criterion for optimality is the same as in the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0)

The Information Set, I

- Sigma algebra (Wikipedia definitions):
  - The collection of sets over which a measure is defined
  - The collection of events that can be assigned probabilities
  - A measurable space

Information Sets and Expected Cost

- Information available at the current time, t_1:
  - All measurements from the initial time, t_o
  - All control commands from the initial time

$$I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1] \}$$

- Plus the available model structure, parameters, and statistics:

$$I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1],\; f(\cdot),\; Q,\; R,\; \ldots \}$$

A Derived Information Set, I_D

- Measurements may be directly useful, e.g.,
  - Displays
  - Simple feedback control
- ... or they may require processing, e.g.,
  - Transformation
  - Estimation
- Example of a derived information set: the history of the mean and covariance from a state estimator:

$$I_D[t_o, t_1] = \{ \hat{x}[t_o, t_1],\; P[t_o, t_1],\; u[t_o, t_1] \}$$

Additional Derived Information Sets

- Markov derived information set: the most current mean and covariance from a state estimator:

$$I_{MD}(t_1) = \{ \hat{x}(t_1),\; P(t_1),\; u(t_1) \}$$

- Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function:

$$I_{MM}(t_1) = \left\{ \left[ \hat{x}_A(t_1),\; P_A(t_1),\; u(t_1),\; \Pr(H_A) \right],\; \left[ \hat{x}_B(t_1),\; P_B(t_1),\; u(t_1),\; \Pr(H_B) \right],\; \ldots \right\}$$
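To make the distinction among these sets concrete, here is a minimal Python sketch of the Markov and multiple-model information sets as data structures (field names are illustrative, not from the slides):

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class MarkovInfoSet:        # I_MD(t1): only the most current estimate is retained
    x_hat: np.ndarray       # conditional mean at t1
    P: np.ndarray           # conditional covariance at t1
    u: np.ndarray           # current control

@dataclass
class Hypothesis:           # one entry of I_MM(t1)
    x_hat: np.ndarray
    P: np.ndarray
    prob: float             # Pr(H_i), hypothesis probability mass

@dataclass
class MultipleModelInfoSet: # I_MM(t1): parallel estimates, one per model hypothesis
    u: np.ndarray
    hypotheses: List[Hypothesis]  # probabilities should sum to 1
```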

Required and Available Information Sets for Optimal Control

- Optimal control requires propagation of information back from the final time
- Hence, it requires the entire information set, extending from t_o to t_f: I[t_o, t_f]

Expected Values of State and Control

- Expected values of the state and control are conditioned on the information set
- Separate the information set into knowable and predictable parts:

$$I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]$$

- Knowable information has been received; predictable information is yet to come

$$E[x(t) \mid I_D] = \hat{x}(t), \qquad E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \} = P(t)$$

... where the conditional expected values are estimates from an optimal filter.

Dependence of the Stochastic Cost Function on the Information Set

$$J = \frac{1}{2} E\left\{ E\left[ \mathrm{Tr}\left( S(t_f)x(t_f)x^T(t_f) \right) \mid I_D \right] + \int_{t_o}^{t_f} E\,\mathrm{Tr}\left[ Q\,x(t)x^T(t) \right] dt + \int_{t_o}^{t_f} E\,\mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\}$$

Expand the state covariance:

$$P(t) = E\left\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \right\} = E\left\{ x(t)x^T(t) - x(t)\hat{x}^T(t) - \hat{x}(t)x^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D \right\}$$

Because

$$E\left[ x(t)\hat{x}^T(t) \mid I_D \right] = E\left[ \hat{x}(t)x^T(t) \mid I_D \right] = \hat{x}(t)\hat{x}^T(t)$$

it follows that

$$P(t) = E\left[ x(t)x^T(t) \mid I_D \right] - \hat{x}(t)\hat{x}^T(t), \quad \text{or} \quad E\left[ x(t)x^T(t) \mid I_D \right] = P(t) + \hat{x}(t)\hat{x}^T(t)$$

Certainty-Equivalent and Stochastic Incremental Costs

$$J = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\left( P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f) \right) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q(t)\left( P(t) + \hat{x}(t)\hat{x}^T(t) \right) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\} \triangleq J_{CE} + J_S$$

The cost function has two parts, the certainty-equivalent cost and the stochastic increment:

$$J_{CE} = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\hat{x}(t_f)\hat{x}^T(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\,\hat{x}(t)\hat{x}^T(t) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ R\,u(t)u^T(t) \right] dt \right\}$$

$$J_S = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)P(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\,P(t) \right] dt \right\}$$

... where the conditional expected values are obtained from an optimal filter.
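The decomposition E[x x^T | I_D] = P + x̂ x̂^T is easy to confirm by Monte Carlo; a short Python check with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
x_hat = np.array([1.0, -2.0])               # conditional mean
P = np.array([[0.5, 0.1], [0.1, 0.3]])      # conditional covariance
x = rng.multivariate_normal(x_hat, P, size=200_000)

second_moment = x.T @ x / len(x)            # sample estimate of E[x x^T]
print(np.allclose(second_moment, P + np.outer(x_hat, x_hat), atol=1e-2))  # True
```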

Expected Cost of the Trajectory

Optimized cost function:

$$V^*(t_o) \triangleq J^* = E\left\{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\}$$

- For planning or post-trajectory analysis, one can assume that the entire information set is available
- For real-time control, t_1 ≤ t_f, and the future information set can only be predicted

Law of total expectation:

$$E(\cdot) = E(\cdot \mid I[t_o, t_1])\Pr\{ I[t_o, t_1] \} + E(\cdot \mid I[t_1, t_f])\Pr\{ I[t_1, t_f] \} = E\left[ E(\cdot \mid I) \right]$$

Because the past is established at t_1, Pr{I[t_o, t_1]} = 1:

$$E(J^*) = E(J^* \mid I[t_o, t_1]) + E(J^* \mid I[t_1, t_f])\Pr\{ I[t_1, t_f] \}$$
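A small numerical illustration of the law of total expectation (purely illustrative; the binary partition here plays the role of conditioning on the information set):

```python
import numpy as np

rng = np.random.default_rng(3)
I = rng.random(100_000) < 0.3           # binary "information event", Pr(I) ~ 0.3
J = np.where(I, rng.normal(2.0, 1.0, I.size), rng.normal(5.0, 1.0, I.size))

total = J.mean()
by_parts = J[I].mean() * I.mean() + J[~I].mean() * (~I).mean()
print(np.isclose(total, by_parts))      # True: E(J) = sum of E(J|part) * Pr(part)
```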

Separation Property and Certainty Equivalence

- Separation Property
  - The optimal control law and the optimal estimation law can be derived separately
  - Their derivations are strictly independent
- Certainty Equivalence Property
  - The separation property holds, plus ...
  - The stochastic optimal control law and the deterministic optimal control law are the same
  - The optimal estimation law can be derived separately

Stochastic Linear-Quadratic Optimal Control

- Linear-quadratic-Gaussian (LQG) control is certainty-equivalent

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

$$V(t_o) = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\, d\tau \right\}$$

$$= \frac{1}{2} E\left\{ x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}$$

Linear dynamic constraint:

$$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$$

Components of the LQ Value Function

The quadratic value function has two parts:

$$V(t) = \frac{1}{2} x^T(t)S(t)x(t) + v(t)$$

Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2} x^T(t)S(t)x(t)$$

Stochastic value function increment:

$$v(t) = \frac{1}{2} \int_t^{t_f} \mathrm{Tr}\left[ S(\tau)L(\tau)W(\tau)L^T(\tau) \right] d\tau$$
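For constant S, L, and W the increment reduces to v(t) = (t_f - t) Tr(S L W L^T)/2; in general it can be evaluated by quadrature over a stored S(τ) history. A minimal sketch (the trapezoidal rule and argument layout are implementation choices, not from the slides):

```python
import numpy as np

def stochastic_increment(S_traj, L, W, times):
    """v(times[0]) = 1/2 * integral of Tr(S(tau) L W L^T) from times[0] to t_f."""
    integrand = np.array([np.trace(S @ L @ W @ L.T) for S in S_traj])
    return 0.5 * np.trapz(integrand, times)
```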

Value Function Gradient and Hessian

Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2} x^T(t)S(t)x(t)$$

Gradient with respect to the state:

$$\frac{\partial V(t)}{\partial x} = x^T(t)S(t)$$

Hessian with respect to the state:

$$\frac{\partial^2 V(t)}{\partial x^2} = S(t)$$

Terminal condition:

$$V(t_f) = \frac{1}{2} x^T(t_f)S(t_f)x(t_f)$$

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

$$\frac{\partial V^*}{\partial t} = -\min_u E\left\{ \frac{1}{2}\left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + x^{*T}S\left( Fx^* + Gu \right) + \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right) \right\}$$

$$= -\min_u \left\{ \frac{1}{2}\left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + x^{*T}S\left( Fx^* + Gu \right) + \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right) \right\}$$

Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

$$\frac{\partial (\partial V/\partial t)}{\partial u} = 0 = x^TM + u^TR + x^TSG$$

Solve for u, obtaining the LQ feedback control law:

$$u(t) = -R^{-1}(t)\left[ G^T(t)S(t) + M^T(t) \right]x(t) \triangleq -C(t)\,x(t)$$

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation

Substitute the optimal control law, u(t) = -R^{-1}(t)[G^T(t)S(t) + M^T(t)]x(t), into the HJB equation:

$$\frac{1}{2}x^T\dot{S}x + \dot{v} = \frac{1}{2}x^T\left[ -\left( Q - MR^{-1}M^T \right) - \left( F - GR^{-1}M^T \right)^TS - S\left( F - GR^{-1}M^T \right) + SGR^{-1}G^TS \right]x - \frac{1}{2}\mathrm{Tr}\left( SLWL^T \right)$$

Matching the quadratic terms yields the matrix Riccati equation for S(t); matching the remaining terms yields the equation for the stochastic increment v(t), given below.

Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

$$J = \frac{1}{2}\mathrm{Tr}\left( E\left[ x^T(t_f)S(t_f)x(t_f) \right] + E\left\{ \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\} \right)$$

$$= \frac{1}{2}\mathrm{Tr}\left( S(t_f)E\left[ x(t_f)x^T(t_f) \right] + \int_{t_o}^{t_f} \left\{ Q(t)E\left[ x(t)x^T(t) \right] + R(t)E\left[ u(t)u^T(t) \right] \right\} dt \right)$$

or

$$J = \frac{1}{2}\mathrm{Tr}\left( S(t_f)P(t_f) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) \right] dt \right)$$

where

$$P(t) \triangleq E\left[ x(t)x^T(t) \right], \qquad U(t) \triangleq E\left[ u(t)u^T(t) \right]$$

- The matrix Riccati equation provides S(t):

$$\dot{S}(t) = -\left[ Q(t) - M(t)R^{-1}(t)M^T(t) \right] - \left[ F(t) - G(t)R^{-1}(t)M^T(t) \right]^TS(t) - S(t)\left[ F(t) - G(t)R^{-1}(t)M^T(t) \right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f) = \phi_{xx}(t_f)$$

- The stochastic value function increases the cost due to the disturbance:

$$\dot{v}(t) = -\frac{1}{2}\mathrm{Tr}\left[ S(t)L(t)W(t)L^T(t) \right], \qquad v(t_f) = 0$$

- However, its calculation is independent of the Riccati equation
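A minimal Python sketch of the backward sweep implied by these equations: integrate S(t) from S(t_f), accumulate v(t), and store the gains C(t) = R^{-1}(G^T S + M^T). Fixed-step Euler integration and constant system matrices are simplifications for illustration, not from the slides:

```python
import numpy as np

def lq_backward_sweep(F, G, L, Q, R, M, W, S_f, t_grid):
    """Backward integration of the matrix Riccati equation (with cross weighting M)
    and the stochastic increment v_dot = -1/2 Tr(S L W L^T), v(t_f) = 0."""
    Ri = np.linalg.inv(R)
    Fm = F - G @ Ri @ M.T               # cross-term-adjusted dynamics
    Qm = Q - M @ Ri @ M.T               # cross-term-adjusted state weighting
    S, v = S_f.copy(), 0.0
    S_traj, v_traj = [S_f.copy()], [0.0]
    for k in range(len(t_grid) - 1, 0, -1):
        dt = t_grid[k] - t_grid[k - 1]
        S_dot = -Qm - Fm.T @ S - S @ Fm + S @ G @ Ri @ G.T @ S
        v_dot = -0.5 * np.trace(S @ L @ W @ L.T)
        S, v = S - S_dot * dt, v - v_dot * dt     # step backward in time
        S_traj.insert(0, S.copy())
        v_traj.insert(0, v)
    C_traj = [Ri @ (G.T @ Sk + M.T) for Sk in S_traj]  # feedback gain history
    return S_traj, v_traj, C_traj
```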

Optimal Control Covariance

Optimal control vector:

$$u(t) = -C(t)\hat{x}(t)$$

Optimal control covariance:

$$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)\,P(t)\,S(t)G(t)R^{-1}(t)$$

Revise Cost to Reflect State and Adjoint Covariance Dynamics

Integration by parts:

$$\left. S(t)P(t) \right|_{t_o}^{t_f} = \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$$

$$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$$

Rewrite the cost function to incorporate the initial cost:

$$J = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt \right)$$

Evolution of State and Adjoint Covariance Matrices (No Control)

$$u(t) = 0; \qquad U(t) = 0$$

State covariance response to random disturbance:

$$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \qquad P(t_o)\ \text{given}$$

Adjoint covariance response to terminal cost:

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \qquad S(t_f)\ \text{given}$$

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to random disturbance (dependent on S(t) through the gain C(t)):

$$\dot{P}(t) = \left[ F(t) - G(t)C(t) \right]P(t) + P(t)\left[ F(t) - G(t)C(t) \right]^T + L(t)W(t)L^T(t)$$

Adjoint covariance response to terminal cost (independent of P(t)):

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f)\ \text{given}$$
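The two propagations differ only in the state matrix used. A hedged companion sketch to the Riccati sweep above, again with Euler integration for brevity (pass zero gains for the no-control case):

```python
import numpy as np

def covariance_forward_sweep(F, G, L, W, P0, C_traj, t_grid):
    """P_dot = (F - G C) P + P (F - G C)^T + L W L^T, P(t_o) = P0."""
    P, P_traj = P0.copy(), [P0.copy()]
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        Fcl = F - G @ C_traj[k]                       # closed-loop dynamics
        P = P + (Fcl @ P + P @ Fcl.T + L @ W @ L.T) * dt
        P_traj.append(P.copy())
    return P_traj

def total_cost(S_traj, L, W, P0, t_grid):
    """J = 1/2 Tr( S(t_o) P(t_o) + integral of S(t) L W L^T dt )."""
    integrand = np.array([np.trace(S @ L @ W @ L.T) for S in S_traj])
    return 0.5 * (np.trace(S_traj[0] @ P0) + np.trapz(integrand, t_grid))
```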

Total Cost With and Without Control

With no control:

$$J_{no\ control} = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\, dt \right)$$

With optimal control, the equation for the cost is the same:

$$J_{optimal\ control} = \frac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\, dt \right)$$

... but the evolutions of S(t) and S(t_o) are different in each case.

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition

Immune Response Example

- Optimal open-loop drug therapy (control)
- Assumptions:
  - Initial condition known without error
  - No disturbance

[Figure: Open-Loop Optimal Control for Lethal Initial Condition]

Immune Response Example with Optimal Feedback Control

- Optimal closed-loop therapy
- Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Perfect measurement of the state

[Figure: Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition]

Immune Response with Full-State Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)

- Stochastic optimal closed-loop therapy
- Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Imperfect measurement
  - Certainty equivalence applies to the perturbation control

Stochastic-Optimal Control (u1) with Two Measurements (x1, x3) (w/ Ghigliazza, 2004)

- W = I_4, N = I_2/20
- Low-bandwidth estimator (|W| < |N|): initial control too sluggish to prevent divergence
- High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

Immune Response to Random Disturbance with Two-Measurement Stochastic Neighboring-Optimal Control

- Disturbance due to:
  - Re-infection
  - Sequestered pockets of pathogen
- Noisy measurements
- Closed-loop therapy is robust ...
- ... but not robust enough: organ death occurs in one case
- Probability of satisfactory therapy can be maximized by stochastic redesign of the controller

Dual Control (Feldbaum, 1965)

- Nonlinear system
  - Uncertain system parameters to be estimated
  - Parameter estimation can be aided by test inputs
- Approach: minimize the value function with three increments
  - Nominal control
  - Cautious control
  - Probing control

$$\min_u V^* = \min_u \left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)$$

- Estimation and control calculations are coupled and necessarily recursive

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

- The nonlinear control law, c, takes the general form

$$u(t) = c[x(t), a, y^*(t)]$$

where x(t) is the state, a contains the parameters of the operating point (the scheduling variable), and y*(t) is the command input.

- Initially, c[x, a, y*] is unknown
- Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points:

$$u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a)\int \Delta y(t)\, dt \triangleq c\left[ x(t), a, y^*(t) \right]$$

Adaptive Critic Controller

- On-line adaptive critic controller:
  - Nonlinear control law ("action" network)
  - Criticizes non-optimal performance via a "critic" network
  - Adapts the control gains to improve performance
  - Adapts the cost model to improve its estimate

Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$$u(t) = NN_F\left[ y^*(t), a(t) \right] + NN_B\left[ x(t), a(t) \right] + NN_I\left[ \int \Delta y(t)\, dt,\ a(t) \right] \triangleq c\left[ x(t), a, y^*(t) \right]$$

Initial Neural Control Law

- Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at the n operating points, as sketched below
- Interpolation and gain scheduling via neural networks
- One node per operating point in each neural network
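A minimal sketch of algebraic training in the spirit described above: with one sigmoidal node per operating point, output weights that exactly reproduce a scheduled scalar gain at the n design points follow from solving one linear system (the fixed input weights and the scalar-gain setting are simplifying assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def algebraic_init(a_points, gains, input_w=5.0):
    """Fit NN(a) = v . sigmoid(w*a + b) so that NN(a_i) = gain_i exactly."""
    a = np.asarray(a_points, dtype=float)    # n values of the scheduling variable
    g = np.asarray(gains, dtype=float)       # n gains to reproduce
    w = np.full(a.size, input_w)             # fixed input weights, one node per point
    b = -w * a                               # center each sigmoid on one design point
    Phi = sigmoid(np.outer(a, w) + b)        # n x n hidden-layer output matrix
    v = np.linalg.solve(Phi, g)              # exact fit at the n operating points
    return w, b, v

def nn_gain(a_query, w, b, v):
    return v @ sigmoid(w * a_query + b)      # interpolated gain between design points
```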

On-line Optimization of the Adaptive Critic Neural Network Controller

Heuristic Dynamic Programming Adaptive Critic

- Dual heuristic programming adaptive critic for the receding-horizon optimization problem
- Critic and action (i.e., control) networks adapted concurrently
- LQ-PI cost function applied to the nonlinear problem
- Modified resilient backpropagation for neural network training

Recurrence relation and optimality condition:

$$V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]$$

$$\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0$$

- The critic adapts neural network weights to improve performance using approximate dynamic programming; the critic network approximates the cost gradient:

$$\frac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C\left[ x_a(t), a(t) \right]$$
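As a rough, heavily simplified illustration of the concurrent adaptation (a linear model, quadratic utility, linear critic/actor parameterizations, and gradient-step updates are all assumptions of this sketch, not the Ferrari-Stengel implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discrete-time transition matrix (assumed known)
B = np.array([[0.005], [0.1]])           # control-effect matrix
Q, R = np.eye(2), np.eye(1)              # quadratic utility L = (x'Qx + u'Ru)/2
Wc = np.zeros((2, 2))                    # critic weights: lambda(x) = dV/dx ~ Wc x
Wa = np.zeros((1, 2))                    # actor weights: u(x) = -Wa x
lr = 0.05

for _ in range(5000):
    x = rng.standard_normal(2)           # sample a training state
    u = -Wa @ x
    x_next = A @ x + B @ u               # model-based one-step prediction
    lam_next = Wc @ x_next               # critic evaluated at the predicted state
    # Critic target from V(x_k) = L(x_k, u_k) + V(x_k+1), holding u fixed:
    lam_target = Q @ x + A.T @ lam_next
    Wc -= lr * np.outer(Wc @ x - lam_target, x)
    # Actor target from the optimality condition dL/du + B' lambda_next = 0:
    u_target = -np.linalg.solve(R, B.T @ lam_next)
    Wa += lr * np.outer(-Wa @ x - u_target, x)
```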

Action Network On-line Training

Train the action network, at time t, holding the critic parameters fixed.

Critic Network On-line Training

Train the critic network, at time t, holding the action parameters fixed.

[Block diagrams: x_a(t) and a(t) feed the action network NN_A; an aircraft model supplies transition matrices and state prediction; utility function derivatives, the optimality condition, and the previous critic NN_C(old) generate the NN_A target and the NN_C target (target cost gradient, target generation).]
