
2011 11th International Conference on Control, Automation and Systems, Oct. 26-29, 2011, KINTEX, Gyeonggi-do, Korea

Model Predictive Control and Dynamic Programming


Jay H. Lee
Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, 305-701, Korea (Tel: +82-42-385-3926; E-mail: jayhlee@kaist.ac.kr)

Abstract: Model Predictive Control (MPC) and Dynamic Programming (DP) are two different methods to obtain an optimal feedback control law. The former uses on-line optimization to solve an open-loop optimal control problem cast over a finite-size time window at each sample time. A feedback control law is defined implicitly by repeating the optimization calculation after a feedback update of the state at each sample time. In contrast, the latter attempts to derive an explicit feedback law off-line by deriving and solving the so-called Bellman's optimality equation. Both have been used successfully to solve optimal control problems, the former for constrained control problems and the latter for the unconstrained linear quadratic optimal control problem. In this paper, we examine their differences and similarities as well as their relative merits and demerits. We also propose ways to integrate the two methods to alleviate each other's shortcomings.

Keywords: Model Predictive Control, Dynamic Programming, Optimal Feedback Control, Stochastic System Control.

1. INTRODUCTION
Dynamic Programming (DP) and Model Predictive Control (MPC) are two representative methods for solving optimal control problems in the presence of feedback measurements. DP was proposed in the 1950s by Richard Bellman, primarily as a way to solve a general stochastic multi-stage control problem [3]. Though it provided significant insights into the mathematical nature of such problems, the computational obstacle referred to as the curse-of-dimensionality has prevented it from having much impact on control practice. MPC came later, in the late 1970s and 1980s, as a way to deal more effectively and systematically with the constrained multivariable control problems that are common in the process industries [1]. By employing an on-line optimization approach, it circumvents the problem of having to solve Bellman's optimality equation, which cannot be solved analytically for nonlinear systems and/or in the presence of inequality constraints. Unlike DP, it has had an enormous impact on control practice.

In this paper, we compare these two important methods for optimal feedback control and explore opportunities to integrate them so that they can benefit from each other. First, their relative merits and drawbacks are analyzed. The curse-of-dimensionality of DP can be alleviated by using the concept of forward dynamic programming applied in the context of simulation-based reinforcement learning, an approach often referred to as approximate dynamic programming (ADP). Here, MPC can serve as a useful suboptimal policy for starting the simulation. On the other hand, the value function obtained by DP can be an excellent choice for the terminal penalty function of MPC, especially in the context of short-horizon-based control or stochastic system control.

2. MODEL PREDICTIVE CONTROL


MPC solves an open-loop optimal control problem on-line at each sample time, repeating the calculation at every sample time with a feedback update and a moving time window [1,2]. Consider a general state-space system:

$$x_{t+1} = f(x_t, u_t)$$
In MPC, the following optimization is solved at each sample time $t$, with the initial state $x_0$ set to the current measured state $x_t$:
$$\min_{u(0),\ldots,u(p-1)} \; \sum_{k=0}^{p-1} \phi(x_k, u_k) + \bar{\phi}(x_p)$$

$$\text{subject to} \quad x_{k+1} = f(x_k, u_k), \qquad g(x_k, u_k) \le 0$$

From the computed solution, the first move $u_0$ is implemented as $u_t$ and the rest of the moves are discarded. The procedure repeats at the next sample time $t+1$ with the new state measurement $x_{t+1}$. The procedure defines an implicit control law $u_t = U(x_t)$, where $U$ represents the solution operator for the minimization.
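As an illustration of this receding-horizon procedure, the following minimal Python sketch solves the finite-horizon problem at each sample time and implements only the first move. The double-integrator model, quadratic stage and terminal costs, input bound, and horizon length are hypothetical choices made for the example, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, u):
    # hypothetical plant: a discrete-time double integrator
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.005, 0.1])
    return A @ x + B * u

def stage_cost(x, u):
    return x @ x + 0.1 * u * u          # phi(x_k, u_k)

def terminal_cost(x):
    return 10.0 * (x @ x)               # a crude stand-in for phi_bar(x_p)

def open_loop_cost(u_seq, x0, p):
    # cost of one open-loop input sequence starting from x0
    x, J = x0, 0.0
    for k in range(p):
        J += stage_cost(x, u_seq[k])
        x = f(x, u_seq[k])
    return J + terminal_cost(x)

def mpc_law(xt, p=10, u_max=1.0):
    # open-loop optimal control problem with the measured state as x_0;
    # the input bound plays the role of g(x_k, u_k) <= 0
    res = minimize(open_loop_cost, np.zeros(p), args=(xt, p),
                   bounds=[(-u_max, u_max)] * p)
    return res.x[0]                     # implement only the first move

# closed-loop simulation: repeat the optimization after every state update
x = np.array([1.0, 0.0])
for t in range(30):
    u = mpc_law(x)
    x = f(x, u)
print("state after 30 steps:", x)
```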
One significant advantage of MPC is that the very difficult problem of solving for an optimal feedback control policy (usually involving a solution of the Hamilton-Jacobi-Bellman equation) is replaced by the much simpler problem of solving an open-loop optimal control problem from a given starting state. Of course, the approach adds the burden of having to solve a potentially large-scale mathematical program on-line, a major obstacle if the program is non-convex. Nevertheless, the approach has enabled us to tackle important practical control problems previously considered unwieldy (e.g., constrained multivariable control problems), and the impact of MPC on control research and practice has been enormous.



MPC is formulated and analyzed mostly in the context of a purely deterministic problem, as in the above. Though it can still be effective for stochastic problems given the feedback-based nature of its implementation, it reacts to a new uncertainty rather than proactively dealing with it. The open-loop-optimization-based method appears inherently limited in dealing with stochastic and other types of uncertainties occurring in the context of a feedback control problem. This is where the DP approach may have a significant advantage over the MPC approach.


3. DYNAMIC PROGRAMMING
Consider the following stochastic optimal control problem.

$$J^*(x_0) := \min_{\pi}\; E\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, \phi\big(x_t, u_t = \pi(x_t)\big)\right], \qquad x_{t+1} = f(x_t, u_t, \omega_t)$$


The objective is to find an optimal state feedback control policy $u_t = \pi(x_t)$ that minimizes the discounted infinite-horizon sum of a stage-wise cost function defined in terms of the state and input vectors. In the above, $0 < \alpha < 1$ is the discount factor and the expectation is taken over the random variable sequence $\omega_t$. $J^*(x_0)$ is the so-called optimal value function, which represents the discounted infinite-horizon sum attainable under the optimal feedback policy. Note that it is a function of the starting state $x_0$. Bellman's DP involves solving the following optimality equation for the optimal value function $J^*$:

$$J^*(x) = \min_{u} \left\{ \phi(x,u) + \alpha\, E_{(\omega \mid x,u)}\!\left[ J^*\big(f(x,u,\omega)\big) \right] \right\} =: (T J^*)(x)$$

Once the optimal value function is found, it can easily be translated into the optimal feedback policy according to

$$\pi^*(x) = \arg\min_{u} \left\{ \phi(x,u) + \alpha\, E_{(\omega \mid x,u)}\!\left[ J^*\big(f(x,u,\omega)\big) \right] \right\}$$

Ironically, one of the best-known applications of DP is the deterministic linear quadratic optimal control problem, which can be reduced to solving a Riccati equation. It can also be used to prove the separation principle, which establishes the optimality of the Linear Quadratic Gaussian controller (LQR + Kalman filter) for a linear system with Gaussian noise and a quadratic objective function. For the cases where the optimization on the right-hand side of the Bellman equation does not yield an analytical solution, a numerical solution approach like Value Iteration (VI) or Policy Iteration (PI) must be employed. Though these are shown to converge ($T$ is a contraction map), the curse-of-dimensionality, referring mainly to the exponential explosion of the computation with respect to the state space dimension, prevents their use on problems of practical significance.

Some progress has been made with the reinforcement learning and approximate dynamic programming (ADP) methods developed by the artificial intelligence community. The main idea of these methods is to build up a value function table gradually, as one experiences various states in the course of a simulation. Some function approximation method (e.g., a local interpolation or neural network fitting) is used to estimate the values of uncharted next states for the purpose of solving the right-hand side of the Bellman equation, which in turn is used to initialize or update the value entry of a visited state in the table. This way, important (i.e., visited and therefore relevant) states are continually collected and added to the table, and their value estimates are also continually improved. Though the ADP methods have been developed mostly in the context of the discrete state / decision problems common in logistics and operations research, their applications to control have also been examined [4]. They provide a useful angle on stochastic optimal control problems, for which MPC has thus far proven limited in effectiveness.
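To make the Bellman operator $T$, value iteration, and the interpolation-based evaluation of off-grid states concrete, here is a minimal tabular sketch in Python. The scalar dynamics, cost, discount factor, grids, and three-point noise model are illustrative assumptions only; a full ADP implementation would grow the table from simulated trajectories rather than sweep a fixed grid.

```python
import numpy as np

alpha = 0.95                              # discount factor
x_grid = np.linspace(-2.0, 2.0, 81)       # discretized state space
u_grid = np.linspace(-1.0, 1.0, 21)       # discretized input space
w_vals = np.array([-0.05, 0.0, 0.05])     # crude three-point noise model
w_prob = np.array([0.25, 0.5, 0.25])

def f(x, u, w):
    return 0.9 * x + 0.2 * u + w          # hypothetical scalar dynamics

def phi(x, u):
    return x**2 + 0.1 * u**2              # stage cost

def bellman_rhs(x, u, J):
    # phi(x,u) + alpha * E[ J(f(x,u,w)) ], with J evaluated by interpolation
    return phi(x, u) + alpha * np.dot(w_prob, np.interp(f(x, u, w_vals), x_grid, J))

J = np.zeros_like(x_grid)
for it in range(1000):                    # value iteration: J <- T J
    J_new = np.array([min(bellman_rhs(x, u, J) for u in u_grid) for x in x_grid])
    if np.max(np.abs(J_new - J)) < 1e-6:  # T is a contraction, so this converges
        J = J_new
        break
    J = J_new

def greedy_policy(x):
    # feedback policy extracted from the converged value function
    return min(u_grid, key=lambda u: bellman_rhs(x, u, J))

print("pi*(1.5) ~", greedy_policy(1.5))
```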

4. POTENTIAL INTEGRATION

MPC and DP, important in their own rights, may be combined to yield more powerful techniques and results. As a start, DP provides the optimal infinite-horizon value function, which has been shown to be a very good choice for the terminal penalty function $\bar{\phi}(x_p)$. In fact, in MPC a much smaller $p$ can be used without performance loss if the optimal value function can be derived (even in an approximate form) and used as the terminal penalty function. Since MPC's on-line computational load is greatly affected by the horizon size, DP can thus be used to shift some of the on-line computational burden off-line. In ADP, given that the value function is approximated rather than derived exactly, one may be able to reduce the performance loss due to approximation errors by solving the following optimization for the input directly on-line:

$$u_t = \arg\min_{u} \left\{ \phi(x_t,u) + \alpha\, E_{(\omega \mid x_t,u)}\!\left[ J^*\big(f(x_t,u,\omega)\big) \right] \right\}$$

In the above, the presence of the expectation evaluation inside the minimization can increase the computational load, but this can be eliminated by the use of the post-decision state [5]. In addition, if an uncharted state (i.e., a state too far from the entries of the value table to warrant interpolation) is encountered during learning or on-line implementation, MPC can be used as a backup controller to initiate a new entry in the value table.
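A minimal sketch of this on-line, one-step optimization is given below; it reuses the value table (x_grid, J), grids, dynamics f, and stage cost phi from the value-iteration sketch of the previous section. The distance threshold for declaring a state "uncharted" and the fallback to an MPC-style backup controller are illustrative assumptions, and the expectation is kept explicit here, whereas the post-decision-state formulation of [5] would remove it from the minimization.

```python
import numpy as np

def lookahead_control(xt, x_grid, J, u_grid, f, phi, w_vals, w_prob,
                      alpha=0.95, mpc_backup=None, chart_tol=0.5):
    # fall back to a backup controller (e.g., MPC) if xt is too far from
    # every tabulated state to warrant interpolation of J
    if mpc_backup is not None and np.min(np.abs(x_grid - xt)) > chart_tol:
        return mpc_backup(xt)

    def q(u):
        # phi(x_t,u) + alpha * E[ J(f(x_t,u,w)) ], expectation over noise samples
        return phi(xt, u) + alpha * np.dot(w_prob, np.interp(f(xt, u, w_vals), x_grid, J))

    return min(u_grid, key=q)             # the on-line minimization above

# usage with the objects defined in the value-iteration sketch:
# u_t = lookahead_control(x_t, x_grid, J, u_grid, f, phi, w_vals, w_prob)
```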


Synergistic integration beyond these simple observations should be possible and should be explored further.

REFERENCES
[1] Lee, J. H., "Model Predictive Control: Review of the Three Decades of Development," International Journal of Control, Automation, and Systems, in press, 2011.
[2] Mayne, D. Q., J. B. Rawlings, C. Rao, and P. Scokaert, "Constrained model predictive control: stability and optimality," Automatica, 36, pp. 789-814, 2000.
[3] Bellman, R. E., Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
[4] Lee, J. H. and J. M. Lee, "Approximate Dynamic Programming based Approach to Process Control and Scheduling," Computers and Chemical Engineering, 30, pp. 1603-1618, 2006.
[5] Wong, W. C. and J. H. Lee, "Post-decision-state-based Approximate Dynamic Programming for Robust Predictive Control of Constrained Stochastic Processes," Ind. Eng. Chem. Res., 50, pp. 1389-1399, 2011.

