
Pontryagin's minimum principle (case : linear system, quadratic performance index, finite time horizon)

Wednesday 16 November 2016.

Optimal control :

Linear system, quadratic performance index, fixed horizon and final state


Problem :

Let us consider the linear system :

\dot{\mathbf{x}}(t)=\mathbf{A}\mathbf{x}(t)+\mathbf{B}\,\mathbf{u}(t)\quad \mathbf{x}\in\mathbb{R}^n\;;\;\mathbf{u}\in\mathbb{R}^m\quad (1)

From a given initial state \mathbf{x}_0=\mathbf{x}(0), the objective is to bring the state back to 0 within a given time horizon t_f (\mathbf{x}(t_f)=0) while minimizing the quadratic performance index :

J=\frac{1}{2}\int_0^{t_f} \left(\mathbf{x}^T(t)\mathbf{Q}\mathbf{x}(t)+ \mathbf{u}^T(t)\mathbf{R}\mathbf{u}(t)\right)dt

where \mathbf{Q} and \mathbf{R} are given weighting matrices with \mathbf{Q}\ge 0 and \mathbf{R}>0.
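The code sketches on this page share the following minimal numerical setup (a double integrator with n=2 and m=1; the matrices, horizon and initial state are arbitrary illustrative choices, not part of the problem statement) :

```python
import numpy as np

# Illustrative data (assumed, not from the article): a double integrator.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # state matrix (n x n)
B = np.array([[0.0],
              [1.0]])           # input matrix (n x m)
Q = np.diag([1.0, 0.1])         # state weighting, Q >= 0
R = np.array([[1.0]])           # control weighting, R > 0
tf = 2.0                        # time horizon t_f
x0 = np.array([1.0, 0.0])       # initial state x(0)
n = A.shape[0]
```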

Solution using Pontryagin’s minimum principle :

  • The Hamiltonian reads :

\mathcal{H}=\frac{1}{2}\left(\mathbf{x}^T\mathbf{Q}\mathbf{x}+\mathbf{u}^T\mathbf{R}\mathbf{u}\right)+\mathbf{\Psi}^T\left(\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}\right)

where \mathbf{\Psi}\in\mathbb{R}^n is the costate vector.

  • The optimal control minimizes \mathcal{H}, \forall t :

\left.\frac{\partial \mathcal{H}}{\partial \mathbf{u}}\right|_{\mathbf{u}=\widehat{\mathbf{u}}}=0=\mathbf{R}\widehat{\mathbf{u}}+\mathbf{B}^T\mathbf{\Psi}\;\Rightarrow\;\widehat{\mathbf{u}}=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\quad (2)

  • Costate dynamics :

\dot{\mathbf{\Psi}}=-\frac{\partial \mathcal{H}}{\partial \mathbf{x}}\;\Rightarrow\;\dot{\mathbf{\Psi}}=-\mathbf{Q}\mathbf{x}-\mathbf{A}^T\mathbf{\Psi}\quad (3)

  • State-costate dynamics : (1), (2) and (3) lead to :

\left\{\begin{array}{ccccc}\dot{\mathbf{x}} & = & \mathbf{A}\mathbf{x} & - & \mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\\ \dot{\mathbf{\Psi}} & = & -\mathbf{Q}\mathbf{x} & - & \mathbf{A}^T\mathbf{\Psi}\end{array}\right.\;\Rightarrow\;\left[\begin{array}{c}\dot{\mathbf{x}} \\ \dot{\mathbf{\Psi}}\end{array}\right]=\mathbf{H}\left[\begin{array}{c}\mathbf{x} \\ \mathbf{\Psi}\end{array}\right]\quad (4)

with

\mathbf{H}=\left[\begin{array}{cc} \mathbf{A} & -\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\\ -\mathbf{Q} & -\mathbf{A}^T\end{array}\right]\;.

\mathbf{H} is the 2n\times 2n Hamiltonian matrix associated with such a control problem. (4) can be integrated taking into account the boundary conditions on the state-costate augmented vector [\mathbf{x}^T\;\;\mathbf{\Psi}^T]^T :

  • initial conditions on \mathbf{x} : \mathbf{x}(0)=\mathbf{x}_0 (5),
  • terminal conditions on \mathbf{x} : \mathbf{x}(t_f)=0 (6).

The set of equations (4), (5) and (6) is also called a two-point boundary-value problem.
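As a sketch (continuing the illustrative setup above), the Hamiltonian matrix \mathbf{H} of equation (4) can be assembled directly from the problem data :

```python
# Hamiltonian matrix H of equation (4), size 2n x 2n.
Rinv = np.linalg.inv(R)
H = np.block([[ A, -B @ Rinv @ B.T],
              [-Q, -A.T           ]])
```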

  • Integration of the two-point boundary-value problem :

\left[\begin{array}{c}\mathbf{x}(t_f)=0 \\ \mathbf{\Psi}(t_f)\end{array}\right]=e^{\mathbf{H}t_f}\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]=\left[\begin{array}{cc} e^{\mathbf{H}t_f}_{11} & e^{\mathbf{H}t_f}_{12} \\ e^{\mathbf{H}t_f}_{21} & e^{\mathbf{H}t_f}_{22}\end{array}\right]\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]

where e^{\mathbf{H}t_f}_{ij}, i,j=1,2, are the 4 n\times n submatrices partitioning e^{\mathbf{H}t_f} (WARNING !! : e^{\mathbf{H}t_f}_{ij}\neq e^{\mathbf{H}_{ij}t_f}).

Then one can easily derive the initial value of the costate :

\mathbf{\Psi}(0)=-\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}e^{\mathbf{H}t_f}_{11}\,\mathbf{x}_0=\mathbf{P}(0)\,\mathbf{x}_0\;.

where \mathbf{P}(0)=-\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}e^{\mathbf{H}t_f}_{11} depends only on the problem data : \mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{R}, t_f, and not on \mathbf{x}_0.

  • Optimal control initial value : from equation (2) :

\widehat{\mathbf{u}}(0)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(0)\,\mathbf{x}_0\;.
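Numerically (still with the illustrative data above; expm is SciPy's matrix exponential and the names M11, M12, P0 are ad hoc), the two-point boundary-value problem yields \mathbf{P}(0), \mathbf{\Psi}(0) and \widehat{\mathbf{u}}(0) as follows :

```python
from scipy.linalg import expm

# Partition e^{H tf} into its n x n blocks and apply the boundary conditions.
M = expm(H * tf)
M11, M12 = M[:n, :n], M[:n, n:]

P0   = -np.linalg.solve(M12, M11)   # P(0) = -[e^{H tf}]_12^{-1} [e^{H tf}]_11
Psi0 = P0 @ x0                      # initial costate Psi(0)
u0   = -Rinv @ B.T @ Psi0           # optimal control at t = 0, from equation (2)
```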

  • Closed-loop optimal control at any time t : at time t\in [0,\;t_f[, assuming that the current state \mathbf{x}(t) is known (using a measurement system), the objective is still to bring the final state to 0 (\mathbf{x}(t_f)=0) but the time horizon is now t_f-t. The computation of the current optimal control \widehat{\mathbf{u}}(t) is the same problem as the previous one, simply replacing \mathbf{x}_0 with \mathbf{x}(t) and t_f with t_f-t. Thus :

\mathbf{\Psi}(t)=\mathbf{P}(t)\,\mathbf{x}(t)\;\mbox{with :}\;\mathbf{P}(t)=-\left[e^{\mathbf{H}(t_f-t)}_{12}\right]^{-1}e^{\mathbf{H}(t_f-t)}_{11}\;,

\widehat{\mathbf{u}}(t)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t)\,\mathbf{x}(t)=-\mathbf{K}(t)\,\mathbf{x}(t)\;.

with :

\mathbf{K}(t)=\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t)

the time-varying state feedback to be implemented in closed-loop according to the following Figure :

Remark : \mathbf{P}(t_f) is not defined since e^{\mathbf{H}\cdot 0}_{12}=\mathbf{0}_{n\times n} is not invertible.
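A possible implementation of \mathbf{P}(t) and \mathbf{K}(t) (the helper names P_of_t and K_of_t are not from the article; the evaluation must stay strictly below t_f, as noted in the remark above) :

```python
def P_of_t(t):
    """P(t) from the matrix exponential over the remaining horizon tf - t."""
    M = expm(H * (tf - t))
    return -np.linalg.solve(M[:n, n:], M[:n, :n])

def K_of_t(t):
    """Time-varying state-feedback gain K(t) = R^{-1} B^T P(t)."""
    return Rinv @ B.T @ P_of_t(t)

# Usage in closed loop, given a measured state x_t at time t < tf:
#   u_t = -K_of_t(t) @ x_t
```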

  • Optimal state trajectories : the integration of equation (4) between 0 and t (\forall\;t\in [0,\;t_f[) leads to (first n rows) :

\mathbf{x}(t)=e^{\mathbf{H}t}_{11}\,\mathbf{x}_0+e^{\mathbf{H}t}_{12}\,\mathbf{\Psi}(0)= \left(e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}e^{\mathbf{H}t_f}_{11}\right)\mathbf{x}_0= \mathbf{\Phi}(t_f,t)\,\mathbf{x}_0\;.

where :

\mathbf{\Phi}(t_f,t)=e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}e^{\mathbf{H}t_f}_{11}

is called the transition matrix.
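The optimal trajectory can then be sampled through the transition matrix (sketch with the same illustrative data; Phi is an ad-hoc helper name) :

```python
def Phi(t):
    """Transition matrix Phi(tf, t) built from the partitioned exponentials."""
    Mt  = expm(H * t)
    Mtf = expm(H * tf)
    corr = np.linalg.solve(Mtf[:n, n:], Mtf[:n, :n])   # [e^{H tf}]_12^{-1} [e^{H tf}]_11
    return Mt[:n, :n] - Mt[:n, n:] @ corr

ts = np.linspace(0.0, tf, 50)
xs = np.array([Phi(t) @ x0 for t in ts])   # xs[-1] should be close to 0
```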

  • Optimal performance index :

For any t\in [0,\;t_f[ and a current state \mathbf{x}, one can define the cost-to-go function (or value function) \mathcal{R}(\mathbf{x},t) as :

\mathcal{R}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} \left(\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{u}^T\mathbf{R}\mathbf{u}\right)d\tau

and the optimal cost-to-go function as :

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} \left(\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \widehat{\mathbf{u}}^T\mathbf{R}\,\widehat{\mathbf{u}}\right)d\tau

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} \left(\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\right)d\tau\;.

From equation (4), one can derive that :

\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}=\mathbf{A}\mathbf{x}-\dot{\mathbf{x}}\;,

\mathbf{Q}\mathbf{x}=-\mathbf{A}^T\mathbf{\Psi}-\dot{\mathbf{\Psi}}

Thus (after simplification : the terms \mathbf{x}^T\mathbf{A}^T\mathbf{\Psi} and \mathbf{\Psi}^T\mathbf{A}\mathbf{x} cancel since they are equal scalars) :

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} \left(-\mathbf{x}^T\dot{\mathbf{\Psi}}-\mathbf{\Psi}^T\dot{\mathbf{x}}\right) d\tau=-\frac{1}{2}\int_t^{t_f}\frac{d\,(\mathbf{x}^T\mathbf{\Psi})}{d\tau}d\tau=0+\frac{1}{2}\mathbf{x}^T(t)\mathbf{\Psi}(t)

Thus :

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\mathbf{x}^T(t)\mathbf{P}(t)\mathbf{x}(t)

From this last equation, one recovers the definition of the costate \mathbf{\Psi} used to solve the Hamilton–Jacobi–Bellman equation, i.e. the gradient of the optimal cost-to-go function w.r.t. \mathbf{x} :

\mathbf{\Psi}(t)=\frac{\partial \widehat{\mathcal{R}}(\mathbf{x},t)}{\partial \mathbf{x}}

The optimal performance index is : \widehat{J}=\widehat{\mathcal{R}}(\mathbf{x}_0,0)=\frac{1}{2}\mathbf{x}_0^T\mathbf{P}(0)\mathbf{x}_0.
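With the sketch above, \widehat{J}=\frac{1}{2}\mathbf{x}_0^T\mathbf{P}(0)\mathbf{x}_0 can also be cross-checked by a direct quadrature of the running cost along the optimal trajectory (the stopping point 0.999\,t_f and the number of samples are arbitrary choices) :

```python
from scipy.integrate import trapezoid

J_hat = 0.5 * x0 @ P0 @ x0          # optimal performance index 1/2 x0^T P(0) x0

# Cross-check: integrate the running cost along the optimal trajectory,
# stopping slightly before tf where K(t) is not defined.
tq = np.linspace(0.0, 0.999 * tf, 400)
running = []
for t in tq:
    x_t = Phi(t) @ x0
    u_t = -K_of_t(t) @ x_t
    running.append(x_t @ Q @ x_t + u_t @ R @ u_t)
J_check = 0.5 * trapezoid(running, tq)   # should be close to J_hat
```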

Exercises

  • Exo #1 : show that \mathbf{P}(t) is the solution of the matrix Riccati differential equation (a numerical sanity check is sketched after these exercises) :

\dot{\mathbf{P}}=-\mathbf{P}\mathbf{A}-\mathbf{A}^T\mathbf{P}+\mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}-\mathbf{Q}

also written as :

\dot{\mathbf{P}}=\left[-\mathbf{P}\;\;\mathbf{I}_n\right]\mathbf{H}\left[\begin{array}{c}\mathbf{I}_n \\ \mathbf{P} \end{array}\right]\;.

  • Exo #2 : considering now that \mathbf{x}(t_f)=\mathbf{x}_f\neq \mathbf{0}, compute the time-varying state-feedback gain \mathbf{K}(t) and the time-varying feedforward gain \mathbf{H}(t) of the optimal closed-loop control law to be implemented according to the following Figure.
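For Exo #1, a quick numerical sanity check (not a proof) compares a finite-difference derivative of the P_of_t helper above with the Riccati right-hand side at an arbitrary time, using the illustrative data of the earlier sketches :

```python
# Central finite difference of P(t) vs. the Riccati right-hand side.
t, dt = 0.5, 1e-6
P      = P_of_t(t)
dP_fd  = (P_of_t(t + dt) - P_of_t(t - dt)) / (2 * dt)
dP_ric = -P @ A - A.T @ P + P @ B @ Rinv @ B.T @ P - Q
assert np.allclose(dP_fd, dP_ric, atol=1e-4)
```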
