Indirect Optimization Techniques

19 minute read


Numerically Solving Control Problems

Optimal control theory gives a framework for finding control policies that optimize some sort of numerical objective function. This framework consists of rules that define differential equations and boundary conditions that the optimal trajectory must obey. Finding these rules can be challenging for complicated systems, but that is only half the battle – actually finding solutions to these differential equations can be even more challenging. Analytical solutions are usually possible only with basic examples. For real-world problems, numerical solvers must be used to find these control policies.

This blog post details my Matlab implementation of several numerical optimization techniques. Most of these techniques involve the same steps:

  1. Make an initial guess of the optimal controls/trajectory
  2. Find out how “wrong” our guess is
  3. Make corrections to the optimal control/trajectory, making a new guess
  4. Repeat steps 2 and 3 until our guess converges to correct solution

There are several strategies when approaching optimal control problems. In this blog post, I will talk about indirect optimization techniques. This name is meant to distinguish between direct optimization techniques, but what is the difference? Although this will probably be a future blog post itself, the quick answer is what choices the engineer makes in terms of decision variables and necessary conditions for optimality. Indirect optimization chooses a more analytical technique, where the optimality conditions are found using calculus of variations. Direct optimization can be thought of a more brute-force approach, where the optimal trajectory is broken up into discrete steps, and the objective function is “directly” optimized using these discrete controls. Indirect methods are a more historical approach to control problems, since if you don’t have a computer, you might still be able to discover some fundamental properties of the solution. Thus, it is still a good exercise to understand these methods.

Minimizing Cost Over Trajectories

The foundation for indirect methods is the calculus a variations, where the problem is often to solve what function optimizes a functional (or a function of a function). For example, a common functional is the integral of a function, $J(f) = \int f(t) dt$. Calculus of variations finds the function of $t$ that minimizes this integral $J$, which is a functional of $f$. For example, if $J(x) = \int (x(t))^2 dt$, then the function that minimizes this integral is $x(t)=0$, since any non-zero value for the function would result in a positive value for the integral.

The calculus of variations approach to solving this problem analytically is surprisingly close to regular calculus: if trajectory $x(t)$ results in a functional value of $J(x)$, then perturbing this trajectory by a little bit, $x(t)+\delta x(t)$, generally results in a perturbed functional value, $J(x) + \delta J(x)$. The goal is to find a function $x(t)$ such that any small perturbation $\delta x(t)$ results $\delta J(x) = 0$. This is similar to optimization using calculus, where the goal is to find the value where the slope is equal to zero.

Suppose a system is governed by $\dot{x}=a(x(t),u(t))$, where $x(t)$ is the state of the system overtime and $u(t)$ is the control variable, i.e., what we are trying to solve for. Assuming initial condition $x(0)=x_0$ and time interval $t\in [0,T]$, a common control cost function to minimize usually has the form of $J(x) = h(x) + \int_0^Tg(x,u)dt$. It is often helpful to define an auxillary function, $H$, called the Hamiltonian of the system:

\begin{equation} H = g(x,u) + p(t)a(x,u) \end{equation}

The function $p(t)$ is called a costate, and is directly connected to the state $x(t)$. The reason for it being here in our problem is connected to another common optimization procedure, Lagrange multipliers. Lagrange multipliers are needed when optimization over must be constrained by a given function (or set of functions). In our case, the constraining function is $\dot{x}=a(x(t),u(t))$, which constrains the velocity of $x(t)$ to the current value of $x(t)$ and $u(t)$.

Calculus of variations gives the following conditions as necessary for optimal control inputs $u(t)$, assuming $x(T)$ is free to be any value:

\begin{equation} \label{eq:xEOM} \dot{x} = \frac{\partial H}{\partial p} = a(x,u) \end{equation}

\begin{equation} \label{eq:pEOM} \dot{p} = -\frac{\partial H}{\partial x} = -\frac{\partial a}{\partial x}p - \frac{\partial g}{\partial x} \end{equation}

\begin{equation} \label{eq:minU} 0 = \frac{\partial H}{\partial u} \end{equation}

\begin{equation} \label{eq:x0} x(0) = x_0 \end{equation}

\begin{equation} \label{eq:final} p(T) = \frac{\partial h}{\partial x}(x(T)) \end{equation}

Notice how Eq. \ref{eq:xEOM}-\ref{eq:minU} are all found by taking the partial derivative of the Hamiltonian with respect to the different variables involve, $x$, $p$ and $u$. If the controls are non-differentiable, e.g., if admissible controls are constrained by upper and lower values, then Eq. \ref{eq:minU} becomes $H(x,p,u^*)\le H(x,p,u)$, also known as Pontryagin’s minimum principle. Eq. \ref{eq:x0} is an initial condition, and Eq. \ref{eq:final} is a boundary condition at the end of the trajectory (when $t=T$).

Together, these equations provide necessary (but not sufficient) conditions for an optimal trajectory. The problem is to find solutions for $x(t)$, $p(t)$ and $u(t)$ that satisfies all the above conditions simultaneously. For very simple problems, this is tractable, but for more complex problems a numerical approach is needed to find these solutions.

Numerically solving ODE’s

The above necessary conditions means we need to solve a system of ODE’s. In general, this is made tricky for two reasons:

  1. In general, the equations of motion $\dot{x}=\frac{\delta H}{\delta p}$ and $\dot{p}=-\frac{\delta H}{\delta x}$ are nonlinear functions.
  2. Eq. \ref{eq:x0} and \ref{eq:final} present split boundary conditions, meaning we have partial information about the trajectory at $t=0$ and at $t=T$, but not complete information at either boundary.

If the equations of motion were linear, or if the boundary conditions weren’t split, then finding the solutions to this problem would actually be quite easy. Unfortunately, this isn’t the case in general, so more complicated schemes must be developed to find the solution. These usually involve finding a trajectory where only some of the ODE’s or boundary conditions are satisfied, then iteratively changing the trajectory to find a solution where all conditions are satisfied.

Approach 1: Gradient descent

In this approach, we initially find trajectories that satisfy Eq. \ref{eq:xEOM}, \ref{eq:pEOM}, \ref{eq:x0}, and \ref{eq:final}. This leaves \ref{eq:minU}, or $\frac{\delta H}{\delta u}=0$, to be satisfied. It can be shown that if the control $u(t)$ is varied by $\delta u$ with the other equations satisfied, then variation in the cost $\delta J$ is:

\begin{equation} \delta J = \int_{t_0}^{T} \left(\frac{\partial H}{\partial u} \delta u \right) dt \end{equation}

If we choose $\delta u = -\tau \frac{\partial H}{\partial u}$, where $\tau$ is some positive constant, then the variation becomes:

\begin{equation} \delta J = - \tau \int_{t_0}^{T} \left(\frac{\partial H}{\partial u}\right)^2 dt \end{equation}

In other words, this particular choice in $\delta u$ guarantees that the cost decreases over each iteration, provided that $\tau$ is small enough. This is similar to many machine learning algorithms, where some positive definite cost (e.g., the average sum of square errors) is reduced by gradient descent. In this analogy, the constant $\tau$ would be considered the learning rate.

The workflow for this algorithm is:

  1. With initial guess $x_0$, $p_0$ and $u_0$, first integrate the state $x(t)$ forward from $x_0$ to $x(T)$.
  2. Using the boundary condition of Eq. \ref{eq:final}, determine $p(T)$.
  3. Integrate backwards in time from $p(T)$ to $p_0$ (overwriting the initial guess of $p_0$) to find the trajectory $p(t)$. Now we have trajectories for $x(t)$ and $p(t)$ that satisfy \ref{eq:xEOM}, \ref{eq:pEOM}, \ref{eq:x0}, and \ref{eq:final}, but not Eq. \ref{eq:minU}.
  4. Evaluate $\frac{\partial H}{\partial u}$ at each discrete point in time, indexed by $k$. Update the applied control at this point by $u(k) := u(k) - \tau\frac{\partial H}{\partial u}$.
  5. If $\int_{t_0}^{T} \left(\frac{\partial H}{\partial u}\right)^2$ is below some positive threshold, then terminate the algorithm. If not, repeat steps 2, 3 and 4.

This algorithm was tested on a simple double-integrator system with $\ddot{x}=u$. The system started at $x_0=0.0$ and was tasked to track a constant reference value $x_r$. In order to track the reference, the cost function was defined with $g(x,u) = Q_0(x-x_r)^2 + Q_1(\dot{x})^2 + Ru^2$ and $h(x)=Q_h(x-x_r)^2$. The constants $Q_0$, $Q_1$, $Q_h$, and $R$ were tuned to get reasonable results.

Below shows how gradient descent iterations converges onto the optimal solution:

Approach 2: Variation of Extremals

This approach first assumes that Eq. \ref{eq:minU} can be rearranged to solve for $u^*$ in terms of $x$ and $p$. This means that the state dynamics of $x$ and $p$ can be solved entirely in terms of both variables:

\begin{equation} \label{eq:xEOM_implicit} \dot{x} = \frac{\partial H}{\partial p} = a(x,p) \end{equation}

\begin{equation} \label{eq:pEOM_implicit} \dot{p} = -\frac{\partial H}{\partial x} = d(x,p) \end{equation}

Thus, the problem now is to find $x(t)$ and $p(t)$ that satisfy these two differential equations, as well as the two end conditions Eq. \ref{eq:x0} and Eq. \ref{eq:final}. The variation of extremals method then seeks to make a guess at $p_0$ such that the boundary condition Eq. \ref{eq:final} is satisfied. At first, this initial guess will almost certainly be incorrect. Suppose that after integrating Eq. \ref{eq:xEOM_implicit} and Eq. \ref{eq:pEOM_implicit} to find $x(t)$ and $p(t)$ with $x(0)=x_0$ and $p(0)=p_0$, we find that $p(T)\neq \frac{\partial h}{\partial x}_{t=T}$ (i.e., Eq. \ref{eq:final} is not satisfied). The variation of extremals asks: how should we changed the value of $p_0$ such that $p(T) = \frac{\partial h}{\partial x}_{t=T}$? There is a complicated, nonlinear relationship between $p_0$ and $p(T)$, so finding an answer to this question is almost impossible. But we can ask a slightly more general question: if we increase/decrease $p_0$ by a small amount, will the difference $\text(abs)\left(p(T) - \frac{\partial h}{\partial x}_{t=T}\right)$ become bigger or smaller?

In a certain sense, there is a functional relationship between $p(T)$ and $p_0$, where each value of $p_0$ must map to a single value of $p(T)$. Suppose this function is denoted by $F$ so that $p(T) = F(p_0)$. We want an approach where we can iterations of guess for $p_0$ can coverge to a value such that $F(p_0) = \frac{\partial h}{\partial x}_{t=T}$, or to put another way, $F(p_0) - \frac{\partial h}{\partial x}_{t=T} = 0$.

There is such an approach, called Newton’s Method, that does exactly this. Given an initial guess $p_0^0$ of a function’s root, the function evaluated at this point, $F(p_0^0)$, and the derivative evaluated at the same point, $F’(p_0^0)$, then Newton’s method gives you a better approximation of the function’s root, $p_0^1$. The relationship is straightforward:

\begin{equation} \label{eq:NewtonMethod} p_0^1 = p_0^0 - \frac{F(p_0^0)}{F’(p_0^0)} \end{equation}

Eq. \ref{eq:NewtonMethod} can be repeated $k$ times to find $p_0^k$. The larger $k$ is, the better the approximation.

Finding both $F(p_0)$ and $F’(p_0)$ are fairly simple in theory: use $p_0$ as the initial costate value and integrate the equations of motions forward in time to find $p(T)$ and calculate $p(T) - \frac{\partial h}{\partial x}_{t=T}$. With some overloading of notation, let’s define this as $F(p_0)$. Finding $F’(p_0)$ would then entail using $p_0 \pm \epsilon$ as the initial costate values and integrating forward to get $F(p_0\pm\epsilon)$. This two values can be used to estimate the derivative: $F’(p_0)\approx \left( F(p_0+\epsilon)-F(p_0-\epsilon) \right)/(2\epsilon)$.

It turns out the above procedure is okay if $p_0$ has small dimensionality, but very inefficient if $p_0$ has larger dimensionality. Think about it: if the costate has a dimension of 5, then we must figure out how each of the 5 initial values in $p_0$ effect the 5 final values in $p(T)$, for a total of 25 derivatives that must be calculated each iteration. The naive approach for approximating $F’(p_0)$ will be very slow. Luckily, there is a more efficient of computing these derivatives all at once, instead of one-by-one. I will not detail the approach here, but it can be found in Chapter 6.3 of Kirk’s Optimal Control Theory (a free pdf of the book is also one google search away).

Here is the workflow for this numerical optimal control algorithm:

  1. Use Eq. \ref{eq:minU} to find optimal control $u(x,p)$, and plug into equations of motion to find Eq. \ref{eq:xEOM_implicit} and Eq. \ref{eq:pEOM_implicit}.
  2. Initialize $x_0$ and $p_0$.
  3. Integrate forward in time to find final value of the costate $p(T)$. Also find how the value of $p(T)$ changes due to small changes in the initial value of the costate, $p_0$. This can be used to find the derivate of $p(T)$ with respect to $p_0$, needed for Netwon’s method.
  4. Use Eq. \ref{eq:NewtonMethod} to update $p_0$ so that $p(T) - \frac{\partial h}{\partial x}_{t=T} \rightarrow 0$ after each iteration.
  5. If $p(T) - \frac{\partial h}{\partial x}_{t=T}$ is sufficiently close to zero, terminate the algorithm. Otherwise, return to step 3.

Below, I apply the variation of extremals method to the same double-integrator as before, with the same initialization for both state and costate trajectory.

About the Two Approaches

Some small discussion is warranted about the pros and cons of the two approaches detailed above.

Regarding the method of gradient descent, this method is guaranteed to work no matter how the trajectories are initialized. This makes it quite robust to bad initial guesses, which is helpful when the control system is complex enough such that a good initial guess is impossible without a lot of work before hand. All you need to do is make sure the learning rate $\tau$ is small enough so the gradient steps don’t “jump” over the minimum. The downside, however, is that convergence can be very slow to approach the absolute minimum. The gradient is always steepest when you first start the optimization process, but after each iteration, the gradient reduces in magnitude. Another way to label it might be diminishing returns: the same work to bring the cost down from 100 to 10 might be the same amount of work to bring the cost down from 10 to 1. This kind of problem is also present in machine learning, where the regression variables or neural networks weights are changed using gradient descent. Various methods, such as Adam optimization, are used to combat the slow nature of gradient descent as it converges to the optimum.

On the other hand, the variation of extremals can converge quite quickly to the optimum, as it is capable of taking quite large steps without missing the optimum, unlike gradient descent. The only issue is that convergence is highly dependent on the initial guess for $p_0$. If this guess is off by some amount, then the variation of extremals with likely diverge. What makes it even more tricky is that it is difficult to tell how “close” you need to be in order to get fast convergence. Indeed, in the above gif where I’m not remotely close to the optimal trajectory initially, convergence is achieved because Newton’s method (and hence the variation of extremals) has the nice property that it always converges for linear systems, e.g., the double integrator.

When solving problem numerically, you can also mix-and-match approaches in an attempt to leverage the benefits of both. For example, you could use gradient descent initially to find a good approximate solution, then use variation of extremals to converge quickly on the optimal trajectory.

Practical Example: Cart-pole

I want to end this blog post with an application of numerical techniques to solve for a well known system: the cart-pole. This system is well known in control because it is relatively simple to understand qualitatively, but exhibits highly nonlinear behavior in the equations of motion. The goal for this application is to use numerical techniques to solve for a control that will push the cart-pole from its stable equilibrium (the pole pointing down) to its unstable equilibrium (the pole pointing up).

For this approach, I use gradient descent to initially find the optimal controls. Then, I use variation of extremals to help achieve convergence faster. The state equations of motion I actually derived in another blog post, if you want to check that out.

Below is a gif of the cart-pole in action:

As you can see, the cart-pole doesn’t quite settle onto the unstable equilibrium, but it does a pretty good job of getting the pole up there. Since this is my first time coding something like this, I consider it more of a success than a failure.

Some things I discovered while applying these numerical techniques to solving controls for the cart-pole system:

  • Tuning the parameters proved to be quite tricky. For example, my cost function had the square error between the current angle of the cart-pole and the desired angle (which was straight up, i.e, $\theta=\pi/2$). If I found that the trajectory had trouble getting exactly to this reference value for the angle, one sensible thing to do is increase the weight inside the cost function associated with the angular reference. When I did this, though, I sometimes found that suddenly gradient descent wouldn’t converge! I eventually discovered that when I increased the weight inside the cost function, I also increase the magnitude of the gradient of that cost function. To compensate, then, I decreased the learning rate $\tau$, but this would lead to slower convergence.
  • It seemed like the OCP solver would take the amount of time I gave it in order to fully swing up. For instance, I originally had the OCP solver find trajectories over 3 seconds. I found that the pole almost made it to $\pi/2$, but not quite. So, one thing I tried was to increase the time of the trajectories to 3.5 seconds in length; perhaps the cart-pole simply needed for time. Increasing the time, however, never seemed to really help. No matter the length of time I set the trajectories to last, the pole would only ever make it close to $\pi/2$ at the very end of the trajectory, whether that be 3 seconds, 4 seconds, or 5 seconds. You would like that it would try to reach $\pi/2$ at the earliest time in order to minimize cost. Perhaps gradient descent converged onto local minima, rather than global minima. Or perhaps by increasing the time, the optimal solution was to have a smaller control effort over a larger time horizon.


While it is a little disappointing that the cart-pole couldn’t stabilize itself upright, I’m still pretty happy with the overall results. One thing that happens in real life demos of the cart-pole is that while optimal control is used to “swing up” the pole, a PID controller is used when the system is close to its unstable equilibrium. Since the PID is designed to operate around this unstable equilibrium, this two-controller approach works pretty well overall. That is an obvious next for this project, if I wish to return to it in the future.

Overall, I had a lot of fun learning about these indirect optimization techniques, and especially seeing them actually work on the cart-pole system.