Neural Co-state Regulator: A Data-Driven Paradigm for Real-time Optimal Control with Input Constraints

Abstract

We propose a novel unsupervised learning framework for solving nonlinear optimal control problems (OCPs) with input constraints in real time. In this framework, a neural network (NN) learns to predict, from any system state, the optimal co-state trajectory that minimizes the control Hamiltonian of a given system, based on Pontryagin's Minimum Principle (PMP). Specifically, the NN is trained to find the norm-optimal co-state solution that simultaneously satisfies the nonlinear system dynamics and minimizes a quadratic regulation cost. The control input is then extracted from the predicted optimal co-state trajectory by solving a quadratic program (QP) that enforces the input constraints and the optimality conditions. We coin the term neural co-state regulator (NCR) to describe the combination of the co-state NN and the control-input QP solver. To demonstrate the effectiveness of the NCR, we compare its feedback control performance with that of an expert nonlinear model predictive control (MPC) solver on a unicycle model. Because the NCR's training does not rely on expert nonlinear control solvers, which are often suboptimal, the NCR produces solutions that outperform the nonlinear MPC solver in terms of convergence error and input trajectory smoothness, even for system conditions outside its original training domain. At the same time, the NCR requires two orders of magnitude less computation time than the nonlinear MPC.
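For reference, the textbook PMP necessary conditions behind this approach are summarized below for dynamics \( \dot{z} = f(z, u) \), stage cost \( \ell(z, u) \), terminal cost \( \phi \), and admissible input set \( U \). This is the standard statement of the principle, not the NCR's training objective; roughly, the NN supplies \( \lambda \) and the QP carries out the constrained minimization in the last line.

```latex
\begin{aligned}
H(z, u, \lambda) &= \ell(z, u) + \lambda^T f(z, u), \\
\dot{z}^* &= \frac{\partial H}{\partial \lambda} = f(z^*, u^*), \qquad z^*(0) = z_0, \\
\dot{\lambda}^* &= -\frac{\partial H}{\partial z}\Big|_{(z^*, u^*, \lambda^*)}, \qquad
\lambda^*(t_f) = \frac{\partial \phi}{\partial z}\big(z^*(t_f)\big), \\
u^*(t) &= \arg\min_{u \in U} H\big(z^*(t), u, \lambda^*(t)\big).
\end{aligned}
```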

Example

Consider the following nonlinear optimal control problem for a unicycle model in continuous time that has quadratic stage cost:

\( \min_{u} \quad J = \int_0^{t_f} \left( z^T Q z + u^T R u \right) dt + \phi(z(t_f)) \)

s.t. \( \quad \dot{z}_1 = \dot{x} = v \cos(\theta) \)

\( \quad \dot{z}_2 = \dot{y} = v \sin(\theta) \)

\( \quad \dot{z}_3 = \dot{\theta} = \omega \)

\( \quad u \in U, \quad z(0) \in \mathbb{R}^3 \)
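Applying the PMP conditions to this model gives explicit co-state dynamics. The derivation below is a sketch of our own, assuming the Hamiltonian \( H = z^T Q z + u^T R u + \lambda^T \dot{z} \) and a diagonal \( Q \) (as in the weights given next); it is not taken from the NCR training code.

```latex
\begin{aligned}
H &= z^T Q z + u^T R u + \lambda_1 v \cos z_3 + \lambda_2 v \sin z_3 + \lambda_3 \omega, \\
\dot{\lambda}_1 &= -\frac{\partial H}{\partial z_1} = -2 Q_{11} z_1, \qquad
\dot{\lambda}_2 = -\frac{\partial H}{\partial z_2} = -2 Q_{22} z_2, \\
\dot{\lambda}_3 &= -\frac{\partial H}{\partial z_3} = -2 Q_{33} z_3 + v \left( \lambda_1 \sin z_3 - \lambda_2 \cos z_3 \right), \\
\lambda(t_f) &= \frac{\partial \phi}{\partial z}\big(z(t_f)\big).
\end{aligned}
```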

Here \( z_1 = x \), \( z_2 = y \), \( z_3 = \theta \), \( u_1 = v \), and \( u_2 = \omega \). The admissible input set \( U \) is defined by the box constraints \( -1 \leq v \leq 1 \) and \( -4 \leq \omega \leq 4 \), and \( t_f \) is the time length of the prediction horizon, which is the same for the MPC and the NCR. For the cost function, \( Q = \text{diag}(10, 10, 10) \), \( R = \text{diag}(1, 1) \), and \( \phi(z(t_f)) = z(t_f)^T S \, z(t_f) \) with \( S = 50Q = \text{diag}(500, 500, 500) \).
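The co-state-to-control step can be sketched in Python under simplifying assumptions. The unicycle is control-affine, \( \dot{z} = G(z)u \) with \( G(z) \) built from \( \cos\theta \), \( \sin\theta \), and 1, so the Hamiltonian is a convex quadratic in \( u \). With the diagonal \( R \) and box-shaped \( U \) above, the minimization decouples per input and reduces to clipping the unconstrained minimizer \( u = -\tfrac{1}{2} R^{-1} G(z)^T \lambda \). This closed form is an assumption that holds only for this diagonal-\( R \), box-constraint case; the NCR itself solves a QP, and all function names here (`dynamics`, `input_matrix`, `costate_rhs`, `terminal_costate`, `control_from_costate`) are hypothetical.

```python
import numpy as np

# Sketch only: hypothetical helpers, not the NCR implementation.
# Weights from the example: Q = diag(10, 10, 10), R = diag(1, 1), S = 50 Q.
Q = np.diag([10.0, 10.0, 10.0])
R = np.diag([1.0, 1.0])
S = 50.0 * Q

# Input box U: -1 <= v <= 1, -4 <= omega <= 4.
U_LOW = np.array([-1.0, -4.0])
U_HIGH = np.array([1.0, 4.0])


def dynamics(z, u):
    """Unicycle model with z = (x, y, theta) and u = (v, omega)."""
    theta = z[2]
    v, omega = u
    return np.array([v * np.cos(theta), v * np.sin(theta), omega])


def input_matrix(z):
    """G(z) such that dynamics(z, u) == G(z) @ u (control-affine model)."""
    theta = z[2]
    return np.array([[np.cos(theta), 0.0],
                     [np.sin(theta), 0.0],
                     [0.0, 1.0]])


def costate_rhs(z, u, lam):
    """lambda_dot = -dH/dz for H = z^T Q z + u^T R u + lam^T f(z, u)."""
    theta = z[2]
    v = u[0]
    # Jacobian of f with respect to z: only the theta column is nonzero.
    df_dz = np.array([[0.0, 0.0, -v * np.sin(theta)],
                      [0.0, 0.0, v * np.cos(theta)],
                      [0.0, 0.0, 0.0]])
    return -(2.0 * Q @ z + df_dz.T @ lam)


def terminal_costate(z_tf):
    """lambda(t_f) = d phi / dz for phi(z) = z^T S z with S symmetric."""
    return 2.0 * S @ z_tf


def control_from_costate(z, lam):
    """Minimize H over the box U (assumes R diagonal).

    The QP  min_u  u^T R u + (G(z)^T lam)^T u  separates per input,
    so its solution is the unconstrained minimizer clipped to U.
    """
    g = input_matrix(z).T @ lam        # linear term of H in u
    u_unc = -g / (2.0 * np.diag(R))    # stationarity: 2 R u + g = 0
    return np.clip(u_unc, U_LOW, U_HIGH)
```

For illustration, taking \( z = (1, 0, 0) \) and \( \lambda = 2Sz = (1000, 0, 0) \) gives the linear term \( G(z)^T \lambda = (1000, 0) \), so `control_from_costate` saturates \( v \) at \( -1 \) (driving the robot back toward the origin) and returns \( \omega = 0 \). A general QP solver would be needed once \( R \) has off-diagonal terms or \( U \) is not a box.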

Case A: Seen initial conditions and zero reference

MPC state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and predicted co-state trajectories vs time (right)

Animated wheeled robot motion (Case A)

Resulting simulation in the x-y plane from MPC (playback reflects the actual computation speed)

Resulting simulation in the x-y plane from NCR (playback reflects the actual computation speed)

Case B: Unseen initial conditions and zero reference

MPC state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and predicted co-state trajectories vs time (right)

Animated wheeled robot motion (Case B)

Resulting simulation in the x-y plane from MPC (playback reflects the actual computation speed)

Resulting simulation in the x-y plane from NCR (playback reflects the actual computation speed)

Case C: Unseen initial conditions and nonzero reference

MPC state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and control input trajectories vs time (right)

NCR state trajectories vs time (left) and predicted co-state trajectories vs time (right)

Animated wheeled robot motion (Case C)

Resulting simulation in the x-y plane from MPC (playback reflects the actual computation speed)

Resulting simulation in the x-y plane from NCR (playback reflects the actual computation speed)
