Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

Tan, Daniel C. H., Acero, Fernando, McCarthy, Robert, Kanoulas, Dimitrios, Li, Zhibin

arXiv.org Artificial Intelligence 

Deep reinforcement learning (RL) [1] is a powerful and scalable tool for solving control problems such as Atari games [2], robotic control [3], and protein folding [4]. However, the black-box nature of neural networks makes their behaviour difficult to characterise: in extreme cases, out-of-distribution or adversarially constructed inputs [5] can catastrophically degrade performance. In a control context, this can produce highly unsafe behaviour, making it risky to deploy such controllers in safety-critical applications such as autonomous vehicles and human-robot interaction, as well as in future applications for general-purpose robots.

The problem of safe control has been studied extensively in safe reinforcement learning through the lens of constrained Markov Decision Processes [6]. Such methods implicitly assume that constraints sufficient to guarantee safety are known in advance. In contrast, our work assumes no prior knowledge of safe dynamics and aims to learn a constraint, in the form of a barrier function, that guarantees safety. This enables our approach to handle applications where safety cannot easily be expressed analytically, such as avoiding dynamic obstacles from raw pixel input [7].

On the other hand, there is a rich literature in control theory on proving properties of dynamical systems using certificate functions.
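To make the contrast above concrete, the two objects mentioned can be written in their standard textbook forms; the notation below (reward $r$, cost $c$, discount $\gamma$, budget $d$, dynamics $f$, admissible controls $\mathcal{U}$, barrier $h$, decay rate $\alpha$) is generic and is not necessarily the exact formulation adopted later in this paper.

A constrained MDP assumes a known cost signal and bounds its expected discounted sum:
\[
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\Big] \le d .
\]

A (discrete-time) control barrier function instead certifies the safe set $\mathcal{C} = \{x : h(x) \ge 0\}$ under dynamics $x_{t+1} = f(x_t, u_t)$ by requiring that some admissible action keeps $h$ from decaying too quickly:
\[
\exists\, \alpha \in (0, 1] \;\; \text{such that} \;\;
\sup_{u \in \mathcal{U}} \, h\big(f(x, u)\big) \;\ge\; (1 - \alpha)\, h(x)
\quad \forall x \in \mathcal{C} .
\]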