Reinforcement Learning Your Way: Agent Characterization through Policy Regularization

Maree, Charl; Omlin, Christian

arXiv.org Artificial Intelligence 

Recent advances in reinforcement learning (RL) have come with increased complexity, which, especially for deep RL, has raised challenges related to explainability [1]. The opacity of state-of-the-art RL algorithms has left model developers with a limited understanding of their agents' policies and no influence over learned strategies [2]. While concerns surrounding explainability have been noted for AI in general, only recently have attempts been made to explain RL systems [3, 1, 4, 5]. These attempts have resulted in a wide suite of methods requiring various degrees of expert knowledge, either about the state-action domain or about the specific RL algorithm. Further, they typically rely on post-hoc analysis of learned policies, which gives only observational assurances of agents' behaviour. We instead propose an intrinsic method of regularizing agents' actions based on a given prior. While current methods for RL regularization aim to improve training performance - e.g., by maximizing the entropy of the action distribution [6], or by minimizing the distance to a prior sub-optimal state-action distribution [7] - our aim is the characterization of our agents' behaviours. We also extend the current regularization techniques to accommodate multi-agent systems, which allows intrinsic characterization of individual agents. We provide a formal argument for the rationale of our method and demonstrate its efficacy in a toy problem where agents learn to navigate to a destination on a grid by performing, e.g., only right turns (under the premise that right turns are