Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Joar Skalse, Alessandro Abate

arXiv.org Artificial Intelligence 

Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function R) from their behaviour (represented as a policy π). To do this, we need a behavioural model of how π relates to R. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition, we characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function.

Inverse reinforcement learning (IRL) is a subfield of machine learning that aims to develop techniques for inferring an agent's preferences based on their actions in a sequential decision-making problem (Ng & Russell, 2000). There are many motivations for IRL. One motivation is to use it as a tool for imitation learning, where the objective is to replicate the behaviour of an expert in some task. In this context, it is not essential that the inferred preferences reflect the actual intentions of the expert, as long as they improve the imitation learning process. Another motivation for IRL is to use it as a tool for preference elicitation, where the objective is to understand an agent's goals or desires. In this context, it is of central importance that the inferred preferences reflect the actual preferences of the observed agent.
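As a concrete illustration of the kind of sensitivity discussed above, the sketch below is a minimal, hypothetical example (not taken from the paper): a one-state decision problem under the Boltzmann-rationality model, where π(a) ∝ exp(β·R(a)). Inverting this model from a slightly perturbed observed policy can move the inferred reward of a low-probability action by a large amount, even though the policy change is tiny. The specific reward values, the temperature β, and the helper functions boltzmann_policy and invert_boltzmann are all illustrative assumptions, not the paper's constructions.

```python
import numpy as np

beta = 1.0                           # rationality (inverse temperature) parameter
R_true = np.array([1.0, 0.0, -4.0])  # assumed "true" reward for three actions

def boltzmann_policy(R, beta):
    """Boltzmann-rational policy: pi(a) proportional to exp(beta * R(a))."""
    z = np.exp(beta * R - np.max(beta * R))  # subtract max for numerical stability
    return z / z.sum()

def invert_boltzmann(pi, beta, anchor):
    """Recover a reward consistent with pi under the Boltzmann model.

    Rewards are only identified up to an additive constant, so the first
    action's reward is pinned to `anchor` to make the comparison meaningful.
    """
    R = np.log(pi) / beta
    return R - R[0] + anchor

pi_exact = boltzmann_policy(R_true, beta)

# Mildly perturb the observed policy (e.g. estimation noise or model mismatch),
# then renormalise; the perturbation is small in total-variation distance.
eps = 0.01
pi_observed = pi_exact + np.array([-eps, 0.0, eps])
pi_observed /= pi_observed.sum()

R_inferred = invert_boltzmann(pi_observed, beta, anchor=R_true[0])

print("exact policy:    ", np.round(pi_exact, 3))
print("observed policy: ", np.round(pi_observed, 3))
print("true reward:     ", np.round(R_true, 3))
print("inferred reward: ", np.round(R_inferred, 3))
# The low-probability action's inferred reward shifts by roughly 1 unit here,
# while the observed policy moved by only 0.01 in total variation.
```

In this toy setting the inversion is available in closed form (log-probabilities divided by β), which makes the amplification easy to see: perturbing a near-zero probability changes its logarithm, and hence the inferred reward, disproportionately. The paper's results characterise this kind of sensitivity in general, for full sequential decision problems and for a broad class of behavioural models.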
