A Bayesian Approach to Robust Inverse Reinforcement Learning
Ran Wei, Siliang Zeng, Chenliang Li, Alfredo Garcia, Anthony McDonald, Mingyi Hong
–arXiv.org Artificial Intelligence
Inverse reinforcement learning (IRL) is the problem of extracting the reward function and policy of a value-maximizing agent from its behavior [1, 2]. IRL is an important tool in domains where manually specifying reward functions or policies is difficult, such as autonomous driving [3], or where the extracted reward function can reveal novel insights about a target population and be used to devise interventions, as in biology, economics, and human-robot interaction studies [4, 5, 6]. However, wider application of IRL faces two interrelated algorithmic challenges: 1) obtaining access to the target deployment environment or an accurate simulator thereof, and 2) ensuring robustness of the learned policy and reward function under covariate shift between the training and deployment environments or datasets [7, 8, 9]. In this paper, we focus on model-based offline IRL to address challenge 1). A notable class of model-based offline IRL methods estimates the dynamics and reward in a two-stage fashion (see Figure 1) [10, 11, 12, 13]. In the first stage, a dynamics model is estimated from the offline data.

Figure 1: Objectives of the traditional two-stage IRL and the proposed simultaneous estimation approach of Bayesian model-based IRL.
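To make the two-stage recipe concrete, the following is a minimal sketch, assuming a tabular MDP and max-entropy IRL for the second stage. The function names (`fit_dynamics`, `two_stage_irl`, etc.) and all hyperparameters are illustrative placeholders, not the notation or method of this paper.

```python
import numpy as np

def fit_dynamics(transitions, n_states, n_actions):
    """Stage 1: maximum-likelihood tabular dynamics from offline (s, a, s') tuples."""
    counts = np.full((n_states, n_actions, n_states), 1e-3)  # light smoothing
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    return counts / counts.sum(axis=-1, keepdims=True)

def soft_value_iteration(P, r, gamma=0.95, n_iters=200):
    """Soft (max-entropy) Bellman backups under estimated dynamics P and state reward r."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iters):
        Q = r[:, None] + gamma * (P @ V)                       # Q[s, a]
        Q_max = Q.max(axis=1, keepdims=True)
        V = (Q_max + np.log(np.exp(Q - Q_max).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])                              # Boltzmann policy pi(a|s)

def state_visitation(P, policy, start_dist, horizon=50):
    """Average state-visitation frequencies of `policy` rolled out in the learned model."""
    d, total = start_dist.copy(), np.zeros_like(start_dist)
    for _ in range(horizon):
        total += d
        d = np.einsum('s,sa,sat->t', d, policy, P)             # one-step push-forward
    return total / horizon

def two_stage_irl(transitions, expert_states, n_states, n_actions,
                  start_dist, lr=0.1, n_epochs=100):
    """Stage 2: max-entropy IRL gradient steps on the reward, with the stage-1
    model standing in for the (inaccessible) real environment."""
    P = fit_dynamics(transitions, n_states, n_actions)
    mu_expert = np.bincount(expert_states, minlength=n_states).astype(float)
    mu_expert /= mu_expert.sum()
    r = np.zeros(n_states)
    for _ in range(n_epochs):
        policy = soft_value_iteration(P, r)
        mu_model = state_visitation(P, policy, start_dist)
        r += lr * (mu_expert - mu_model)                        # match visitation frequencies
    return r, P
```

Note that in this decoupled pipeline the reward is fit entirely inside the stage-1 model, so errors in the estimated dynamics propagate directly into the learned reward; the paper contrasts this with its proposed Bayesian formulation, in which dynamics and reward are estimated simultaneously (Figure 1).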
Sep-15-2023