On the Effective Horizon of Inverse Reinforcement Learning
Yiqing Xu, Finale Doshi-Velez, David Hsu
arXiv.org Artificial Intelligence
Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function, and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of the reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes the phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn the reward jointly with the effective horizon rather than the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.
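The sketch below is a minimal illustration of the idea described in the abstract, not the authors' algorithm: the planning horizon H of the IRL inner loop is treated as a tunable parameter and selected jointly with the reward by how well the induced policy matches demonstrations. The tabular MDP, the candidate-reward grid search, and the matching score are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): select (reward, effective horizon)
# jointly by how well the policy induced by finite-horizon planning matches demos.
import numpy as np

def finite_horizon_policy(P, r, H):
    """Finite-horizon value iteration; returns the greedy first-step policy.

    P: transition tensor of shape (A, S, S); r: state reward vector of shape (S,).
    """
    V = np.zeros(r.shape[0])
    for _ in range(H):
        Q = r[None, :] + P @ V          # Q[a, s] = r(s) + sum_s' P[a, s, s'] * V[s']
        V = Q.max(axis=0)               # back up one step of the horizon
    return Q.argmax(axis=0)             # policy[s] = argmax_a Q[a, s]

def demo_match(policy, demos):
    """Fraction of demonstrated (state, action) pairs the policy reproduces."""
    return np.mean([policy[s] == a for s, a in demos])

def irl_with_effective_horizon(P, demos, candidate_rewards, horizons):
    """Grid-search stand-in for IRL: jointly pick the reward and effective horizon."""
    best = None
    for H in horizons:
        for r in candidate_rewards:
            pi = finite_horizon_policy(P, r, H)
            score = demo_match(pi, demos)
            if best is None or score > best[0]:
                best = (score, r, H)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P = rng.dirichlet(np.ones(S), size=(A, S))   # random tabular transition model
    true_r = rng.standard_normal(S)
    expert = finite_horizon_policy(P, true_r, H=10)
    demos = [(s, expert[s]) for s in range(S)]   # one demonstration per state
    rewards = [rng.standard_normal(S) for _ in range(20)] + [true_r]
    score, r_hat, H_hat = irl_with_effective_horizon(P, demos, rewards, horizons=[1, 3, 5, 10])
    print(f"best demonstration match {score:.2f} at effective horizon H={H_hat}")
```

In this toy setting, several candidate (reward, horizon) pairs can reproduce the demonstrations, and with few demonstrations a shorter effective horizon is often among the best-scoring choices, which mirrors the overfitting argument in the abstract at a much smaller scale.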
Jul-12-2023