Distributionally Robust Imitation Learning

Jan-19-2025, 05:30:51 GMT–Neural Information Processing Systems

We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) where the reward function is not given but expert demonstrations are available. The goal of imitation learning is to learn a policy whose behavior is nearly as good as the experts' on the desired task. In practice, however, the assumption that demonstrated behavior is consistently optimal is often violated. Finding a policy that is distributionally robust against noisy demonstrations through an adversarial construction can potentially address this problem, because it avoids optimistic generalizations from the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DRoIL) and establishes a close connection between DRoIL and Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL). We show that DRoIL can be viewed as a framework that maximizes a generalized notion of entropy.
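The abstract relates DRoIL to Maximum Entropy IRL. As a point of reference only, here is a minimal sketch of the forward step of standard MaxEnt IRL: finite-horizon soft value iteration on a small tabular MDP. It yields the stochastic policy pi(a|s) = exp(Q(s,a) - V(s)). This is not the DRoIL algorithm from the paper. The function name `soft_value_iteration`, the state-only reward `r`, the finite horizon, and the toy two-state MDP are all illustrative assumptions.

```python
import numpy as np

def soft_value_iteration(P, r, horizon):
    """Hypothetical helper: finite-horizon soft (MaxEnt) value iteration.

    P: array (S, A, S), P[s, a, s'] = transition probability (assumed known).
    r: array (S,), state reward, e.g. theta @ phi(s) in MaxEnt IRL (assumption).
    Returns a list of stochastic policies of shape (S, A); index 0 is the first step.
    """
    S, A, _ = P.shape
    V = np.zeros(S)  # soft value after the last step is taken to be zero
    policies = []
    for _ in range(horizon):
        # Soft Bellman backup: Q(s, a) = r(s) + sum_s' P(s'|s, a) V(s')
        Q = r[:, None] + P @ V
        # Soft maximum over actions via a numerically stable log-sum-exp
        Q_max = Q.max(axis=1, keepdims=True)
        V = (Q_max + np.log(np.exp(Q - Q_max).sum(axis=1, keepdims=True))).ravel()
        # MaxEnt policy: pi(a|s) = exp(Q(s, a) - V(s)); each row sums to 1
        policies.append(np.exp(Q - V[:, None]))
    policies.reverse()  # backups run from the last step to the first
    return policies

# Toy MDP (hypothetical): action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0  # stay
P[0, 1, 1] = P[1, 1, 0] = 1.0  # switch
r = np.array([0.0, 1.0])       # state 1 is rewarded

pi = soft_value_iteration(P, r, horizon=5)
print(pi[0])  # in state 0, "switch" is more likely than "stay", but not certain
```

The policy never puts all its mass on the best action. With a zero reward, it is exactly uniform, which is the maximum-entropy choice. This "do not commit beyond what the reward supports" behavior is the property the paper connects to DRoIL.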

Keywords: demonstration, distributionally robust imitation learning, DRoIL
