Distributionally Robust Imitation Learning
Neural Information Processing Systems
We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given but expert demonstrations are available. Although the goal of imitation learning is to learn a policy whose behavior is nearly as good as the experts' on a desired task, the assumption that demonstrated behaviors are consistently optimal is often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations, via an adversarial construction, can address this problem by avoiding optimistic generalization from the demonstrated data.
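To illustrate the flavor of an adversarial (distributionally robust) objective, here is a minimal sketch, not the paper's actual method: the dual form of a KL-ball worst case reduces to exponentially reweighting per-demonstration losses, so an adversary within the ambiguity set shifts mass toward the hardest (e.g., noisiest) demonstrations rather than averaging optimistically. The function name, the temperature parameter, and the toy loss values are all illustrative assumptions.

```python
import numpy as np

def kl_dro_loss(per_sample_losses, temperature=1.0):
    """Worst-case expected loss over a KL neighborhood of the
    empirical demonstration distribution (illustrative sketch).

    The adversary reweights demonstrations with weights proportional
    to exp(loss / temperature); smaller temperature means a more
    pessimistic adversary concentrating on high-loss demonstrations.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    shifted = losses - losses.max()            # for numerical stability
    weights = np.exp(shifted / temperature)    # adversarial tilting
    weights /= weights.sum()                   # normalize to a distribution
    return float(np.dot(weights, losses))      # worst-case weighted loss

# Toy usage: one demonstration has a large loss (e.g., it is noisy).
losses = [0.1, 0.2, 5.0]
robust = kl_dro_loss(losses, temperature=0.5)
mean = float(np.mean(losses))
# The robust loss is never below the empirical mean: the adversary
# up-weights the worst demonstration instead of averaging it away.
```

The pessimism knob (`temperature`) interpolates between the empirical average (large temperature) and the single worst demonstration (temperature near zero).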