Wasserstein Adversarial Imitation Learning
Xiao, Huang; Herman, Michael; Wagner, Joerg; Ziesche, Sebastian; Etesami, Jalal; Linh, Thai Hong
Imitation learning is the problem of recovering an expert policy from demonstrations. While inverse reinforcement learning approaches are known to be very sample-efficient in terms of expert demonstrations, they usually require problem-dependent reward functions or a task-specific regularization of the reward function. In this paper, we show a natural connection between inverse reinforcement learning approaches and optimal transport that enables more general reward functions with desirable properties (e.g., smoothness). Based on this observation, we propose a novel approach called Wasserstein Adversarial Imitation Learning. Our approach uses the Kantorovich potentials as a reward function and further leverages regularized optimal transport to enable large-scale applications. In several robotic experiments, our approach outperforms the baselines in terms of average cumulative reward and shows a significant improvement in sample efficiency, requiring just one expert demonstration.
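To make the role of the Kantorovich potentials concrete, the following is a minimal sketch of entropy-regularized optimal transport between two small point clouds, solved with log-domain Sinkhorn iterations in NumPy. It is not the authors' implementation: the function names `sinkhorn_potentials` and `_softmin`, the uniform sample weights, the Euclidean ground cost, and the values of `eps` and `n_iters` are all assumptions made for illustration. The returned dual potentials `f` and `g` are the discrete, regularized counterparts of the Kantorovich potentials; in an imitation setting, one of them (up to a sign convention not fixed here) could plausibly act as a per-sample reward signal.

```python
import numpy as np


def _softmin(z, eps, axis):
    """Stable soft-minimum: -eps * log(sum(exp(-z / eps))) along `axis`."""
    m = z.min(axis=axis, keepdims=True)
    out = m - eps * np.log(np.exp(-(z - m) / eps).sum(axis=axis, keepdims=True))
    return out.squeeze(axis)


def sinkhorn_potentials(x, y, eps=0.1, n_iters=500):
    """Entropy-regularized OT between uniform empirical measures on x and y.

    x: (n, d) array, e.g. state-action samples from the learner (assumption).
    y: (m, d) array, e.g. state-action samples from the expert (assumption).
    Returns the dual potentials f (n,), g (m,) and the transport plan (n, m).
    """
    n, m = len(x), len(y)
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    # Pairwise Euclidean ground cost C[i, j] = ||x_i - y_j||.
    C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternating block-coordinate ascent on the regularized dual.
        f = _softmin(C - g[None, :] - eps * log_b[None, :], eps, axis=1)
        g = _softmin(C - f[:, None] - eps * log_a[:, None], eps, axis=0)
    # Primal plan recovered from the dual potentials.
    plan = np.exp((f[:, None] + g[None, :] - C) / eps
                  + log_a[:, None] + log_b[None, :])
    return f, g, plan
```

A smaller `eps` brings the solution closer to unregularized optimal transport at the cost of slower convergence; the log-domain updates keep the iteration numerically stable in that regime. For large-scale problems, the potentials would typically be parameterized by function approximators rather than stored per sample, which this sketch does not attempt.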
Jun-19-2019
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America > United States
- New York
- New York County > New York City (0.14)
- Richmond County > New York City (0.04)
- Queens County > New York City (0.04)
- Kings County > New York City (0.04)
- Bronx County > New York City (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- California > San Francisco County
- San Francisco (0.14)
- Europe
- Spain > Canary Islands (0.04)
- Italy > Sardinia (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.14)
- Germany > Baden-Württemberg
- Stuttgart Region > Stuttgart (0.04)
- France > Hauts-de-France
- Asia
- Middle East > Jordan (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Genre:
- Research Report (1.00)
- Technology: