VILD: Variational Imitation Learning with Diverse-quality Demonstrations
Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called variational imitation learning with diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators' expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive approach to estimation is not suitable for large state and action spaces, and fix its issues by using a variational approach which can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.
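The abstract's core idea is to treat each demonstrator's expertise as a latent quantity to be estimated jointly with the reward. A minimal toy sketch of that noise-model intuition, under the assumption (ours, not the paper's stated model) that demonstrator k perturbs the optimal action with Gaussian noise of scale sigma_k, so a small sigma_k means high expertise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative only): demonstrator k produces actions
# a = a*(s) + eps with eps ~ N(0, sigma_k^2 I); sigma_k encodes the unknown
# expertise level. All names here are our own, not the paper's.
n_steps, action_dim = 500, 2
true_sigmas = np.array([0.05, 0.3, 1.0])  # assumed ground-truth noise levels

optimal_actions = rng.normal(size=(n_steps, action_dim))  # stand-in for a*(s)
demos = np.stack([
    optimal_actions + rng.normal(scale=s, size=(n_steps, action_dim))
    for s in true_sigmas
])  # shape: (n_demonstrators, n_steps, action_dim)

# Closed-form maximum-likelihood estimate of each demonstrator's noise level,
# given the optimal actions. In VILD the optimal policy is itself unknown and
# everything is learned jointly via a variational objective; this sketch only
# illustrates why expertise becomes identifiable from diverse-quality data.
residuals = demos - optimal_actions           # broadcasts over demonstrators
est_sigmas = np.sqrt((residuals ** 2).mean(axis=(1, 2)))
print(np.round(est_sigmas, 2))               # recovers the expertise ordering
```

With enough demonstrated steps, the estimates rank the demonstrators by quality, which is the signal VILD exploits to down-weight amateur demonstrations.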
Sep-15-2019