VILD: Variational Imitation Learning with Diverse-quality Demonstrations
Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called variational imitation learning with diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators' expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive approach to estimation is not suitable for large state and action spaces, and fix its issues by using a variational approach which can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.
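The abstract's core idea is to treat each demonstrator's expertise as a latent quantity to be estimated jointly with the reward. A minimal toy sketch of that noise-model intuition, under the assumption (ours, not the paper's stated model) that demonstrator k perturbs the optimal action with Gaussian noise of scale sigma_k, so a small sigma_k means high expertise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative only): demonstrator k produces actions
# a = a*(s) + eps with eps ~ N(0, sigma_k^2 I); sigma_k encodes the unknown
# expertise level. All names here are our own, not the paper's.
n_steps, action_dim = 500, 2
true_sigmas = np.array([0.05, 0.3, 1.0])  # assumed ground-truth noise levels

optimal_actions = rng.normal(size=(n_steps, action_dim))  # stand-in for a*(s)
demos = np.stack([
    optimal_actions + rng.normal(scale=s, size=(n_steps, action_dim))
    for s in true_sigmas
])  # shape: (n_demonstrators, n_steps, action_dim)

# Closed-form maximum-likelihood estimate of each demonstrator's noise level,
# given the optimal actions. In VILD the optimal policy is itself unknown and
# everything is learned jointly via a variational objective; this sketch only
# illustrates why expertise becomes identifiable from diverse-quality data.
residuals = demos - optimal_actions           # broadcasts over demonstrators
est_sigmas = np.sqrt((residuals ** 2).mean(axis=(1, 2)))
print(np.round(est_sigmas, 2))               # recovers the expertise ordering
```

With enough demonstrated steps, the estimates rank the demonstrators by quality, which is the signal VILD exploits to down-weight amateur demonstrations.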
Sep-15-2019