Reparameterized Variational Divergence Minimization for Stable Imitation
Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan
While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, works exploring the imitation learning from observation (ILO) setting, where trajectories contain \textit{only} expert observations, have not been met with the same success. Inspired by recent investigations of $f$-divergence manipulation for the standard imitation-learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. Unfortunately, we find that $f$-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning that alleviates the optimization challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
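For context, the $f$-divergence minimization framework cited above is typically instantiated through a variational lower bound in the style of $f$-GANs (Nowozin et al., 2016). The sketch below uses assumed notation following that prior work rather than this paper's own: $\rho^{\pi_E}$ and $\rho^{\pi_\theta}$ denote the expert's and imitator's occupancy measures, $V_w$ a discriminator, $g_f$ an output activation, and $f^{*}$ the convex conjugate of $f$:

$$D_f\big(\rho^{\pi_E} \,\|\, \rho^{\pi_\theta}\big) \;\geq\; \max_{w}\; \mathbb{E}_{(s,a)\sim\rho^{\pi_E}}\big[g_f\big(V_w(s,a)\big)\big] \;-\; \mathbb{E}_{(s,a)\sim\rho^{\pi_\theta}}\big[f^{*}\big(g_f\big(V_w(s,a)\big)\big)\big].$$

The imitation policy is then trained adversarially to minimize this bound, $\min_{\theta}\max_{w}(\cdot)$; in the ILO setting the expectations would be taken over observation-only quantities such as state transitions $(s, s')$ rather than state-action pairs. The specific reparameterization this paper contributes for stabilizing that optimization is not reproduced here.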
Jun-18-2020