Exponentially Weighted Imitation Learning for Batched Historical Data

Neural Information Processing Systems

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces. The method does not rely on knowledge of the behavior policy and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
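The abstract describes the method only at a high level. As a minimal sketch, assuming (consistent with the title) that each logged action is reweighted by exp(β·A) with A an advantage estimate, the Python below fits a tabular policy by exponentially weighted maximum likelihood. The advantage estimate (Monte Carlo return minus the mean logged return per state), the exponent cap `max_log_weight`, and all function names are hypothetical choices for this illustration, not the paper's deep-network implementation. With β = 0 every weight is 1 and the fit reduces to plain behavior cloning; larger β tilts the policy toward actions with positive advantage. Only logged states, actions, and rewards are used, with no behavior-policy probabilities.

```python
import math
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Monte Carlo return R_t = r_t + gamma * R_{t+1} for each step of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def to_transitions(trajectories, gamma):
    """Flatten logged trajectories of (state, action, reward) into (state, action, return)."""
    transitions = []
    for traj in trajectories:
        rets = discounted_returns([r for _, _, r in traj], gamma)
        transitions.extend((s, a, ret) for (s, a, _), ret in zip(traj, rets))
    return transitions


def fit_tabular_policy(transitions, beta, max_log_weight=10.0):
    """Maximize sum_i exp(beta * A_i) * log pi(a_i | s_i) over tabular policies.

    A_i = R_i - V(s_i), with V(s) taken as the mean logged return in state s
    (a simple Monte Carlo baseline assumed for this sketch). For a categorical
    policy the weighted log-likelihood is maximized by the weight-normalized
    action frequencies in each state; unlogged actions get no probability.
    """
    # Baseline V(s): average logged return per state.
    return_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, _, ret in transitions:
        return_sum[s] += ret
        visits[s] += 1
    value = {s: return_sum[s] / visits[s] for s in visits}

    # Sum the exponential advantage weights for each (state, action) pair.
    weights = defaultdict(lambda: defaultdict(float))
    for s, a, ret in transitions:
        advantage = ret - value[s]
        # Cap the exponent so math.exp cannot overflow (an assumed safeguard).
        weights[s][a] += math.exp(min(beta * advantage, max_log_weight))

    # Normalize the summed weights into action probabilities per state.
    policy = {}
    for s, per_action in weights.items():
        total = sum(per_action.values())
        policy[s] = {a: w / total for a, w in per_action.items()}
    return policy


if __name__ == "__main__":
    # Hypothetical logged data: single-step episodes from one state "s0".
    # Action 0 is rarely taken by the behavior policy but earns reward 1.0.
    logged = [[("s0", 0, 1.0)]] * 2 + [[("s0", 1, 0.0)]] * 6
    data = to_transitions(logged, gamma=0.99)
    print(fit_tabular_policy(data, beta=0.0))  # behavior cloning: {'s0': {0: 0.25, 1: 0.75}}
    print(fit_tabular_policy(data, beta=5.0))  # most mass shifts to action 0
```

For a categorical policy the weighted maximum-likelihood problem has the closed-form solution used here (weight-normalized action counts); with a neural policy one would instead minimize the same weighted negative log-likelihood by gradient descent, treating the weights as constants.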


Reviews: Exponentially Weighted Imitation Learning for Batched Historical Data

A method for learning deep policies from data recorded in demonstrations is introduced. The method uses exponentially weighted learning that can learn policies from data generated by another policy. The proposed approach is interesting and well presented. That would be even more interesting than the imitation learning scheme presented here; however, the paper gives the introduction, background, and discussion for that future work. How is the data for the HFO environment generated? Why are PG and PGIS not used in the experiments with TORCS and King of Glory?