Adversarial Task Transfer from Preference

arXiv.org Machine Learning

Task transfer is extremely important for reinforcement learning, since it provides the possibility of generalizing to new tasks. One main goal of task transfer in reinforcement learning is to transfer an agent's action policy from an original basic task to a specific target task. Existing work on this challenging problem usually requires accurate hand-coded cost functions or rich demonstrations on the target task. This strong requirement is difficult, if not impossible, to satisfy in many practical scenarios. In this work, we develop a novel task transfer framework which effectively performs the policy transfer using preferences only. The hidden cost model for preferences and adversarial training are elegantly combined to perform the task transfer. We give a theoretical analysis of the convergence of the proposed algorithm, and perform extensive simulations on some well-known examples to validate the theoretical results.
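
The core ingredient, learning a reward (or cost) model from pairwise trajectory preferences, can be sketched as follows. This is a minimal Bradley-Terry-style example in PyTorch; the network architecture, the trajectory-level comparison, and the adversarial policy-update step it would be paired with are illustrative assumptions, not the paper's exact method.

    # Minimal sketch (assumed details): score each trajectory with a small MLP
    # reward model and fit it to pairwise preferences via a Bradley-Terry loss.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def trajectory_return(self, traj):        # traj: (T, obs_dim)
            return self.net(traj).sum()           # sum of per-step rewards

    def preference_loss(model, traj_a, traj_b, a_preferred):
        """Bradley-Terry likelihood: P(a preferred over b) = sigmoid(R(a) - R(b))."""
        logits = model.trajectory_return(traj_a) - model.trajectory_return(traj_b)
        target = torch.tensor(1.0 if a_preferred else 0.0)
        return nn.functional.binary_cross_entropy_with_logits(logits, target)

    # Training loop (assumed): collect preference-labeled trajectory pairs on the
    # target task, minimize preference_loss, and update the policy adversarially
    # against the learned reward.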


Energy-Based Imitation Learning

arXiv.org Machine Learning

We tackle a common scenario in imitation learning (IL), where agents try to recover the optimal policy from expert demonstrations without further access to the expert or environment reward signals. The classical inverse reinforcement learning (IRL) solution involves bi-level optimization and has a high computational cost. Recent generative adversarial methods formulate the IL problem as occupancy measure matching, but suffer from the notorious training instability and mode-dropping problems. Inspired by recent progress in energy-based models (EBMs), in this paper we propose a novel IL framework named Energy-Based Imitation Learning (EBIL), which solves the IL problem by directly estimating the expert energy as a surrogate reward function through score matching. EBIL combines the ideas of EBMs and occupancy measure matching, and enjoys (1) high model flexibility for estimating the expert policy distribution and (2) efficient computation that avoids the previous alternating training fashion. Though motivated by matching the policy between the expert and the agent, we surprisingly find a nontrivial connection between EBIL and Maximum-Entropy IRL (MaxEnt IRL) approaches, and further show that EBIL can be seen as a simpler and more efficient solution to MaxEnt IRL, which supports flexible and general candidates for training the expert's EBM. Extensive experiments show that EBIL consistently achieves comparable or better performance than state-of-the-art IL methods.
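
A minimal sketch of that recipe, assuming an MLP energy over concatenated state-action vectors and a denoising score matching objective (one of several valid score matching estimators), is given below; it is illustrative rather than the paper's exact training procedure.

    # Sketch (assumed details): fit an energy-based model to expert state-action
    # pairs with denoising score matching, then reuse the negative energy as a
    # fixed surrogate reward for any off-the-shelf RL algorithm.
    import torch
    import torch.nn as nn

    class Energy(nn.Module):
        def __init__(self, dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Softplus(),
                                     nn.Linear(hidden, 1))

        def forward(self, x):
            return self.net(x).squeeze(-1)        # one scalar energy per sample

    def dsm_loss(energy, x, sigma=0.1):
        """Denoising score matching: the model score -dE/dx should match the
        score of the Gaussian-perturbed data, -(x_tilde - x) / sigma**2."""
        noise = torch.randn_like(x) * sigma
        x_tilde = (x + noise).detach().requires_grad_(True)
        e = energy(x_tilde).sum()
        model_score = -torch.autograd.grad(e, x_tilde, create_graph=True)[0]
        target_score = -noise / sigma ** 2
        return ((model_score - target_score) ** 2).sum(dim=-1).mean()

    def surrogate_reward(energy, state, action):
        """EBIL-style reward: low expert energy means high reward."""
        with torch.no_grad():
            return -energy(torch.cat([state, action], dim=-1))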


Lifelong Inverse Reinforcement Learning

Neural Information Processing Systems

Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. However, even for a single task, LfD may require numerous demonstrations. For versatile agents that must learn many tasks via demonstration, this process would substantially burden the user if each task were learned in isolation. To address this challenge, we introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the number of demonstrations required. As one solution to this problem, we propose the first lifelong learning approach to inverse reinforcement learning, which learns consecutive tasks via demonstration, continually transferring knowledge between tasks to improve performance.
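
One natural way to realize such cross-task transfer, assuming each task's reward is linear in hand-specified features and the per-task weight vectors are factored through a shared latent basis (theta_t roughly equal to L @ s_t), is sketched below; the factorization and the update rules are illustrative assumptions, not the paper's exact algorithm.

    # Sketch (assumed details): after IRL recovers reward weights theta_t for a
    # new task, express them in a shared basis L so that later tasks start from
    # the accumulated knowledge instead of learning from scratch.
    import numpy as np

    def encode_task(L, theta_t, lam=0.1):
        """Ridge projection: s_t = argmin_s ||theta_t - L @ s||^2 + lam ||s||^2."""
        k = L.shape[1]
        return np.linalg.solve(L.T @ L + lam * np.eye(k), L.T @ theta_t)

    def update_basis(L, thetas, codes, lr=0.01):
        """One gradient step pulling the basis toward reconstructing all tasks
        seen so far, i.e. minimizing sum_t ||theta_t - L @ s_t||^2."""
        grad = np.zeros_like(L)
        for theta_t, s_t in zip(thetas, codes):
            grad += np.outer(L @ s_t - theta_t, s_t)
        return L - lr * grad / max(len(thetas), 1)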


Maximum Entropy Semi-Supervised Inverse Reinforcement Learning

AAAI Conferences

A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and, unlike its predecessors, resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which, in addition to the expert's trajectories, a number of unsupervised trajectories are available. We introduce MESSI, a novel algorithm that combines MaxEnt-IRL with principles from semi-supervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results on highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.
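
For a linear reward r(traj) = theta . f(traj), the mechanics can be sketched as follows: the standard MaxEnt feature-matching gradient is combined with the gradient of a pairwise penalty that pulls the returns of similar trajectories (expert and unsupervised alike) toward each other. The similarity weights and the exact penalty form used here are illustrative assumptions.

    # Sketch (assumed details): one parameter update for a MaxEnt-IRL variant
    # with a graph-style pairwise penalty over all available trajectories.
    import numpy as np

    def maxent_gradient(expert_feats, policy_feats):
        """Expert feature expectation minus the expectation under the current
        soft-optimal policy (the usual MaxEnt-IRL log-likelihood gradient)."""
        return expert_feats.mean(axis=0) - policy_feats.mean(axis=0)

    def pairwise_penalty_gradient(theta, traj_feats, sim):
        """Gradient of 0.5 * sum_ij sim[i, j] * (theta.f_i - theta.f_j)**2."""
        returns = traj_feats @ theta                  # (N,)
        diff = returns[:, None] - returns[None, :]    # (N, N)
        grad = np.zeros_like(theta)
        for i in range(len(returns)):
            for j in range(len(returns)):
                grad += sim[i, j] * diff[i, j] * (traj_feats[i] - traj_feats[j])
        return grad

    def messi_step(theta, expert_feats, policy_feats, traj_feats, sim,
                   lr=0.05, lam=0.1):
        """Ascend the MaxEnt objective while descending the pairwise penalty."""
        return theta + lr * (maxent_gradient(expert_feats, policy_feats)
                             - lam * pairwise_penalty_gradient(theta, traj_feats, sim))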