This research proposes using imitation-based learning to build collaborative strategies for a team of agents. Imitation-based learning involves learning from an expert by observing her demonstrate a task and then replicating it. This mechanism makes it extremely easy for a knowledge engineer to transfer knowledge to a software agent via human demonstrations. This research aims to apply imitation to learn not only the strategy of an individual agent but also the collaborative strategy of a team of agents pursuing a common goal. The effectiveness of the proposed methodology is being assessed in the domain of RoboCup Soccer Simulation 3D, a promising platform for addressing many complex real-world problems that offers a truly dynamic, stochastic, and partially observable environment.
A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is a consequence of the general reliance of IRL algorithms upon some form of mimicry, such as feature-count matching, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward learning from observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. When combined with deep reinforcement learning, we show that this approach can achieve performance that is more than an order of magnitude better than the best-performing demonstration, on multiple Atari and MuJoCo benchmark tasks. In contrast, prior state-of-the-art imitation learning and IRL methods fail to perform better than the demonstrator and often have performance that is orders of magnitude worse than T-REX. Finally, we demonstrate that T-REX is robust to modest amounts of ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time.
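The idea of learning a reward from ranked demonstrations, as described in the abstract above, can be illustrated with a small sketch. It assumes a linear reward over hand-made state features, a pairwise logistic (Bradley-Terry style) ranking loss, and plain gradient descent; these are simplifications chosen for illustration, and every name below is hypothetical rather than taken from the paper.

```python
import math

def feature_sum(traj):
    """Sum each feature over all states of a trajectory."""
    dim = len(traj[0])
    return [sum(state[k] for state in traj) for k in range(dim)]

def predicted_return(w, traj):
    """Return of a trajectory under the linear reward r(s) = w . s (assumed form)."""
    return sum(wk * fk for wk, fk in zip(w, feature_sum(traj)))

def ranking_loss_and_grad(w, worse, better):
    """Cross-entropy loss for the preference 'better outranks worse'."""
    diff = predicted_return(w, better) - predicted_return(w, worse)
    # loss = -log(sigmoid(diff)), written in a numerically stable form
    loss = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    # q = 1 - sigmoid(diff), also computed stably
    if diff >= 0:
        e = math.exp(-diff)
        q = e / (1.0 + e)
    else:
        q = 1.0 / (1.0 + math.exp(diff))
    fb, fw = feature_sum(better), feature_sum(worse)
    grad = [-q * (b - a) for b, a in zip(fb, fw)]
    return loss, grad

def fit_reward(ranked_trajs, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent; ranked_trajs is ordered from worst to best."""
    w = [0.0] * dim
    n = len(ranked_trajs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(epochs):
        for i, j in pairs:
            _, g = ranking_loss_and_grad(w, ranked_trajs[i], ranked_trajs[j])
            w = [wk - lr * gk for wk, gk in zip(w, g)]
    return w

# Toy demonstrations, worst to best. Feature 0 grows with rank; feature 1 does not.
t0 = [[0.0, 1.0], [0.1, 0.0]]
t1 = [[0.3, 0.5], [0.2, 0.5]]
t2 = [[0.6, 0.0], [0.5, 1.0]]
w = fit_reward([t0, t1, t2], dim=2)

# An unseen trajectory that pushes feature 0 beyond the best demonstration.
t_new = [[1.0, 0.5], [1.0, 0.5]]
```

In this toy setup the learned reward puts positive weight on the feature that increases with rank, so `predicted_return(w, t_new)` exceeds the predicted return of the best demonstration. This is the sense in which a ranking-based reward can extrapolate beyond the demonstrator; a policy would then be trained against the learned reward with a reinforcement learning algorithm.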
Learning from demonstration (LfD) is a promising technique for teaching autonomous systems based on demonstrations from people who may have little to no experience with robots. An important aspect of LfD is the communication method used to transfer knowledge from an instructor to a robot. The communication method affects the complexity of the demonstration process for instructors, the range of tasks a robot can learn, and the learning algorithm itself. We have designed a graphical interface and an instructional language to provide an intuitive teaching system. The drawback of simplifying the teaching interface is that the resulting demonstration data are less structured, adding complexity to the learning process. This additional complexity is handled by combining a minimal set of predefined behaviors with a task representation capable of learning probabilistic policies over a set of behaviors. The predefined behaviors consist of finite actions a robot can perform, which act as building blocks for more complex tasks.
Munoz, J. Pablo (Brooklyn College, City University of New York) | Ozgelen, Arif T. (The Graduate Center, City University of New York) | Sklar, Elizabeth (Brooklyn College, City University of New York)
We present the initial stage of our research on Learning from Demonstration algorithms. We have implemented an algorithm based on Confident Execution, one of the components of the Confidence-Based Autonomy algorithm developed by Chernova and Veloso. Our preliminary experiments were conducted first in simulation and then on a Sony AIBO ERS-7 robot. So far, our robot has been able to learn crude navigation strategies despite limited trials. We are currently improving our implementation by including additional features that describe the state of the agent more broadly. Our long-term goal is to incorporate Learning from Demonstration techniques into our HRTeam (human/multi-robot) framework.
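The general pattern behind confidence-gated execution can be sketched as follows: the agent acts on its own when its classifier is confident about the current state and asks the teacher for a demonstration otherwise. The sketch assumes a k-nearest-neighbour classifier, a fixed confidence threshold, and a fixed distance threshold; these choices, and all names below, are illustrative assumptions and not the authors' implementation.

```python
import math
from collections import Counter

class ConfidenceGatedAgent:
    """Hypothetical agent that requests a demonstration when it is unsure."""

    def __init__(self, k=3, threshold=0.7, max_dist=0.5):
        self.k = k                  # neighbours consulted per decision
        self.threshold = threshold  # minimum vote share needed to act alone
        self.max_dist = max_dist    # states farther than this count as unfamiliar
        self.demos = []             # stored (state, action) demonstrations

    def classify(self, state):
        """Return (action, confidence, distance to the nearest demonstration)."""
        if not self.demos:
            return None, 0.0, math.inf
        nearest = sorted(self.demos, key=lambda d: math.dist(d[0], state))[: self.k]
        votes = Counter(action for _, action in nearest)
        action, count = votes.most_common(1)[0]
        # Dividing by k keeps confidence low until at least k demos exist.
        return action, count / self.k, math.dist(nearest[0][0], state)

    def step(self, state, ask_teacher):
        """Return (action, requested); requested is True if the teacher was asked."""
        action, conf, dist = self.classify(state)
        if conf < self.threshold or dist > self.max_dist:
            action = ask_teacher(state)
            self.demos.append((tuple(state), action))
            return action, True
        return action, False

def teacher(state):
    """Stand-in for a human demonstrator: steer toward the side the agent is on."""
    return "left" if state[0] < 0 else "right"

agent = ConfidenceGatedAgent()
training_states = [(-1.0, 0.0), (-0.9, 0.0), (-1.1, 0.0),
                   (1.0, 0.0), (0.9, 0.0), (1.1, 0.0)]
training_log = [agent.step(s, teacher) for s in training_states]
left_result = agent.step((-1.05, 0.0), teacher)
right_result = agent.step((1.05, 0.0), teacher)
```

With these toy states the agent asks for a demonstration on each of the six training states and then acts autonomously on the two nearby test states. The distance check matters: without it, the first state on the right-hand side would be classified confidently, and wrongly, from left-hand demonstrations alone.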
Reinforcement learning has enjoyed multiple successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper introduces a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper uses demonstrations from agents and humans, allowing an untrained agent to quickly achieve high performance. We empirically compare with, and highlight the weakness of, HAT and CHAT, methods of transferring knowledge from a source agent/human to a target agent. This paper introduces an effective transfer approach, DRoP, combining the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP dynamically involves the demonstrator's knowledge, integrating it into the reinforcement learning agent's online learning loop to achieve efficient and robust learning.