Goto

Collaborating Authors

 Reinforcement Learning


Interactively shaping robot behaviour with unlabeled human instructions

arXiv.org Machine Learning

In this paper, we propose a framework that enables a human teacher to shape a robot behaviour by interactively providing it with unlabeled instructions. We ground the meaning of instruction signals in the task learning process, and use them simultaneously for guiding the latter. We implement our framework as a modular architecture, named TICS (Task-Instruction-Contingency-Shaping) that combines different information sources: a predefined reward function, human evaluative feedback and unlabeled instructions. This approach provides a novel perspective for robotic task learning that lies between Reinforcement Learning and Supervised Learning paradigms. We evaluate our framework both in simulation and with a real robot. The experimental results demonstrate the effectiveness of our framework in accelerating the task learning process and in reducing the amount of required teaching signals.


Learning to Learn in Simulation

arXiv.org Artificial Intelligence

Deep learning often requires the manual collection and annotation of a training set. On robotic platforms, can we partially automate this task by training the robot to be curious, i.e., to seek out beneficial training information in the environment? In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a drone, where motion is constrained to two dimensions. We use a 3D simulation environment and deep reinforcement learning to train a curiosity agent to, in turn, train the object detection model. This agent could have one of two conflicting objectives: train as quickly as possible, or train with minimal human input. We outline a reward function that allows the curiosity agent to learn either of these objectives, while taking into account some of the physical characteristics of the drone platform on which it is meant to run. In addition, We show that we can weigh the importance of achieving these objectives by adjusting a parameter in the reward function.


Learning to Schedule Communication in Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

Many real-world reinforcement learning tasks require multiple agents to make sequential decisions under the agents' interaction, where well-coordinated actions among the agents are crucial to achieve the target goal better at these tasks. One way to accelerate the coordination effect is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario when (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode the messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcasting their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines under two different applications, namely, cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap between SchedNet and other mechanisms such as the ones without communication and with vanilla scheduling methods, e.g., round robin, ranging from 32% to 43%.


The Natural Language of Actions

arXiv.org Artificial Intelligence

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning. Representing actions in a vector space help reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II.


Embodied Multimodal Multitask Learning

arXiv.org Machine Learning

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular, interpretable, and allows for transfer to instructions containing new words by leveraging object detectors.


Dynamic Planning Networks

arXiv.org Machine Learning

We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state-transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. DPN shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.


Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

arXiv.org Artificial Intelligence

The rapid pace of research in Deep Reinforcement Learning has been driven by the presence of fast and challenging simulation environments. These environments often take the form of games; with tasks ranging from simple board games, to classic home console games, to modern strategy games. We propose a new benchmark called Obstacle Tower: a high visual fidelity, 3D, 3rd person, procedurally generated game environment. An agent in the Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other similar benchmarks such as the ALE, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of initial baseline results produced by current state-of-the-art Deep RL methods as well as human players. In all cases these algorithms fail to produce agents capable of performing anywhere near human level on a set of evaluations designed to test both memorization and generalization ability. As such, we believe that the Obstacle Tower has the potential to serve as a helpful Deep RL benchmark now and into the future.


Intelligent Traffic Signal Control: Using Reinforcement Learning with Partial Detection

arXiv.org Artificial Intelligence

Intelligent Transportation Systems (ITS) have attracted the attention of researchers and the general public alike as a means to alleviate traffic congestion. Recently, the maturity of wireless technology has enabled a cost-efficient way to achieve ITS by detecting vehicles using Vehicle to Infrastructure (V2I) communications. Traditional ITS algorithms, in most cases, assume that every vehicle is observed, such as by a camera or a loop detector, but a V2I implementation would detect only those vehicles with wireless communications capability. We examine a family of transportation systems, which we will refer to as `Partially Detected Intelligent Transportation Systems'. An algorithm that can act well under a small detection rate is highly desirable due to gradual penetration rates of the underlying wireless technologies such as Dedicated Short Range Communications (DSRC) technology. Artificial Intelligence (AI) techniques for Reinforcement Learning (RL) are suitable tools for finding such an algorithm due to utilizing varied inputs and not requiring explicit analytic understanding or modeling of the underlying system dynamics. In this paper, we report a RL algorithm for partially observable ITS based on DSRC. The performance of this system is studied under different car flows, detection rates, and topologies of the road network. Our system is able to efficiently reduce the average waiting time of vehicles at an intersection, even with a low detection rate.


Study of Robust Distributed Diffusion RLS Algorithms with Side Information for Adaptive Networks

arXiv.org Machine Learning

This work develops robust diffusion recursive least squares algorithms to mitigate the performance degradation often experienced in networks of agents in the presence of impulsive noise. The first algorithm minimizes an exponentially weighted least-squares cost function subject to a time-dependent constraint on the squared norm of the intermediate update at each node. A recursive strategy for computing the constraint is proposed using side information from the neighboring nodes to further improve the robustness. We also analyze the mean-square convergence behavior of the proposed algorithm. The second proposed algorithm is a modification of the first one based on the dichotomous coordinate descent iterations. It has a performance similar to that of the former, however its complexity is significantly lower especially when input regressors of agents have a shift structure and it is well suited to practical implementation. Simulations show the superiority of the proposed algorithms over previously reported techniques in various impulsive noise scenarios.


Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics

arXiv.org Machine Learning

Scaling end-to-end reinforcement learning to control real robots from vision presents a series of challenges, in particular in terms of sample efficiency. Against end-to-end learning, state representation learning can help learn a compact, efficient and relevant representation of states that speeds up policy learning, reducing the number of samples needed, and that is easier to interpret. We evaluate several state representation learning methods on goal based robotics tasks and propose a new unsupervised model that stacks representations and combines strengths of several of these approaches. This method encodes all the relevant features, performs on par or better than end-to-end learning, and is robust to hyper-parameters change.