Goto

Collaborating Authors

 Reinforcement Learning


From Crystallized Adaptivity to Fluid Adaptivity in Deep Reinforcement Learning -- Insights from Biological Systems on Adaptive Flexibility

arXiv.org Artificial Intelligence

Recent developments in machine-learning algorithms have led to impressive performance increases in many traditional application scenarios of artificial intelligence research. In the area of deep reinforcement learning, deep learning functional architectures are combined with incremental learning schemes for sequential tasks that include interaction-based, but often delayed feedback. Despite their impressive successes, modern machine-learning approaches, including deep reinforcement learning, still perform weakly when compared to flexibly adaptive biological systems in certain naturally occurring scenarios. Such scenarios include transfers to environments different than the ones in which the training took place or environments that dynamically change, both of which are often mastered by biological systems through a capability that we here term "fluid adaptivity" to contrast it from the much slower adaptivity ("crystallized adaptivity") of the prior learning from which the behavior emerged. In this article, we derive and discuss research strategies, based on analyzes of fluid adaptivity in biological systems and its neuronal modeling, that might aid in equipping future artificially intelligent systems with capabilities of fluid adaptivity more similar to those seen in some biologically intelligent systems. A key component of this research strategy is the dynamization of the problem space itself and the implementation of this dynamization by suitably designed flexibly interacting modules.


Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Common Sense to Learn Families of Text-Based Adventure Games

arXiv.org Artificial Intelligence

We consider the task of learning to play families of text-based computer adventure games, i.e., fully textual environments with a common theme (e.g. cooking) and goal (e.g. prepare a meal from a recipe) but with different specifics; new instances of such games are relatively straightforward for humans to master after a brief exposure to the genre but have been curiously difficult for computer agents to learn. We find that the deep Q-learning strategies that have been successfully leveraged for superhuman performance in single-instance action video games can be applied to learn families of text video games when adopting simple strategies that correlate with human-like learning behavior. Specifically, we build agents that learn to tackle simple scenarios before more complex ones (curriculum learning), that are equipped with the contextualized semantics of BERT (and we demonstrate that this provides a measure of common sense), and that familiarize themselves in an unfamiliar environment by navigating before acting. We demonstrate faster training convergence and improved task completion rates over reasonable baselines.


Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization

arXiv.org Artificial Intelligence

Applying Machine Learning (ML) techniques to design and optimize computer architectures is a promising research direction. Optimizing the runtime performance of a Network-on-Chip (NoC) necessitates a continuous learning framework. In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance. We present three RL-based methods for learning optimal routing algorithms. The experimental results show the algorithms can successfully learn a near-optimal solution across different environment states. Reproducible Code: github.com/huckiyang/interconnect-routing-gym


Behaviour Suite for Reinforcement Learning

arXiv.org Artificial Intelligence

This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. This library facilitates reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Our code is Python, and easy to use within existing projects. We include examples with OpenAI Baselines, Dopamine as well as new reference implementations. Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of bsuite from a committee of prominent researchers.


Superstition in the Network: Deep Reinforcement Learning Plays Deceptive Games

arXiv.org Artificial Intelligence

Deep reinforcement learning has learned to play many games well, but failed on others. To better characterize the modes and reasons of failure of deep reinforcement learners, we test the widely used Asynchronous Actor-Critic (A2C) algorithm on four deceptive games, which are specially designed to provide challenges to game-playing agents. These games are implemented in the General Video Game AI framework, which allows us to compare the behavior of reinforcement learning-based agents with planning agents based on tree search. We find that several of these games reliably deceive deep reinforcement learners, and that the resulting behavior highlights the shortcomings of the learning algorithm. The particular ways in which agents fail differ from how planning-based agents fail, further illuminating the character of these algorithms. We propose an initial typology of deceptions which could help us better understand pitfalls and failure modes of (deep) reinforcement learning. Introduction In reinforcement learning (RL) (Sutton and Barto 1998) an agent is tasked with learning a policy that maximizes expected reward based only on its interactions with the environment. In general, there is no guarantee that any such procedure will lead to an optimal policy; while convergence proofs exist, they only apply to a tiny and rather uninteresting class of environments. Reinforcement learning still performs well for a wide range of scenarios not covered by those convergence proofs. However, while recent successes in game-playing with deep reinforcement learning (Justesen et al. 2017) have led to a high degree of confidence in the deep RL approach, there are still scenarios or games where deep RL fails. Some oft-mentioned reasons why RL algorithms fail are partial observability and long time spans between actions and rewards. But are there other causes?


A Review of Cooperative Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. In this review article, we have mostly focused on recent papers on Multi-Agent Reinforcement Learning (MARL) than the older papers, unless it was necessary. Several ideas and papers are proposed with different notations, and we tried our best to unify them with a single notation and categorize them by their relevance. In particular, we have focused on five common approaches on modeling and solving multi-agent reinforcement learning problems: (I) independent-learners, (II) fully observable critic, (III) value function decomposition, (IV) consensus, (IV) learn to communicate. Moreover, we discuss some new emerging research areas in MARL along with the relevant recent papers. In addition, some of the recent applications of MARL in real world are discussed. Finally, a list of available environments for MARL research are provided and the paper is concluded with proposals on the possible research directions.


Learning to Explore in Motion and Interaction Tasks

arXiv.org Artificial Intelligence

-- Model free reinforcement learning suffers from the high sampling complexity inherent to robotic manipulation or locomotion tasks. Most successful approaches typically use random sampling strategies which leads to slow policy convergence. In this paper we present a novel approach for efficient exploration that leverages previously learned tasks. We exploit the fact that the same system is used across many tasks and build a generative model for exploration based on data from previously solved tasks to improve learning new tasks. The approach also enables continuous learning of improved exploration strategies as novel tasks are learned. Extensive simulations on a robot manipulator performing a variety of motion and contact interaction tasks demonstrate the capabilities of the approach. In particular, our experiments suggest that the exploration strategy can more than double learning speed, especially when rewards are sparse. Moreover, the algorithm is robust to task variations and parameter tuning, making it beneficial for complex robotic problems. I. INTRODUCTION Deep reinforcement learning has attracted a lot of attention for robotic applications where full robot models can be difficult to identify, especially for contact dynamics, and lead to computationally challenging planning and control problems.


Large-scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL, called Cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied on TSC and tested on a multi-traffic signal simulator. According to the results obtained on several traffic scenarios, Co- DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.


Wasserstein Robust Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning algorithms, though successful, tend to over-fit to training environments hampering their application to the real-world. This paper proposes WR$^{2}$L; a robust reinforcement learning algorithm with significant robust performance on low and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We contribute both theoretically and empirically. On the theory side, we prove that WR$^{2}$L converges to a stationary point in the general setting of continuous state and action spaces. Empirically, we demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJuCo environments.


Learning to Grasp from 2.5D images: a Deep Reinforcement Learning Approach

arXiv.org Machine Learning

--In this paper, we propose a deep reinforcement learning (DRL) solution to the grasping problem using 2.5D images as the only source of information. In particular, we developed a simulated environment where a robot equipped with a vacuum gripper has the aim of reaching blocks with planar surfaces. These blocks can have different dimensions, shapes, position and orientation. The experiments demonstrated the effectiveness of the proposed DRL algorithm applied to grasp tasks guided by visual depth camera inputs. When using the proper policy, the proposed method estimates a robot tool configuration that reaches the object surface with negligible position and orientation errors. This is, to the best of our knowledge, the first successful attempt of using 2.5D images only as of the input of a DRL algorithm, to solve the grasping problem regressing 3D world coordinates. I. INTRODUCTION In industrial environments, manipulator robots are usually designed to solve precise and predefined tasks. However, there are situations where it may be required to generalize the behaviour of the robots due to variations of size, shape, position, and orientation of the object to grasp. In these cases, the development of solutions according to mainstream standard computer vision and robotic control approaches can be complex and may lead to customized algorithms that cannot be easily generalized to different scenarios. Deep Reinforcement Learning addresses this task by merging the reinforcement learning and the deep learning domains, approximating the policy to learn with a deep neural network.