Goto

Collaborating Authors

 Reinforcement Learning


Questions to Guide the Future of Artificial Intelligence Research

arXiv.org Artificial Intelligence

The field of machine learning has focused, primarily, on discretized sub-problems (i.e. vision, speech, natural language) of intelligence. While neuroscience tends to be observation heavy, providing few guiding theories. It is unlikely that artificial intelligence will emerge through only one of these disciplines. Instead, it is likely to be some amalgamation of their algorithmic and observational findings. As a result, there are a number of problems that should be addressed in order to select the beneficial aspects of both fields. In this article, we propose leading questions to guide the future of artificial intelligence research. There are clear computational principles on which the brain operates. The problem is finding these computational needles in a haystack of biological complexity. Biology has clear constraints but by not using it as a guide we are constraining ourselves.


Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

New methodologies will be needed to ensure the airspace remains safe and efficient as traffic densities rise to accommodate new unmanned operations. This paper explores how unmanned free-flight traffic may operate in dense airspace. We develop and analyze autonomous collision avoidance systems for aircraft operating in dense airspace where traditional collision avoidance systems fail. We propose a metric for quantifying the decision burden on a collision avoidance system as well as a metric for measuring the impact of the collision avoidance system on airspace. We use deep reinforcement learning to compute corrections for an existing collision avoidance approach to account for dense airspace. The results show that a corrected collision avoidance system can operate more efficiently than traditional methods in dense airspace while maintaining high levels of safety.


Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

arXiv.org Artificial Intelligence

We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games. Introduction Deep reinforcement learning (DRL) has been widely used for building agents to learn complex control in competitive environments. In the competitive setting, a considerable amount of existing DRL research adopt two-agent games as the testbed, i.e., one agent versus another (1v1). Among them, Atari series and board games have been widely studied. For example, a human-level agent for playing Atari games is trained with deep Q-networks (Mnih et al. 2015). The incorporation of supervised learning and self-play into the training brings the agent to the level of beating human professionals in the game of Go (Silver et al. 2016).


Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards

arXiv.org Artificial Intelligence

While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge. In this work, we propose an effective reward shaping method through predictive coding to tackle sparse reward problems. By learning predictive representations offline and using these representations for reward shaping, we gain access to reward signals that understand the structure and dynamics of the environment. In particular, our method achieves better learning by providing reward signals that 1) understand environment dynamics 2) emphasize on features most useful for learning 3) resist noise in learned representations through reward accumulation. We demonstrate the usefulness of this approach in different domains ranging from robotic manipulation to navigation, and we show that reward signals produced through predictive coding are as effective for learning as hand-crafted rewards.


Teaching robots to perceive time -- A reinforcement learning approach (Extended version)

arXiv.org Artificial Intelligence

Time perception is the phenomenological experience of time by an individual. In this paper, we study how to replicate neural mechanisms involved in time perception, allowing robots to take a step towards temporal cognition. Our framework follows a twofold biologically inspired approach. The first step consists of estimating the passage of time from sensor measurements, since environmental stimuli influence the perception of time. Sensor data is modeled as Gaussian processes that represent the second-order statistics of the natural environment. The estimated elapsed time between two events is computed from the maximum likelihood estimate of the joint distribution of the data collected between them. Moreover, exactly how time is encoded in the brain remains unknown, but there is strong evidence of the involvement of dopaminergic neurons in timing mechanisms. Since their phasic activity has a similar behavior to the reward prediction error of temporal-difference learning models, the latter are used to replicate this behavior. The second step of this approach consists therefore of applying the agent's estimate of the elapsed time in a reinforcement learning problem, where a feature representation called Microstimuli is used. We validate our framework by applying it to an experiment that was originally conducted with mice, and conclude that a robot using this framework is able to reproduce the timing mechanisms of the animal's brain.


Exploiting the potential of deep reinforcement learning for classification tasks in high-dimensional and unstructured data

arXiv.org Machine Learning

This paper presents a framework for efficiently learning feature selection policies which use less features to reach a high classification precision on large unstructured data. It uses a Deep Convolutional Autoencoder (DCAE) for learning compact feature spaces, in combination with recently-proposed Reinforcement Learning (RL) algorithms as Double DQN and Retrace.


Soft Q-network

arXiv.org Artificial Intelligence

When DQN is announced by deepmind in 2013, the whole world is surprised by the simplicity and promising result, but due to the low efficiency and stability of this method, it is hard to solve many problems. After all these years, people purposed more and more complicated ideas for improving, many of them use distributed Deep-RL which needs tons of cores to run the simulators. However, the basic ideas behind all this technique are sometimes just a modified DQN. So we asked a simple question, is there a more elegant way to improve the DQN model? Instead of adding more and more small fixes on it, we redesign the problem setting under a popular entropy regularization framework which leads to better performance and theoretical guarantee. Finally, we purposed SQN, a new off-policy algorithm with better performance and stability.


Uncertainty-sensitive Learning and Planning with Ensembles

arXiv.org Artificial Intelligence

We propose a reinforcement learning framework for discrete environments in which an agent makes both strategic and tactical decisions. The former manifests itself through the use of value function, while the latter is powered by a tree search planner. These tools complement each other. The planning module performs a local \textit{what-if} analysis, which allows to avoid tactical pitfalls and boost backups of the value function. The value function, being global in nature, compensates for inherent locality of the planner. In order to further solidify this synergy, we introduce an exploration mechanism with two distinctive components: uncertainty modelling and risk measurement. To model the uncertainty we use value function ensembles, and to reflect risk we use propose several functionals that summarize the implied by the ensemble. We show that our method performs well on hard exploration environments: Deep-sea, toy Montezuma's Revenge, and Sokoban. In all the cases, we obtain speed-up in learning and boost in performance.


On-policy Reinforcement Learning with Entropy Regularization

arXiv.org Artificial Intelligence

Entropy regularization is an imported idea in reinforcement learning, with great success in recent algorithms like Soft Actor Critic and Soft Q Network. In this work we extend this idea into the on-policy realm. With the soft gradient policy theorem, we construct the maximum entropy reinforcement learning framework for on-policy RL. For policy gradient based on-policy algorithms, policy network is often represented as Gaussian distribution with the action variance restricted to be global for all the states observed from the environment. We propose an idea called action variance scale for policy network and find it can work collaboratively with the idea of entropy regularization. In this paper, we choose the state-of-the-art on-policy algorithm, Proximal Policy Optimization, as our basal algorithm and present Soft Proximal Policy Optimization (SPPO). PPO is a popular on-policy RL algorithm with great stability and parallelism. But like many on-policy algorithm, PPO can also suffer from low sample efficiency and local optimum problem. In the entropy-regularized framework, SPPO can guide the agent to succeed at the task while maintaining exploration by acting as randomly as possible. Our method outperforms prior works on a range of continuous control benchmark tasks, Furthermore, our method can be easily extended to large scale experiment and achieve stable learning at high throughput.


Deep Reinforcement Learning for Motion Planning of Mobile Robots

arXiv.org Artificial Intelligence

This paper presents a novel motion and trajectory planning algorithm for nonholonomic mobile robots that uses recent advances in deep reinforcement learning. Starting from a random initial state, i.e., position, velocity and orientation, the robot reaches an arbitrary target state while taking both kinematic and dynamic constraints into account. Our deep reinforcement learning agent not only processes a continuous state space it also executes continuous actions, i.e., the acceleration of wheels and the adaptation of the steering angle. We evaluate our motion and trajectory planning on a mobile robot with a differential drive in a simulation environment.