Reinforcement Learning
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning
Brunner, Gino, Fritsche, Manuel, Richter, Oliver, Wattenhofer, Roger
Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function is measured, which is then used as a regularization term in the loss function of the algorithm. Experiments were made on grid-world environments as well as on a 3D navigation task, both with sparse rewards. In the first case the extended agent is able to learn significantly faster than the baselines.
Reinforcement Learning in R
Prรถllochs, Nicolas, Feuerriegel, Stefan
Reinforcement learning refers to a group of methods from artificial intelligence where an agent performs learning through trial and error. It differs from supervised learning, since reinforcement learning requires no explicit labels; instead, the agent interacts continuously with its environment. That is, the agent starts in a specific state and then performs an action, based on which it transitions to a new state and, depending on the outcome, receives a reward. Different strategies (e.g. Q-learning) have been proposed to maximize the overall reward, resulting in a so-called policy, which defines the best possible action in each state. Mathematically, this process can be formalized by a Markov decision process and it has been implemented by packages in R; however, there is currently no package available for reinforcement learning. As a remedy, this paper demonstrates how to perform reinforcement learning in R and, for this purpose, introduces the ReinforcementLearning package. The package provides a remarkably flexible framework and is easily applied to a wide range of different problems. We demonstrate its use by drawing upon common examples from the literature (e.g. finding optimal game strategies).
M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
Most of the prior work on multi-agent reinforcement learning (MARL) achieves optimal collaboration by directly controlling the agents to maximize a common reward. In this paper, we aim to address this from a different angle. In particular, we consider scenarios where there are self-interested agents (i.e., worker agents) which have their own minds (preferences, intentions, skills, etc.) and can not be dictated to perform tasks they do not wish to do. For achieving optimal coordination among these agents, we train a super agent (i.e., the manager) to manage them by first inferring their minds based on both current and past observations and then initiating contracts to assign suitable tasks to workers and promise to reward them with corresponding bonuses so that they will agree to work together. The objective of the manager is maximizing the overall productivity as well as minimizing payments made to the workers for ad-hoc worker teaming. RL), which consists of agent modeling and policy learning. We have evaluated our approach in two environments, Resource Collection and Crafting, to simulate multi-agent management problems with various task settings and multiple designs for the worker agents. The experimental results have validated the effectiveness of our approach in modeling worker agents' minds online, and in achieving optimal ad-hoc teaming with good generalization and fast adaptation. As the main assumption and building block in economy, self-interested agents play a central roles in our daily life. Selfish agents, with their private beliefs, preferences, intentions, and skills, could collaborate effectively to make great achievement with proper incentives and contracts, an amazing phenomenon that happens every day in every corner of the world.
Artificial Intelligence: What Is Reinforcement Learning - A Simple Explanation & Practical Examples
Reinforcement learning is one of the most discussed, followed and contemplated topics in artificial intelligence (AI) as it has the potential to transform most businesses. In this article, I want to provide a simple guide that explains reinforcement learning and give you some practical examples of how it is used today. At the core of reinforcement learning is the concept that the optimal behavior or action is reinforced by a positive reward. Similar to toddlers learning how to walk who adjust actions based on the outcomes they experience such as taking a smaller step if the previous broad step made them fall, machines and software agents use reinforcement learning algorithms to determine the ideal behavior based upon feedback from the environment. Depending on the complexity of the problem, reinforcement learning algorithms can keep adapting to the environment over time if necessary in order to maximize the reward in the long-term.
How do some of the best AI algorithms perform on real robots? Not well, it turns out
All the biggest labs leading AI research will have you believe that their fancy game-playing software bots will one day be applicable to the real world. The skills from playing Go, Poker or Dota 2 will be transferable to algorithms designing new drugs, controlling robots, teaching computers how to negotiate โ you name it. One startup, Kindred.AI, decided to put some of those claims to the test, particularly with respect to robotics and machine-learning software. "We wanted to know the readiness of the state-of-the-art RL algorithms on real robotic applications," Rupam Mahmood, lead of AI at Kindred, told The Register. Reinforcement learning (RL) is a popular AI method that teaches agents how to perform a specific task by rewarding them every time they get closer to the stated goal.
Generalization and Regularization in DQN
Farebrother, Jesse, Machado, Marlos C., Bowling, Michael
Deep reinforcement learning (RL) algorithms have shown an impressive ability to learn complex control policies in high-dimensional environments. However, despite the ever-increasing performance on popular benchmarks like the Arcade Learning Environment (ALE), policies learned by deep RL algorithms can struggle to generalize when evaluated in remarkably similar environments. These results are unexpected given the fact that, in supervised learning, deep neural networks often learn robust features that generalize across tasks. In this paper, we study the generalization capabilities of DQN in order to aid in understanding this mismatch between generalization in deep RL and supervised learning methods. We provide evidence suggesting that DQN overspecializes to the domain it is trained on. We then comprehensively evaluate the impact of traditional methods of regularization from supervised learning, $\ell_2$ and dropout, and of reusing learned representations to improve the generalization capabilities of DQN. We perform this study using different game modes of Atari 2600 games, a recently introduced modification for the ALE which supports slight variations of the Atari 2600 games used for benchmarking in the field. Despite regularization being largely underutilized in deep RL, we show that it can, in fact, help DQN learn more general features. These features can then be reused and fine-tuned on similar tasks, considerably improving the sample efficiency of DQN.
Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning
Karino, Izumi, Tanaka, Kazutoshi, Niiyama, Ryuma, Kuniyoshi, Yasuo
This paper proposes an exploration method for deep reinforcement learning based on parameter space noise. Recent studies have experimentally shown that parameter space noise results in better exploration than the commonly used action space noise. Previous methods devised a way to update the diagonal covariance matrix of a noise distribution and did not consider the direction of the noise vector and its correlation. In addition, fast updates of the noise distribution are required to facilitate policy learning. We propose a method that deforms the noise distribution according to the accumulated returns and the noises that have led to the returns. Moreover, this method switches isotropic exploration and directional exploration in parameter space with regard to obtained rewards. We validate our exploration strategy in the OpenAI Gym continuous environments and modified environments with sparse rewards. The proposed method achieves results that are competitive with a previous method at baseline tasks. Moreover, our approach exhibits better performance in sparse reward environments by exploration with the switching strategy.
Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation
Nogueira, Rodrigo, Bulian, Jannis, Ciaramita, Massimiliano
We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering. In the proposed framework an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the training data, while the meta-agent is trained on the full training set. Our method makes learning faster, because it is highly parallelizable, and has better generalization performance than strong baselines, such as an ensemble of agents trained on the full data. We show that the improved performance is due to the increased diversity of reformulation strategies.
Learning and Planning with a Semantic Model
Wu, Yi, Wu, Yuxin, Tamar, Aviv, Russell, Stuart, Gkioxari, Georgia, Tian, Yuandong
Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progresses on this challenge in the context of man-made environments, which are visually diverse but contain intrinsic semantic regularities. We propose a hybrid model-based and model-free approach, LEArning and Planning with Semantics (LEAPS), consisting of a multi-target sub-policy that acts on visual inputs, and a Bayesian model over semantic structures. When placed in an unseen environment, the agent plans with the semantic model to make high-level decisions, proposes the next sub-target for the sub-policy to execute, and updates the semantic model based on new observations. We perform experiments in visual navigation tasks using House3D, a 3D environment that contains diverse human-designed indoor scenes with real-world objects. LEAPS outperforms strong baselines that do not explicitly plan using the semantic content. Deep reinforcement learning (DRL) has undoubtedly witnessed strong achievements in recent years (Silver et al., 2016; Mnih et al., 2015; Levine et al., 2016).