Goto

Collaborating Authors

 Reinforcement Learning


Artificial Intelligence: Reinforcement Learning in Python

@machinelearnbot

When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing - playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s but it hasn't been until recently that we've been able to observe first hand the amazing results that are possible. In 2016 we saw Google's AlphaGo beat the world Champion in Go.


Universal Successor Representations for Transfer Reinforcement Learning

arXiv.org Machine Learning

The objective of transfer reinforcement learning is to generalize from a set of previous tasks to unseen new tasks. In this work, we focus on the transfer scenario where the dynamics among tasks are the same, but their goals differ. Although general value function (Sutton et al., 2011) has been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice. To attack this, we propose (1) to use universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment. Our experiments show that USR can be effectively applied to new tasks, and the agent initialized by the trained USRA can achieve the goal considerably faster than random initialization.


TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

arXiv.org Machine Learning

In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD)---a vectorized adaptive step-size method for supervised learning---to TD learning, which we name TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation; we demonstrate that it can differentiate between features which are relevant and irrelevant for a given task, performing representation learning; and we show on a real-world robot prediction task that TIDBD is able to outperform ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.


Advanced Neural Networks with Tensorflow Udemy

@machinelearnbot

Neural Networks are at the forefront of almost all recent major technology breakthroughs. The intersection of big data, parallel programming, and AI generated a new wave of Neural Network research. In this course, you will be taken through some of the best uses of Neural Networks using TensorFlow. You'll explore Deep Reinforcement Learning algorithms such as Generative Networks and Deep Q Learning. You will learn to implement some more complex types of neural networks such as Deep Q Learning with OpenAI Gym, autoencoders, and Siamese neural networks.


Latent Space Policies for Hierarchical Reinforcement Learning

arXiv.org Machine Learning

We address the problem of learning hierarchical deep neural network policies for reinforcement learning. Our aim is to design a hierarchical reinforcement learning algorithm that can construct hierarchical representations in bottom-up layerwise fashion. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.


Artificial Intelligence IV - Reinforcement Learning in Java

@machinelearnbot

This course is about Reinforcement Learning. The first step is to talk about the mathematical background: we can use a Markov Decision Process as a model for reinforcement learning. We can solve the problem 3 ways: value-iteration, policy-iteration and Q-learning. Q-learning is a model free approach so it is state-of-the-art approach. It learns the optimal policy by interacting with the environment.


Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents

#artificialintelligence

After a weeklong break, I am back again with part 2 of my Reinforcement Learning tutorial series. In Part 1, I had shown how to put together a basic agent that learns to choose the more rewarding of two possible options. In this post, I am going to describe how we get from that simple agent to one that is capable of taking in an observation of the world, and taking actions which provide the optimal reward not just in the present, but over the long run. With these additions, we will have a full reinforcement agent. Environments which pose the full problem to an agent are referred to as Markov Decision Processes (MDPs).


University of Warsaw researchers and deepsense.ai launch reinforcement learning project powered by Google's TensorFlow Research Cloud deepsense.ai Press Center

@machinelearnbot

Researchers from the University of Warsaw, Google AI and deepsense.ai The goal of the experiment is to end-to-end train an artificial intelligence to play video games fully inside a computation graph. A team from the University of Warsaw, made up of Piotr Miล‚oล›, Bล‚aลผej Osiล„ski and Henryk Michalewski, has started a collaboration on reinforcement learning research with ลukasz Kaiser from the Google Brain team and with researchers from deepsense.ai. This project is connected to a research program on RL that deepsense.ai In the experiment, an artificial intelligence will be end-to-end trained to play video games fully inside a computation graph.


Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

arXiv.org Machine Learning

Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of sensor coverage game that for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short and medium run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.


Hierarchical Modular Reinforcement Learning Method and Knowledge Acquisition of State-Action Rule for Multi-target Problem

arXiv.org Machine Learning

Hierarchical Modular Reinforcement Learning (HMRL), consists of 2 layered learning where Profit Sharing works to plan a prey position in the higher layer and Q-learning method trains the state-actions to the target in the lower layer. In this paper, we expanded HMRL to multi-target problem to take the distance between targets to the consideration. The function, called `AT field', can estimate the interests for an agent according to the distance between 2 agents and the advantage/disadvantage of the other agent. Moreover, the knowledge related to state-action rules is extracted by C4.5. The action under the situation is decided by using the acquired knowledge. To verify the effectiveness of proposed method, some experimental results are reported.