 Reinforcement Learning


Clustered Reinforcement Learning

arXiv.org Artificial Intelligence

Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover novel areas or high-reward (quality) areas. In most existing methods, the novelty and quality in the neighboring area of the current state are not well utilized to guide the exploration of the agent. To tackle this problem, we propose a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent. Experiments on a continuous control task and several Atari 2600 games show that CRL can outperform other state-of-the-art methods to achieve the best performance in most cases.
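The abstract does not state the bonus formula, so the following is only a minimal sketch of the general idea: cluster the collected states, then give each cluster a bonus that mixes a novelty term with a quality term. The function names, the plain k-means step, the weights `beta` and `eta`, and the specific bonus `beta / sqrt(count) + eta * mean_reward` are all illustrative assumptions, not the authors' method.

```python
import math


def nearest(centroids, state):
    """Index of the centroid closest to `state` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(centroids[c], state)))


def kmeans(states, k, iters=10):
    """Plain Lloyd's k-means, initialised from the first k states."""
    centroids = [list(s) for s in states[:k]]
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for s in states:
            members[nearest(centroids, s)].append(s)
        for c, group in enumerate(members):
            if group:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


def cluster_bonuses(states, rewards, centroids, beta=1.0, eta=1.0):
    """Hypothetical per-cluster bonus: count-based novelty plus mean reward (quality)."""
    counts = [0] * len(centroids)
    totals = [0.0] * len(centroids)
    for s, r in zip(states, rewards):
        c = nearest(centroids, s)
        counts[c] += 1
        totals[c] += r
    # Assumed formula: rarely visited clusters and high-reward clusters score higher.
    return [beta / math.sqrt(n) + eta * (t / n) if n else beta
            for n, t in zip(counts, totals)]
```

Under these assumptions, an agent in state `s` would receive its environment reward plus `cluster_bonuses(...)[nearest(centroids, s)]`.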


DeepMDP: Learning Continuous Latent Space Models for Representation Learning

arXiv.org Machine Learning

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
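As a rough illustration of the two losses named above, the sketch below scores one transition against a latent model. It makes a simplifying assumption that is not in the abstract: the latent transition model outputs a single point rather than a distribution, so the distribution-matching loss is replaced by a Euclidean distance between the encoded next state and the predicted next latent. The names `encode`, `latent_reward`, `latent_step`, and `weight` are hypothetical.

```python
def reward_loss(true_reward, predicted_reward):
    """Absolute error between the environment reward and the latent model's prediction."""
    return abs(true_reward - predicted_reward)


def transition_loss(next_latent, predicted_next_latent):
    """Euclidean distance between the encoded next state and the predicted next latent."""
    return sum((a - b) ** 2 for a, b in zip(next_latent, predicted_next_latent)) ** 0.5


def latent_model_loss(encode, latent_reward, latent_step, transition, weight=1.0):
    """Sum of the reward loss and the weighted transition loss for one (s, a, r, s') tuple."""
    s, a, r, s_next = transition
    z = encode(s)  # latent state for the current observation
    return (reward_loss(r, latent_reward(z, a))
            + weight * transition_loss(encode(s_next), latent_step(z, a)))


# Toy usage: a hand-written encoder and latent model that happen to fit the data exactly.
encode = lambda s: [s[0] + s[1]]
latent_reward = lambda z, a: z[0]
latent_step = lambda z, a: [z[0] + a]
perfect = latent_model_loss(encode, latent_reward, latent_step,
                            ((1.0, 2.0), 1, 3.0, (2.0, 2.0)))
```

In practice the three callables would be learned networks and the loss would be averaged over a batch; the toy lambdas only show how the pieces connect.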


Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.


Worst-Case Regret Bounds for Exploration via Randomized Value Functions

arXiv.org Artificial Intelligence

Exploration is one of the central challenges in reinforcement learning (RL). A large theoretical literature treats exploration in simple finite state and action MDPs, showing that it is possible to efficiently learn a near optimal policy through interaction alone [5, 8, 10, 11, 13-16, 24, 25]. Overwhelmingly, this literature focuses on optimistic algorithms, with most algorithms explicitly maintaining uncertainty sets that are likely to contain the true MDP. It has been difficult to adapt these exploration algorithms to the more complex problems investigated in the applied RL literature. Most applied papers seem to generate exploration through ε-greedy or Boltzmann exploration. Those simple methods are compatible with practical value function learning algorithms, which use parametric approximations to value functions to generalize across high dimensional state spaces. Unfortunately, such exploration algorithms can fail catastrophically in simple finite state MDPs [See e.g.
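For readers unfamiliar with the two simple exploration rules mentioned in this abstract, here is a minimal sketch of their textbook forms over a list of action-value estimates. The function names and the `rng` parameter are illustrative choices; this shows the baselines the abstract contrasts against, not the paper's randomized value function method.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return q_values.index(best)  # first maximizer; ties go to the lowest index


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a lower temperature concentrates mass on the best action."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action index from the Boltzmann (softmax) distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Both rules need only the current value estimates, which is why they combine easily with parametric value function learning. Neither one tracks how uncertain those estimates are.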


Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

arXiv.org Machine Learning

Inspired by recent work in attention models for image captioning and question answering, we present a soft attention model for the reinforcement learning domain. This model uses a soft, top-down attention mechanism to create a bottleneck in the agent, forcing it to focus on task-relevant information by sequentially querying its view of the environment. The output of the attention mechanism allows direct observation of the information used by the agent to select its actions, enabling easier interpretation of this model than of traditional models. We analyze different strategies that the agents learn and show that a handful of strategies arise repeatedly across different games. We also show that the model learns to query separately about space and content ("where" vs. "what"). We demonstrate that an agent using this mechanism can achieve performance competitive with state-of-the-art models on ATARI tasks while still being interpretable.


Classical Policy Gradient: Preserving Bellman's Principle of Optimality

arXiv.org Machine Learning

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.


Options as responses: Grounding behavioural hierarchies in multi-agent RL

arXiv.org Artificial Intelligence

We propose a novel hierarchical agent architecture for multi-agent reinforcement learning with concealed information. The hierarchy is grounded in the concealed information about other players, which resolves "the chicken or the egg" nature of option discovery. We factorise the value function over a latent representation of the concealed information and then re-use this latent space to factorise the policy into options. Low-level policies (options) are trained to respond to particular states of other agents grouped by the latent representation, while the top level (meta-policy) learns to infer the latent representation from its own observation and thereby to select the right option. This grounding facilitates credit assignment across the levels of hierarchy. We show that this helps generalisation (performance against a held-out set of pre-trained competitors, while training in self- or population-play) and resolution of social dilemmas in self-play.


Generation of ice states through deep reinforcement learning

#artificialintelligence

We present a deep reinforcement learning framework where a machine agent is trained to search for a policy to generate a ground state for the square ice model by exploring the physical environment. After training, the agent is capable of proposing a sequence of local moves to achieve the goal. Analysis of the trained policy and the state value function indicates that the ice rule and loop-closing condition are learned without prior knowledge. We test the trained policy as a sampler in the Markov chain Monte Carlo and benchmark against the baseline loop algorithm. This framework can be generalized to other models with topological constraints where generation of constraint-preserving states is difficult.


Danny Lange on LinkedIn: "Yay! Our very own Jeffrey Shih - product manager of ML-Agents - speaking at the RE•WORK AI Summit. What can be more exciting than Deep Reinforcement Learning in Gaming? #ai #unity3d #reinforcementlearning "

#artificialintelligence

Excited to speak at the RE•WORK AI Summit in SF on June 20. My talk is on #DeepLearning and the Gaming Industry. How video games drive advancements in #AI #research and how these advancements are used in #games using Unity Technologies #mlagents cc Nikita Johnson https://lnkd.in/g5RVq82


Advanced AI: Deep Reinforcement Learning in Python

#artificialintelligence

What will you learn in this course? You'll work with more complex environments provided by the OpenAI Gym (CartPole, Mountain Car, and Atari games), so you'll need new techniques to train effective learning agents. We've seen that reinforcement learning is an entirely different kind of machine learning from supervised and unsupervised learning. Supervised and unsupervised machine learning algorithms are for analyzing data and making predictions about it, while reinforcement learning is about training an agent to interact with an environment and maximize its reward. Deep reinforcement learning and AI have a lot of potential but also carry huge risk. One main principle of training reinforcement learning agents is that there are unintended consequences when training an AI.