AITopics

Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover novel areas or high-reward (quality) areas. In most existing methods, the novelty and quality in the neighboring area of the current state are not well utilized to guide the exploration of the agent. To tackle this problem, we propose a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent. Experiments on a continuous control task and several Atari 2600 games show that CRL can outperform other state-of-the-art methods to achieve the best performance in most cases.
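The abstract does not give the bonus formula, so the following is only a minimal sketch of the general idea: cluster the collected states, then score each cluster by how rarely it was visited (novelty) and by the mean reward observed in it (quality). The function names, the plain k-means step, the inverse-square-root novelty term, and the mixing weight `beta` are all assumptions made for illustration, not the authors' method.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))

def kmeans(states, k, iters=10, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(states, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in states:
            groups[nearest(s, centers)].append(s)
        # Keep the old center when a cluster ends up empty.
        centers = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def cluster_bonus(states, rewards, centers, beta=0.5):
    """Hypothetical per-cluster bonus: beta * novelty + (1 - beta) * quality."""
    counts = [0] * len(centers)
    reward_sums = [0.0] * len(centers)
    for s, r in zip(states, rewards):
        c = nearest(s, centers)
        counts[c] += 1
        reward_sums[c] += r
    bonus = []
    for n, total in zip(counts, reward_sums):
        novelty = 1.0 / math.sqrt(n) if n else 1.0  # rarely visited -> larger
        quality = total / n if n else 0.0           # mean reward seen in the cluster
        bonus.append(beta * novelty + (1.0 - beta) * quality)
    return bonus
```

In use, the agent would look up the cluster of its current state with `nearest(state, centers)` and add the corresponding entry of `cluster_bonus(...)` to the environment reward.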

exploration, neural network, upstream oil & gas

1906.02457

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Gelada, Carles, Kumar, Saurabh, Buckman, Jacob, Nachum, Ofir, Bellemare, Marc G.

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

arXiv.org Machine Learning, Jun-6-2019

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
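One way to write the two losses named in the abstract, as a sketch only. The notation is assumed rather than taken from this listing: $\phi$ is the embedding from observations to latent states, $R$ and $P$ are the environment's reward and transition functions, $\bar{R}$ and $\bar{P}$ are their learned latent counterparts, $\phi_{\#}P(\cdot \mid s,a)$ denotes the distribution of $\phi(s')$ when $s' \sim P(\cdot \mid s,a)$, and $D$ stands for an unspecified distance between distributions.

```latex
\begin{align}
  L_{\bar{R}} &= \mathbb{E}_{(s,a)}\,
      \bigl| R(s,a) - \bar{R}(\phi(s), a) \bigr|, \\
  L_{\bar{P}} &= \mathbb{E}_{(s,a)}\,
      D\bigl( \phi_{\#}P(\cdot \mid s,a),\; \bar{P}(\cdot \mid \phi(s), a) \bigr).
\end{align}
```

The first term penalizes reward prediction error in latent space; the second penalizes mismatch between the predicted and the actual distribution over next latent states.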

artificial intelligence, machine learning, reinforcement learning

1906.02736

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Ward, Patrick Nadeem, Smofsky, Ariella, Bose, Avishek Joey

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.

artificial intelligence, machine learning, reinforcement learning

1906.02771

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Russo, Daniel

Exploration is one of the central challenges in reinforcement learning (RL). A large theoretical literature treats exploration in simple finite state and action MDPs, showing that it is possible to efficiently learn a near optimal policy through interaction alone [5, 8, 10, 11, 13-16, 24, 25]. Overwhelmingly, this literature focuses on optimistic algorithms, with most algorithms explicitly maintaining uncertainty sets that are likely to contain the true MDP. It has been difficult to adapt these exploration algorithms to the more complex problems investigated in the applied RL literature. Most applied papers seem to generate exploration through ε-greedy or Boltzmann exploration. Those simple methods are compatible with practical value function learning algorithms, which use parametric approximations to value functions to generalize across high dimensional state spaces. Unfortunately, such exploration algorithms can fail catastrophically in simple finite state MDPs [See e.g.
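For readers unfamiliar with the two simple exploration rules named in this abstract, here is a minimal sketch of ε-greedy and Boltzmann (softmax) action selection over a list of action-value estimates. The function names and the `temperature` parameter are illustrative choices, and this shows only the standard textbook rules, not the randomized value function method the paper analyzes.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann_probs(q_values, temperature):
    """Softmax over q_values / temperature, shifted by the max for numerical stability."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(q / temperature)."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Both rules explore by randomizing over actions without tracking how uncertain each value estimate is, which is the property the abstract contrasts with optimistic algorithms.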

artificial intelligence, machine learning, reinforcement learning

1906.0287

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)

Mott, Alex, Zoran, Daniel, Chrzanowski, Mike, Wierstra, Daan, Rezende, Danilo J.

Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

arXiv.org Machine Learning, Jun-6-2019

Inspired by recent work in attention models for image captioning and question answering, we present a soft attention model for the reinforcement learning domain. This model uses a soft, top-down attention mechanism to create a bottleneck in the agent, forcing it to focus on task-relevant information by sequentially querying its view of the environment. The output of the attention mechanism allows direct observation of the information used by the agent to select its actions, enabling easier interpretation of this model than of traditional models. We analyze different strategies that the agents learn and show that a handful of strategies arise repeatedly across different games. We also show that the model learns to query separately about space and content ("where" vs. "what"). We demonstrate that an agent using this mechanism can achieve performance competitive with state-of-the-art models on ATARI tasks while still being interpretable.

machine learning, natural language, reinforcement learning

1906.025

Country: Europe > United Kingdom > England > Greater London > London (0.05)

Genre: Research Report (0.70)

Industry:

Leisure & Entertainment > Sports (0.68)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

arXiv.org Machine Learning, Jun-6-2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Thomas, Philip S., Jordan, Scott M., Chandak, Yash, Nota, Chris, Kostas, James

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

artificial intelligence, machine learning, reinforcement learning

1906.03063

Country:

North America > United States > Massachusetts (0.29)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Vezhnevets, Alexander Sasha, Wu, Yuhuai, Leblond, Remi, Leibo, Joel Z.

Options as responses: Grounding behavioural hierarchies in multi-agent RL

We propose a novel hierarchical agent architecture for multi-agent reinforcement learning with concealed information. The hierarchy is grounded in the concealed information about other players, which resolves "the chicken or the egg" nature of option discovery. We factorise the value function over a latent representation of the concealed information and then re-use this latent space to factorise the policy into options. Low-level policies (options) are trained to respond to particular states of other agents grouped by the latent representation, while the top level (meta-policy) learns to infer the latent representation from its own observation and thereby select the right option. This grounding facilitates credit assignment across the levels of hierarchy. We show that this helps generalisation (performance against a held-out set of pre-trained competitors, while training in self- or population-play) and resolution of social dilemmas in self-play.

artificial intelligence, machine learning, reinforcement learning

1906.0147

Country: North America (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

#artificialintelligence, Jun-5-2019, 18:48:52 GMT

Generation of ice states through deep reinforcement learning

We present a deep reinforcement learning framework where a machine agent is trained to search for a policy to generate a ground state for the square ice model by exploring the physical environment. After training, the agent is capable of proposing a sequence of local moves to achieve the goal. Analysis of the trained policy and the state value function indicates that the ice rule and loop-closing condition are learned without prior knowledge. We test the trained policy as a sampler in the Markov chain Monte Carlo and benchmark against the baseline loop algorithm. This framework can be generalized to other models with topological constraints where generation of constraint-preserving states is difficult.

artificial intelligence, deep reinforcement, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence, Jun-5-2019, 18:47:47 GMT

Danny Lange on LinkedIn: "Yay! Our very own Jeffrey Shih - product manager of ML-Agents - speaking at the RE•WORK AI Summit. What can be more exciting than Deep Reinforcement Learning in Gaming? #ai #unity3d #reinforcementlearning "

Excited to speak at the RE•WORK AI Summit in SF on June 20. My talk is on #DeepLearning and the Gaming Industry. How video games drive advancements in #AI #research and how these advancements are used in #games using Unity Technologies #mlagents cc Nikita Johnson https://lnkd.in/g5RVq82

artificial intelligence, deep reinforcement learning, machine learning

Industry: Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence, Jun-5-2019, 04:13:50 GMT

Advanced AI: Deep Reinforcement Learning in Python

What will you learn in this course? You'll work with more complex environments, specifically those provided by the OpenAI Gym (CartPole, Mountain Car, and Atari games), so you'll need new techniques to train effective learning agents. We've seen that reinforcement learning is an entirely different kind of machine learning than supervised and unsupervised learning. Supervised and unsupervised machine learning algorithms are for analyzing data and making predictions about it, while reinforcement learning is about training an agent to interact with an environment and maximize its reward. Deep reinforcement learning and AI have a lot of potential but also carry huge risk. One main principle of training reinforcement learning agents is that there are unintended consequences when training an AI.

artificial intelligence, machine learning, reinforcement learning

Genre: Instructional Material > Course Syllabus & Notes (0.55)

Industry: Leisure & Entertainment > Games > Computer Games (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.32)