AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL

Tirumala, Dhruva, Noh, Hyeonwoo, Galashov, Alexandre, Hasenclever, Leonard, Ahuja, Arun, Wayne, Greg, Pascanu, Razvan, Teh, Yee Whye, Heess, Nicolas

arXiv.org Machine Learning, Mar-18-2019

As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important. The KL-regularized expected reward objective constitutes one possible tool to this end. It introduces an additional component, a default or prior behavior, which can be learned alongside the policy and as such partially transforms the reinforcement learning problem into one of behavior modelling. In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables. We discuss how the resulting hierarchical structures can be used to implement different inductive biases and how their modularity can benefit transfer. Empirically we find that they can lead to faster learning and transfer on a range of continuous control tasks.
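To make the KL-regularized expected reward objective in the abstract above concrete, here is a minimal sketch for a single trajectory with discrete actions. It is not the authors' implementation: the function names, the weight `alpha`, the discount `gamma`, and the use of plain probability lists are all assumptions chosen for illustration. Each step's reward is reduced by `alpha` times the KL divergence between the policy and the default behavior at that step.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_regularized_return(rewards, policies, defaults, alpha=0.1, gamma=0.99):
    """Discounted sum of (reward - alpha * KL(policy || default)) over one trajectory.

    rewards:  list of scalar rewards, one per step
    policies: list of action distributions of the policy, one per step
    defaults: list of action distributions of the default behavior, one per step
    alpha, gamma: hypothetical regularization weight and discount factor
    """
    total = 0.0
    for t, (r, pi, pi0) in enumerate(zip(rewards, policies, defaults)):
        total += gamma ** t * (r - alpha * kl_divergence(pi, pi0))
    return total
```

When the policy matches the default behavior at every step, the KL term vanishes and the value reduces to the ordinary discounted return. Deviating from the default is penalized in proportion to `alpha`.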

default policy, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1903.07438

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.82)

Industry: Education > Focused Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration

Zhang, Jingwei, Wetzel, Niklas, Dorka, Nicolai, Boedecker, Joschka, Burgard, Wolfram

arXiv.org Artificial Intelligence, Mar-18-2019

Exploration in sparse reward reinforcement learning remains a difficult open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Most commonly, these signals are added as bonus rewards, which results in the mixture policy faithfully conducting neither exploration nor task fulfillment for an extended amount of time. In this paper, we instead learn separate intrinsic and extrinsic task policies and schedule between these different drives to accelerate exploration and stabilize learning. Moreover, we introduce a new type of intrinsic reward denoted as successor feature control (SFC), which is general and not task-specific. It takes into account statistics over complete trajectories and thus differs from previous methods that only use local information to evaluate intrinsic motivation. We evaluate our proposed scheduled intrinsic drive (SID) agent using three different environments with pure visual inputs: VizDoom, DeepMind Lab and OpenAI Gym classic control from pixels. The results show a greatly improved exploration efficiency with SFC and the hierarchical usage of the intrinsic drives. A video of our experimental results can be found at https://youtu.be/4ZHcBo7006Y.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1903.074

Country: Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo

Lopez, Nestor Gonzalez, Nuin, Yue Leire Erro, Moral, Elias Barba, Juan, Lander Usategui San, Rueda, Alejandro Solano, Vilches, Víctor Mayoral, Kojcev, Risto

arXiv.org Artificial Intelligence, Mar-18-2019

This paper presents an upgraded, real-world, application-oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The content discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, its potential and applicability in industrial use cases, using modular robots.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1903.06278

Country:

North America > Mexico > Gulf of Mexico (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Deep Reinforcement Learning with Decorrelation

Mavrin, Borislav, Yao, Hengshuai, Kong, Linglong

arXiv.org Artificial Intelligence, Mar-18-2019

Learning an effective representation for high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) methods such as Deep Q-Networks (DQN) achieve remarkable success in computer games by learning deeply encoded representations from convolutional networks. In this paper, we propose a simple yet very effective method for representation learning with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes correlation in latent features (with only slight computation), we decorrelate features represented by deep neural networks incrementally. On 49 Atari games, with the same regularization factor, our decorrelation algorithms perform 70% in terms of human-normalized scores, which is 40% better than DQN. In particular, ours performs better than DQN on 39 games with 4 close ties and lost only slightly on 6 games. Empirical results also show that the decorrelation method applies to Quantile Regression DQN (QR-DQN) and significantly boosts performance. Further experiments on the losing games show that our decorrelation algorithms can win over DQN and QR-DQN with a fine-tuned regularization factor.
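The abstract above does not spell out the regularized loss, so the following is only one plausible sketch of a decorrelation penalty, not the paper's formula. It sums the squared off-diagonal entries of the batch covariance matrix of the latent features and adds the result to a base loss. The function names, the weight `lam`, and the choice of squared covariance are assumptions made for illustration.

```python
def decorrelation_penalty(features):
    """Sum of squared covariances between distinct feature dimensions.

    features: a batch of latent feature vectors, given as a list of equal-length lists.
    Returns 0.0 when every pair of feature dimensions is uncorrelated over the batch.
    """
    n = len(features)
    d = len(features[0])
    # Center each feature dimension over the batch.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    penalty = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            cov_ij = sum(c[i] * c[j] for c in centered) / n
            penalty += cov_ij ** 2
    return penalty


def regularized_loss(base_loss, features, lam=0.1):
    """Base loss (for example a TD loss) plus lam times the decorrelation penalty.

    lam is a hypothetical regularization factor.
    """
    return base_loss + lam * decorrelation_penalty(features)
```

In a deep RL agent, `features` would be the activations of the last hidden layer for a minibatch, and the penalty would be computed with a differentiable tensor library so that gradients flow into the network.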

dqn, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1903.07765

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ROS2Learn: a reinforcement learning framework for ROS 2

Nuin, Yue Leire Erro, Lopez, Nestor Gonzalez, Moral, Elias Barba, Juan, Lander Usategui San, Rueda, Alejandro Solano, Vilches, Víctor Mayoral, Kojcev, Risto

arXiv.org Artificial Intelligence, Mar-18-2019

We propose a novel framework for Deep Reinforcement Learning (DRL) in modular robotics to train a robot directly from joint states, using traditional robotic tools. We use state-of-the-art implementations of the Proximal Policy Optimization, Trust Region Policy Optimization and Actor-Critic Kronecker-Factored Trust Region algorithms to learn policies in four different Modular Articulated Robotic Arm (MARA) environments. We support this process using a framework that communicates with typical tools used in robotics, such as Gazebo and Robot Operating System 2 (ROS 2). We evaluate several algorithms in modular robots with an empirical study in simulation.

algorithm, international conference, robot, (12 more...)

arXiv.org Artificial Intelligence

1903.06282

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Q Learning - Ashwin Vaidya

#artificialintelligence, Mar-17-2019, 11:15:49 GMT

Before I explain what Q Learning is, I will quickly explain the basic principle of reinforcement learning. Reinforcement learning is a category of machine learning algorithms where the systems learn on their own by interacting with the environment. The idea is that a reward is provided to the agent if the action it takes is correct. Otherwise, some penalty is assigned to discourage the action. It is similar to how we train dogs to perform tricks: we give the dog a snack for successfully doing a roll and rebuke it for dirtying the carpet.
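The excerpt stops before the algorithm itself, so here is a minimal sketch of the standard tabular Q-learning update that the reward-and-penalty idea leads to. The state and action names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions, not taken from the article.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning step and return the updated value.

    q:       table mapping (state, action) to an estimated value; unseen pairs are 0.0
    actions: the actions available in next_state
    The estimate moves toward reward + gamma * (best value reachable from next_state).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def new_q_table():
    """Create an empty Q-table in which every unseen entry defaults to 0.0."""
    return defaultdict(float)
```

In the dog analogy, a reward of 1.0 for a roll raises the value of the (state, "roll") entry, while a negative reward for an unwanted action lowers its entry. Repeating the update over many interactions makes the rewarded action the one with the highest value.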

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Pricing algorithms can learn to collude with each other to raise prices

#artificialintelligence, Mar-17-2019, 04:07:36 GMT

If you shop on Amazon, an algorithm rather than a human probably set the price of the service or item you bought. Pricing algorithms have become ubiquitous in online retail as automated systems have grown increasingly affordable and easy to implement. But while companies like airlines and hotels have long used machines to set their prices, pricing systems have evolved. They have moved from rule-based programs to reinforcement-learning ones, where the logic of deciding a product's price is no longer within a human's control. If you recall, reinforcement learning is a subset of machine learning that uses penalties and rewards to incentivize an AI agent toward a specific goal.

algorithm, machine learning, reinforcement learning, (4 more...)

#artificialintelligence

Country: Europe > Italy (0.20)

Industry: Leisure & Entertainment > Games (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Suttle, Wesley, Yang, Zhuoran, Zhang, Kaiqing, Wang, Zhaoran, Basar, Tamer, Liu, Ji

arXiv.org Machine Learning, Mar-17-2019

In this work we develop a new off-policy actor-critic algorithm that performs policy improvement with convergence guarantees in the multi-agent setting using function approximation. To achieve this, we extend the method of emphatic temporal differences (ETD(λ)) to the multi-agent setting with provable convergence under linear function approximation, and we also derive a novel off-policy policy gradient theorem for the multi-agent setting. Using these new results, we develop our two-timescale algorithm, which uses ETD(λ) to perform policy evaluation for the critic step at a faster timescale and policy gradient ascent using emphatic weightings for the actor step at a slower timescale. We also provide convergence guarantees for the actor step. Our work builds on recent advances in three main areas: multi-agent on-policy actor-critic methods, emphatic temporal difference learning for off-policy policy evaluation, and the use of emphatic weightings in off-policy policy gradient methods.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1903.06372

Country: North America > United States (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Artificial Intelligence Conference NYC

#artificialintelligence, Mar-16-2019, 02:31:29 GMT

The AI Conference delivers an unsurpassed depth and breadth in technical content, with a laser-sharp focus on the most important AI developments for business. From apps and reinforcement learning to conversational interfaces and executive briefings, learn how to implement AI in real-world projects using machine learning, NLP, TensorFlow, and more. Delve into the latest research and explore what the future holds for applied artificial intelligence engineering.

artificial intelligence conference nyc, machine learning, reinforcement learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.38)

RL for Real Life ICML 2019 Workshop

#artificialintelligence, Mar-16-2019, 00:36:31 GMT

Reinforcement learning (RL) is a general learning, predicting, and decision-making paradigm. RL provides solution methods for sequential decision-making problems as well as those that can be transformed into sequential ones. RL connects deeply with optimization, statistics, game theory, causal inference, sequential experimentation, etc., overlaps largely with approximate dynamic programming and optimal control, and applies broadly in science, engineering and arts. RL has been making steady progress in academia recently, e.g., Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real-world scenarios like recommender systems and neural architecture search. See a recent collection about RL applications.

machine learning, real life icml 2019, reinforcement learning, (6 more...)

#artificialintelligence

Genre: Instructional Material (0.37)

Industry: Leisure & Entertainment > Games > Computer Games (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
