AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

Schwab, Devin, Springenberg, Tobias, Martins, Murilo F., Lampe, Thomas, Neunert, Michael, Abdolmaleki, Abbas, Hertweck, Tim, Hafner, Roland, Nori, Francesco, Riedmiller, Martin

arXiv.org Machine Learning, Feb-18-2019

We present a method for fast training of vision-based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-time. This allows for fast learning of auxiliary policies, which subsequently generate good data for training the main, vision-based control policies. This method can be seen as an extension of the Scheduled Auxiliary Control (SAC-X) framework. We demonstrate the efficacy of our method by using both a simulated and real-world Ball-in-a-Cup game controlled by a robot arm. In simulation, our approach leads to significant learning speed-ups when compared to standard SAC-X. On the real robot we show that the task can be learned from scratch, i.e., with no transfer from simulation and no imitation learning. Videos of our learned policies running on the real robot can be found at https://sites.google.com/view/rss-2019-sawyer-bic/.

experiment, learning, robot, (13 more...)

arXiv.org Machine Learning

1902.04706

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning

Kim, Woojun, Cho, Myungsik, Sung, Youngchul

arXiv.org Artificial Intelligence, Feb-18-2019

In this paper, we propose a new learning technique named message-dropout to improve the performance for multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In the first application scenario of multi-agent systems in which direct message communication among agents is allowed, the message-dropout technique drops out the received messages from other agents in a block-wise manner with a certain probability in the training phase and compensates for this effect by multiplying the weights of the dropped-out block units with a correction probability. The applied message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication and makes learning robust against communication errors in the execution phase. In the second application scenario of centralized training with decentralized execution, we particularly consider the application of the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique for several games, and numerical results show that the proposed message-dropout technique with proper dropout rate improves the reinforcement learning performance significantly in terms of the training speed and the steady-state performance in the execution phase.
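The block-wise dropout described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `message_dropout`, its parameters, and the choice to scale the message inputs at execution time (rather than the first-layer weights, which is equivalent for a linear first layer) are all assumptions made for the example.

```python
import random


def message_dropout(own_obs, messages, drop_prob, training, rng=random):
    """Build one agent's input vector from its own observation and the
    messages received from other agents (hypothetical helper).

    own_obs   : list of floats, the agent's own observation
    messages  : list of lists of floats, one block per other agent
    drop_prob : probability of dropping each message block during training
    training  : True for the training phase, False for execution
    """
    keep_prob = 1.0 - drop_prob
    features = list(own_obs)  # the agent's own observation is never dropped
    for block in messages:
        if training:
            # Block-wise dropout: the whole message from one agent is either
            # kept intact or zeroed out, never dropped unit by unit.
            keep = rng.random() >= drop_prob
            features.extend(block if keep else [0.0] * len(block))
        else:
            # Assumed compensation at execution time: scale each message unit
            # by the keep probability so its expected magnitude matches training.
            features.extend(x * keep_prob for x in block)
    return features
```

For example, `message_dropout([1.0], [[2.0, 4.0]], 0.5, training=False)` scales the single message block by 0.5 and leaves the agent's own observation unchanged.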

agent, architecture, communication, (17 more...)

arXiv.org Artificial Intelligence

1902.06527

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Parenting: Safe Reinforcement Learning from Human Input

Frye, Christopher, Feige, Ilya

arXiv.org Machine Learning, Feb-18-2019

Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our PARENTING algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it "matures", and that policies learnt through PARENTING are generalisable to new environments.

agent, gridworld, parenting, (11 more...)

arXiv.org Machine Learning

1902.06766

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Estonia > Harju County > Tallinn (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A* Tree Search for Portfolio Management

Gao, Xiaojie, Tu, Shikui, Xu, Lei

arXiv.org Artificial Intelligence, Feb-18-2019

We propose a planning-based method to teach an agent to manage a portfolio from scratch. Our approach combines deep reinforcement learning techniques with search techniques, as in AlphaGo. By uniting the advantages of the A* search algorithm with Monte Carlo tree search, we arrive at a new algorithm named A* tree search, in which the best information is returned to guide the next search. Also, the expansion mode of the Monte Carlo tree is improved for higher utilization of the neural network. The suggested algorithm can also optimize a non-differentiable utility function by combinatorial search. This technique is then used in our trading system. The major component is a neural network that is trained on trading experiences from tree search and outputs prior probabilities that in turn guide the search by pruning away branches. Experimental results on simulated and real financial data verify the robustness of the proposed trading system, which produces better strategies than several approaches based on reinforcement learning.

machine learning, portfolio management, reinforcement learning, (2 more...)

arXiv.org Artificial Intelligence

1901.01855

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Badnava, Babak, Mozayani, Nasser

arXiv.org Artificial Intelligence, Feb-17-2019

Potential-based reward shaping (PBRS) is a particular category of machine learning methods which aims to improve the learning speed of a reinforcement learning agent by extracting and utilizing extra knowledge while performing a task. There are two steps in the process of transfer learning: extracting knowledge from previously learned tasks and transferring that knowledge to use it in a target task. The latter step is well discussed in the literature with various methods being proposed for it, while the former has been explored less. With this in mind, the type of knowledge that is transmitted is very important and can lead to considerable improvement. In the literature on both transfer learning and potential-based reward shaping, a subject that has never been addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that attempts to extract knowledge from the learning process. The proposed method extracts knowledge from episodes' cumulative rewards. The proposed method has been evaluated in the Arcade Learning Environment and the results indicate an improvement in the learning process in both the single-task and the multi-task reinforcement learning agents.
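For readers unfamiliar with potential-based shaping, the standard form adds a term gamma * Phi(s') - Phi(s) to the environment reward, where Phi is a potential function over states. The sketch below illustrates only that generic form; it does not reproduce the paper's way of deriving the potential from episodes' cumulative rewards, and the function names and the list of potential values are hypothetical.

```python
def shaping_term(phi_s, phi_next, gamma):
    """Generic potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Environment reward plus the shaping term."""
    return reward + shaping_term(phi_s, phi_next, gamma)


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of shaping terms along one trajectory.

    potentials holds Phi(s_0), ..., Phi(s_T) for the visited states.
    The sum telescopes to gamma**T * Phi(s_T) - Phi(s_0), so it depends
    only on the first and last state, not on the path between them.
    """
    total = 0.0
    for t in range(len(potentials) - 1):
        total += gamma ** t * shaping_term(potentials[t], potentials[t + 1], gamma)
    return total
```

The telescoping property in `discounted_shaping_sum` is the usual argument for why this form of shaping leaves the ranking of policies unchanged while still altering the per-step learning signal.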

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1902.06239

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Iran (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

SURREAL

#artificialintelligence, Feb-16-2019, 06:58:39 GMT

Our goal is to make Deep Reinforcement Learning accessible to everyone. We introduce Surreal, an open-source, reproducible, and scalable distributed reinforcement learning framework. Surreal provides a high-level abstraction for building distributed reinforcement learning algorithms. We implement our distributed variants of PPO and DDPG in the current release.

machine learning, reinforcement learning, surreal, (1 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Communication Topologies Between Learning Agents in Deep Reinforcement Learning

Adjodah, Dhaval, Calacci, Dan, Dubey, Abhimanyu, Goyal, Anirudh, Krafft, Peter, Moro, Esteban, Pentland, Alex

arXiv.org Machine Learning, Feb-16-2019

A common technique to improve speed and robustness of learning in deep reinforcement learning (DRL) and many other machine learning algorithms is to run multiple learning agents in parallel. A neglected component in the development of these algorithms has been how best to arrange the learning agents involved to better facilitate distributed search. Here we draw upon results from the networked optimization and collective intelligence literatures suggesting that arranging learning agents in less than fully connected topologies (the implicit way agents are commonly arranged in) can improve learning. We explore the relative performance of four popular families of graphs and observe that one such family (Erdos-Renyi random graphs) empirically outperforms the standard fully-connected communication topology across several DRL benchmark tasks. We observe that 1000 learning agents arranged in an Erdos-Renyi graph can perform as well as 3000 agents arranged in the standard fully-connected topology, showing the large learning improvement possible when carefully designing the topology over which agents communicate. We complement these empirical results with a preliminary theoretical investigation of why less than fully connected topologies can perform better. Overall, our work suggests that distributed machine learning algorithms could be made more efficient if the communication topology between learning agents was optimized.
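To make the two topologies in this abstract concrete, the sketch below builds a neighbor map for an Erdos-Renyi random graph, in which each pair of agents is linked independently with a fixed probability, alongside the fully-connected baseline. The function names, the `edge_prob` parameter, and the dictionary-of-sets representation are illustrative assumptions; the abstract does not say how the authors construct or use the graphs.

```python
import random


def erdos_renyi_topology(num_agents, edge_prob, seed=None):
    """Return {agent: set of neighbors} for a G(n, p) random graph.

    Each unordered pair of agents is connected independently with
    probability edge_prob; links are undirected.
    """
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(num_agents)}
    for i in range(num_agents):
        for j in range(i + 1, num_agents):
            if rng.random() < edge_prob:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def fully_connected_topology(num_agents):
    """Baseline in which every agent communicates with every other agent."""
    return {i: set(range(num_agents)) - {i} for i in range(num_agents)}


def average_degree(neighbors):
    """Mean number of communication partners per agent."""
    return sum(len(n) for n in neighbors.values()) / len(neighbors)
```

With `edge_prob` below 1, `average_degree` falls below the `num_agents - 1` of the fully-connected case, which is the sense in which such a graph is "less than fully connected".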

agent, communication topology, topology, (12 more...)

arXiv.org Machine Learning

1902.0674

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Competitive Experience Replay

Liu, Hao, Trott, Alexander, Socher, Richard, Xiong, Caiming

arXiv.org Machine Learning, Feb-16-2019

Deep learning has achieved remarkable successes in solving challenging reinforcement learning (RL) problems when a dense reward function is provided. However, in sparse-reward environments it still often suffers from the need to carefully shape the reward function to guide policy optimization. This limits the applicability of RL in the real world since both reinforcement learning and domain-specific knowledge are required. It is therefore of great practical importance to develop algorithms which can learn from a binary signal indicating successful task completion or other unshaped, sparse reward signals. We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents. Our method complements the recently proposed hindsight experience replay (HER) by inducing an automatic exploratory curriculum. We evaluate our approach on the tasks of reaching various goal locations in an ant maze and manipulating objects with a robotic arm. Each task provides only binary rewards indicating whether or not the goal is achieved. Our method asymmetrically augments these sparse rewards for a pair of agents each learning the same task, creating a competitive game designed to drive exploration. Extensive experiments demonstrate that this method leads to faster convergence and improved task performance.
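The abstract does not spell out the augmentation rule, so the following is only one illustrative way an asymmetric reward adjustment between two agents could look. It assumes, purely for the example, that agent A is penalized and agent B is rewarded whenever a state from A's replay batch lies within a small radius of a state from B's batch. The function name, the `radius` threshold, and the unit bonus are all hypothetical.

```python
import math


def relabel_competitive(rewards_a, states_a, rewards_b, states_b, radius=0.1):
    """Return adjusted copies of two agents' sparse rewards (assumed rule).

    states_a, states_b : lists of equal-length coordinate tuples
    rewards_a, rewards_b : sparse rewards aligned with those states
    Whenever a state of A lies within `radius` of a state of B, A loses 1
    and B gains 1, which pushes A toward states B has not reached.
    """
    new_a = list(rewards_a)
    new_b = list(rewards_b)
    for i, sa in enumerate(states_a):
        for j, sb in enumerate(states_b):
            if math.dist(sa, sb) < radius:  # Euclidean distance, Python 3.8+
                new_a[i] -= 1.0
                new_b[j] += 1.0
    return new_a, new_b
```

The adjusted rewards would replace the originals only inside the sampled replay batch, leaving the environment's binary success signal itself untouched.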

agent, arxiv preprint arxiv, exploration, (12 more...)

arXiv.org Machine Learning

1902.00528

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Google open-sources PlaNet, an AI agent that learns about the world from images

#artificialintelligence, Feb-15-2019, 22:32:58 GMT

But it's not always practical; model-free approaches, which aim to get agents to directly predict actions from observations about their world, can take weeks of training. Model-based reinforcement learning is a viable alternative -- it has agents come up with a general model of their environment they can use to plan ahead. But in order to accurately forecast actions in unfamiliar surroundings, those agents have to formulate rules from experience. Toward that end, Google in collaboration with DeepMind today introduced the Deep Planning Network (PlaNet) agent, which learns a world model from image inputs and leverages it for planning. It's able to solve a variety of image-based tasks with up to 5,000 percent the data efficiency, Google says, while maintaining competitiveness with advanced model-free agents.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

#artificialintelligence

Industry: Information Technology > Services (0.39)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.37)

Mobile AI Through Machine Learning Algorithms

#artificialintelligence, Feb-15-2019, 22:31:33 GMT

Machine learning (ML) is a method of artificial intelligence (AI) in which data is used to train a machine so that it can make decisions or predictions on its own. In a previous blog, Setting up your machine learning projects for success, we discussed how data and modelling play a key role in allowing a machine to learn and improve, and our e-book explains how ML fits into the bigger picture of AI. An ML algorithm is a key element that ties this all together, and as we'll discover in this blog, there are four main categories of ML algorithms – supervised machine learning, unsupervised machine learning, semi-supervised machine learning, and reinforcement machine learning. Many of today's ML algorithms can be considered supervised, which means the model is iteratively trained by running the algorithm and comparing its output against data that is known to be correct. Once training is complete, the algorithm and model are ready for inference.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)
