AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

Chu, Xiangxiang

arXiv.org Artificial Intelligence, Jul-1-2018

This paper proposes a first-order gradient reinforcement learning algorithm, which can be seen as a variant of Trust Region Policy Optimization (TRPO). This method, which we call policy optimization with penalized point probability distance (POP3D), keeps almost all the advantages of proximal policy optimization (PPO), such as easy implementation, fast learning and high scores. Like PPO, we use a single surrogate objective without constraints, in which a penalty term based on point probability distance is included to prevent the update step from growing too large. Experiments verify that POP3D is state-of-the-art within 40 million frame steps on 49 Atari games based on two common metrics, making it a competitive alternative to PPO. Moreover, comparison experiments against PPO in the MuJoCo environment verify that POP3D is also competitive in the continuous domain. In addition, we release the code on GitHub: https://github.com/cxxgtxy/POP3D.git.
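
A minimal sketch of the kind of penalized surrogate objective the abstract describes, for a discrete action space. It assumes the point probability distance is the squared difference between the old and new probability of the action actually taken. The function name `pop3d_surrogate`, the coefficient `beta` and its default value are illustrative assumptions and are not taken from the released code.

```python
def pop3d_surrogate(new_probs, old_probs, advantages, beta=5.0):
    """Average of (probability ratio * advantage) minus a penalty on policy change.

    new_probs / old_probs: probabilities that the new and old policy assign
    to the action taken at each time step (assumed strictly positive).
    beta: hypothetical penalty coefficient, chosen here only for illustration.
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old                # importance ratio, as in PPO/TRPO
        penalty = (p_new - p_old) ** 2       # assumed point probability distance
        total += ratio * adv - beta * penalty
    return total / len(advantages)


# When the policy has not changed, the penalty vanishes and the objective
# reduces to the mean advantage.
unchanged = pop3d_surrogate([0.5, 0.25], [0.5, 0.25], [1.0, 3.0])
```

An optimizer would maximize this quantity with respect to the new policy's parameters; the penalty plays the role that ratio clipping plays in PPO.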

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1807.00442

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Bhupatiraju, Surya, Agrawal, Kumar Krishna, Singh, Rishabh

arXiv.org Artificial Intelligence, Jul-1-2018

Deep reinforcement learning has led to several recent breakthroughs, though the learned policies are often based on black-box neural networks. This makes them difficult to interpret and makes it hard to impose desired specification constraints during learning. We present an iterative framework, MORL, for improving the learned policies using program synthesis. Concretely, we propose to use synthesis techniques to obtain a symbolic representation of the learned policy, which can then be debugged manually or automatically using program repair. After the repair step, we use behavior cloning to obtain the policy corresponding to the repaired program, which is then further improved using gradient descent. This process continues until the learned policy satisfies the desired constraints. We instantiate MORL for the simple CartPole problem and show that the programmatic representation allows for high-level modifications that in turn lead to improved learning of the policies.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1807.00403

Country:

North America > United States (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Amanuensis: The Programmer's Apprentice

Dean, Thomas, Chiang, Maurice, Gomez, Marcus, Gruver, Nate, Hindy, Yousef, Lam, Michelle, Lu, Peter, Sanchez, Sophia, Saxena, Rohun, Smith, Michael, Wang, Lucy, Wong, Catherine

arXiv.org Artificial Intelligence, Jun-29-2018

Suppose you could merely imagine a computation, and a digital prosthesis, an extension of your biological brain, would turn it into code that instantly realizes what you had in mind. Imagine looking at an image, dataset or set of equations and wanting to analyze and explore its meaning as an artistic whim or part of a scientific investigation. I don't mean you would use an existing software suite to produce a standard visualization, but rather you would make use of an extensive repository of existing code to assemble a new program analogous to how a composer draws upon a repertoire of musical motifs, themes and styles to construct new works, and tantamount to having a talented musical amanuensis who, in addition to copying your scores, takes liberties with your prior work, making small alterations here and there and occasionally adding new works of its own invention, novel but consistent with your taste and sensibilities. Perhaps the interaction would be wordless and you would express your objective by simply focusing your attention and guiding your imagination, the prosthesis operating directly on patterns of activation arising in your primary sensory, proprioceptive and associative cortex that have become part of an extensive vocabulary that you now share with your personal digital amanuensis. Or perhaps it would involve a conversation conducted in subvocal, unarticulated speech in which you specify what it is you want to compute and your assistant asks questions to clarify your intention and the two of you share examples of input and output to ground your internal conversation in concrete terms. More than thirty years ago, Charles Rich and Richard Waters published an MIT AI Lab technical report [68] entitled The Programmer's Apprentice: A Research Overview.
Whether they intended it or not, it would have been easy in those days for someone to misremember the title and inadvertently refer to it as "The Sorcerer's Apprentice" since computer programmers at the time were often characterized as wizards and most children were familiar with the Walt Disney movie Fantasia, featuring music written by Paul Dukas inspired by Goethe's poem of the same name

machine learning, programming language, reinforcement learning, (21 more...)

1807.00082

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre:

Instructional Material > Course Syllabus & Notes (0.67)
Research Report (0.63)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

TextWorld: A Learning Environment for Text-based Games

Côté, Marc-Alexandre, Kádár, Ákos, Yuan, Xingdi, Kybartas, Ben, Barnes, Tavian, Fine, Emery, Moore, James, Hausknecht, Matthew, Asri, Layla El, Adada, Mahmoud, Tay, Wendy, Trischler, Adam

arXiv.org Machine Learning, Jun-29-2018

We introduce TextWorld, a sandbox learning environment for the training and evaluation of RL agents on text-based games. TextWorld is a Python library that handles interactive play-through of text games, as well as backend functions like state tracking and reward assignment. It comes with a curated list of games whose features and challenges we have analyzed. More significantly, it enables users to handcraft or automatically generate new games. Its generative mechanisms give precise control over the difficulty, scope, and language of constructed games, and can be used to relax challenges inherent to commercial text games like partial observability and sparse rewards. By generating sets of varied but similar games, TextWorld can also be used to study generalization and transfer learning. We cast text-based games in the Reinforcement Learning formalism, use our framework to develop a set of benchmark games, and evaluate several baseline agents on this set and the curated list.

machine learning, natural language, reinforcement learning, (20 more...)

1806.11532

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England (0.04)
Asia > Malaysia (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

AI for Prosthetics Week 1: Understanding the Challenge

#artificialintelligence, Jun-28-2018, 02:42:48 GMT

The AI for Prosthetics challenge is one of the NIPS 2018 Competition tracks. In this challenge, the participants seek to build an agent that can make a 3D model of a human with prosthetics run. This challenge is a continuation of the Learning to Run challenge that was part of the NIPS 2017 Competition Track. To start the challenge, you first need to install a few packages with Anaconda. Here is a detailed description of the installation process.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Learning Multi-Step Robotic Tasks from Observation

Goo, Wonjoon, Niekum, Scott

arXiv.org Machine Learning, Jun-28-2018

Due to burdensome data requirements, learning from demonstration often falls short of its promise to allow users to quickly and naturally program robots. Demonstrations are inherently ambiguous and incomplete, making a correct generalization to unseen situations difficult without a large number of demonstrations in varying conditions. By contrast, humans are often able to learn complex tasks from a single demonstration (typically observations without action labels) by leveraging context learned over a lifetime. Inspired by this capability, we aim to enable robots to perform one-shot learning of multi-step tasks from observation by leveraging auxiliary video data as context. Our primary contribution is a novel action localization algorithm that identifies clips of activities in auxiliary videos that match the activities in a user-segmented demonstration, providing additional examples of each. While this auxiliary video data could be used in multiple ways for learning, we focus on an inverse reinforcement learning setting. We empirically show that across several tasks, robots can learn multi-step tasks more effectively from videos with localized actions, compared to unsegmented videos.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

1806.11244

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Espeholt, Lasse, Soyer, Hubert, Munos, Remi, Simonyan, Karen, Mnih, Volodymir, Ward, Tom, Doron, Yotam, Firoiu, Vlad, Harley, Tim, Dunning, Iain, Legg, Shane, Kavukcuoglu, Koray

arXiv.org Artificial Intelligence, Jun-28-2018

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. The source code is publicly available at github.com/deepmind/scalable
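
The abstract names V-trace without defining it. The sketch below follows the commonly cited recursive form of the correction, in which the temporal-difference error at each step is scaled by a truncated importance ratio between the learner's policy and the actor's behaviour policy. The function name `vtrace_targets`, its argument layout and the default thresholds `rho_bar` and `c_bar` are assumptions made for illustration; this is not the released implementation.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute off-policy corrected value targets for one trajectory.

    rewards[t], values[t]: reward and value estimate at step t (plain lists).
    bootstrap_value: value estimate for the state after the last step.
    ratios[t]: pi(a_t | x_t) / mu(a_t | x_t), learner over behaviour policy.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # holds v_{t+1} - V(x_{t+1}) while iterating backwards
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])   # truncated weight on the TD error
        c = min(c_bar, ratios[t])       # truncated weight on the trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        correction = delta + gamma * c * correction
        targets[t] = values[t] + correction
    return targets


# On-policy data (all ratios equal to 1) with zero value estimates and no
# discounting reduces to plain reward-to-go.
on_policy = vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                           [1.0, 1.0, 1.0], gamma=1.0)
```

Truncating the ratios bounds the variance of the update, which matters when actors lag behind the learner by several parameter updates.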

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1802.01561

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Education (0.88)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Context-Aware Policy Reuse

Li, Siyuan, Gu, Fangda, Zhu, Guangxiang, Zhang, Chongjie

arXiv.org Artificial Intelligence, Jun-28-2018

Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks. Existing work on policy reuse either focuses on selecting a single best source policy for transfer without considering contexts, or cannot guarantee learning an optimal policy for a target task. To improve transfer efficiency and guarantee optimality, we develop a novel policy reuse method, called Context-Aware Policy reuSe (CAPS), that enables multi-policy transfer. Our method learns when and which source policy is best for reuse, as well as when to terminate its reuse. CAPS provides theoretical guarantees of convergence and optimality for both source policy selection and target task learning. Empirical results on a grid-based navigation domain and the Pygame Learning Environment demonstrate that CAPS significantly outperforms other state-of-the-art policy reuse methods.

machine learning, reinforcement learning, source policy, (16 more...)

1806.03793

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Hierarchical Reinforcement Learning with Abductive Planning

Yamamoto, Kazeto, Onishi, Takashi, Tsuruoka, Yoshimasa

arXiv.org Artificial Intelligence, Jun-28-2018

One of the key challenges in applying reinforcement learning to real-life problems is that the amount of trial-and-error required to learn a good policy increases drastically as the task becomes complex. One potential solution to this problem is to combine reinforcement learning with automated symbolic planning and utilize prior knowledge of the domain. However, existing methods have limitations in their applicability and expressiveness. In this paper we propose a hierarchical reinforcement learning method based on abductive symbolic planning. The planner can deal with user-defined evaluation functions and is not based on the Herbrand theorem. Therefore it can utilize prior knowledge of the rewards and can work in a domain where the state space is unknown. We demonstrate empirically that our architecture significantly improves learning efficiency with respect to the number of training examples on the evaluation domain, in which the state space is unknown and there exist multiple goals.

abduction, machine learning, reinforcement learning, (16 more...)

1806.10792

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States (0.04)
Europe > Spain (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Explaining Reinforcement Learning: Active vs Passive

#artificialintelligence, Jun-27-2018, 11:22:08 GMT

This post assumes that you are familiar with the basics of Reinforcement Learning (RL) and Markov Decision Processes (MDPs); if not, please refer to this previous post first. Let's consider a problem where the agent can be in various states and can choose an action from a set of actions. Problems of this type are called Sequential Decision Problems. The solution to an MDP is an optimal policy, which is the choice of action for every state that maximizes the overall cumulative reward. Thus, the transition model that represents an agent's environment (when the environment is known) and the optimal policy, which decides what action the agent needs to perform in each state, are required elements for training the agent to learn a specific behavior.
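
To make the relationship between a known transition model and an optimal policy concrete, here is a small sketch of value iteration, one standard way to solve an MDP when the model is known. The two-state MDP, its rewards, the discount factor and all names (`STATES`, `TRANSITION`, `value_iteration`, and so on) are hypothetical and chosen only for illustration; the post itself does not specify them.

```python
# Hypothetical two-state MDP with deterministic transitions.
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]
# TRANSITION[(state, action)] is a list of (probability, next_state) pairs.
TRANSITION = {
    ("A", "stay"): [(1.0, "A")],
    ("A", "move"): [(1.0, "B")],
    ("B", "stay"): [(1.0, "B")],
    ("B", "move"): [(1.0, "A")],
}
REWARD = {("A", "stay"): 0.0, ("A", "move"): 1.0,
          ("B", "stay"): 2.0, ("B", "move"): 0.0}


def q_value(state, action, values, gamma):
    """Immediate reward plus discounted expected value of the next state."""
    expected_next = sum(p * values[nxt] for p, nxt in TRANSITION[(state, action)])
    return REWARD[(state, action)] + gamma * expected_next


def value_iteration(gamma=0.5, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        largest_change = 0.0
        for s in STATES:
            best = max(q_value(s, a, values, gamma) for a in ACTIONS)
            largest_change = max(largest_change, abs(best - values[s]))
            values[s] = best
        if largest_change < tol:
            break
    # The optimal policy picks, in every state, the action with the best value.
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values, gamma))
              for s in STATES}
    return values, policy


values, policy = value_iteration()
print(policy)  # {'A': 'move', 'B': 'stay'}
```

With a discount factor of 0.5, staying in B forever is worth 2 / (1 - 0.5) = 4, so the best choice in A is to move to B, which is worth 1 + 0.5 * 4 = 3.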

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
