AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
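The trial-and-error idea in the quotation can be illustrated with a small sketch. The multi-armed bandit setting, the epsilon-greedy rule, and every name below (`run_bandit`, `true_means`, `epsilon`) are assumptions chosen for illustration and do not come from the quoted text. The learner is never told which action is best and has to estimate each action's reward by trying it.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a toy bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running mean reward per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment returns a noisy numerical reward.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.0, 0.5, 2.0])
```

With enough steps the pull counts should concentrate on the action with the highest true mean, although the learner only ever sees sampled rewards.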


DeepMind says it's given AI an imagination. Let's take a closer look at that

#artificialintelligence, Jul-22-2017, 15:45:28 GMT

Google's AI boutique, DeepMind, known for dispelling human delusions of intellectual superiority by soundly beating the world's top Go players with computer code, has found that instilling its software agents with something like imagination helps them learn better. In two papers published this week – "Imagination-Augmented Agents for Deep Reinforcement Learning" and "Learning model-based planning from scratch" – the AI biz's brain boffins, based in Britain, describe novel techniques for improving deep reinforcement learning through what can generously be described as imaginative planning. Reinforcement learning is a form of machine learning. It involves a software agent that learns by interacting with a specific environment, usually through trial and error. Deep learning is a form of machine learning that involves algorithms inspired by the human brain, called neural networks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.26)
Asia > Japan (0.06)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)


A Distributional Perspective on Reinforcement Learning

Bellemare, Marc G., Dabney, Will, Munos, Rémi

arXiv.org Machine Learning, Jul-21-2017

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
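As a rough illustration of applying Bellman's equation to an approximate value distribution, the sketch below projects a shifted and discounted return distribution back onto a fixed grid of return values. The categorical representation on an evenly spaced grid and the names `project_target`, `support` and `probs` are assumptions made for this example; the abstract does not specify them.

```python
import math

def project_target(probs, support, reward, gamma):
    """Project the distribution of reward + gamma * Z onto a fixed support.

    probs[i] is the probability that the return Z equals support[i].
    The support is assumed to be evenly spaced and sorted ascending.
    """
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for p, z in zip(probs, support):
        # Bellman shift of one atom, clipped to the representable range.
        target = min(max(reward + gamma * z, v_min), v_max)
        # Fractional index of the shifted atom on the grid.
        pos = min(max((target - v_min) / delta, 0.0), n - 1)
        lower, upper = math.floor(pos), math.ceil(pos)
        if lower == upper:
            projected[lower] += p
        else:
            # Split the mass between the two neighbouring atoms.
            projected[lower] += p * (upper - pos)
            projected[upper] += p * (pos - lower)
    return projected

support = [float(i) for i in range(11)]  # returns 0, 1, ..., 10
probs = [0.0] * 11
probs[2], probs[4] = 0.5, 0.5            # return is 2 or 4, equally likely
new_probs = project_target(probs, support, reward=1.0, gamma=0.9)
```

When no clipping occurs, the linear split preserves the mean, so the expected return of `new_probs` equals `reward + gamma * mean(Z)`. The usual expectation-based Bellman update is recovered as a summary of the full distribution.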

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1707.06887

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation

Tai, Lei, Paolo, Giuseppe, Liu, Ming

arXiv.org Artificial Intelligence, Jul-21-2017

We present a learning-based mapless motion planner by taking the sparse 10-dimensional range findings and the target position with respect to the mobile robot coordinate frame as input and the continuous steering commands as output. Traditional motion planners for mobile ground robots with a laser range sensor mostly depend on the obstacle map of the navigation environment where both the highly precise laser sensor and the obstacle map building work of the environment are indispensable. We show that, through an asynchronous deep reinforcement learning method, a mapless motion planner can be trained end-to-end without any manually designed features and prior demonstrations. The trained planner can be directly applied in unseen virtual and real environments. The experiments show that the proposed mapless motion planner can navigate the nonholonomic mobile robot to the desired targets without colliding with any obstacles.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1703.0042

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


Reinforcement Learning with Deep Energy-Based Policies

Haarnoja, Tuomas, Tang, Haoran, Abbeel, Pieter, Levine, Sergey

arXiv.org Artificial Intelligence, Jul-21-2017

We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
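To make the Boltzmann-distribution policy concrete, here is a minimal sketch for a single state with a handful of discrete actions. This is a simplifying assumption: the paper targets continuous actions and uses a learned sampling network, which is not shown here. The names `soft_value`, `boltzmann_policy` and the temperature `alpha` are illustrative.

```python
import math

def soft_value(q_values, alpha):
    """Soft (log-sum-exp) value: alpha * log(sum_a exp(Q(a) / alpha))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def boltzmann_policy(q_values, alpha):
    """Policy with pi(a) proportional to exp(Q(a) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

q = [1.0, 2.0, 0.5]
policy = boltzmann_policy(q, alpha=0.5)
```

Lower values of `alpha` push the policy towards the greedy action and the soft value towards `max(q)`. Higher values spread probability more evenly, which is one way to read the exploration benefit mentioned in the abstract.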

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

1702.08165

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Taking Machine Learning to the Next Level

#artificialintelligence, Jul-19-2017, 17:38:19 GMT

Ethics are an Issue

Don't kid yourself: introducing self-learning robots that can learn faster and better than humans will come with a huge range of issues. On our end, we can only program them to the extent of our human knowledge, which is always going to be limited. If we forget to set system safeties, we could have serious trouble on our hands in terms of public safety. On the other end, the question remains: do we really want to create a world of computers that think, and do, via their own free will, especially when they are smarter than humans? That's definitely an issue we need to reflect on before jumping too far into the reinforcement learning landscape.

artificial intelligence, machine learning, reinforcement learning, (1 more...)

#artificialintelligence

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)


Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning

Ultes, Stefan, Budzianowski, Paweł, Casanueva, Iñigo, Mrkšić, Nikola, Rojas-Barahona, Lina, Su, Pei-Hao, Wen, Tsung-Hsien, Gašić, Milica, Young, Steve

arXiv.org Machine Learning, Jul-19-2017

Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
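The role of the component weighting can be seen in a small sketch. It assumes a simple weighted sum of a success indicator and a per-turn length penalty; the function name `scalarized_reward` and the weight values are hypothetical and are not taken from the paper.

```python
def scalarized_reward(success, num_turns, w_success, w_length):
    """Weighted sum of two reward components (assumed linear form)."""
    return w_success * (1.0 if success else 0.0) - w_length * num_turns

# Two hypothetical dialogues: a short failure and a long success.
short_fail = dict(success=False, num_turns=5)
long_success = dict(success=True, num_turns=15)

# Hypothetical weightings: the first values success highly, the second does not.
for w_success, w_length in [(20.0, 1.0), (5.0, 1.0)]:
    r_fail = scalarized_reward(w_success=w_success, w_length=w_length, **short_fail)
    r_succ = scalarized_reward(w_success=w_success, w_length=w_length, **long_success)
    print(w_success, w_length, r_fail, r_succ)
```

Under the first weighting the long successful dialogue scores higher, and under the second the short failed one does. That sensitivity is why the balance between components has to be searched for rather than fixed by hand.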

machine learning, natural language, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1707.06299

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Learning model-based planning from scratch

Pascanu, Razvan, Li, Yujia, Vinyals, Oriol, Heess, Nicolas, Buesing, Lars, Racanière, Sebastien, Reichert, David, Weber, Théophane, Wierstra, Daan, Battaglia, Peter

arXiv.org Machine Learning, Jul-19-2017

Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

arXiv.org Machine Learning

1707.0617

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.66)


Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Finn, Chelsea, Abbeel, Pieter, Levine, Sergey

arXiv.org Artificial Intelligence, Jul-18-2017

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
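The two-level training described above can be sketched on a deliberately tiny problem. The example assumes a one-parameter model `y = theta * x` with squared-error loss, so that every gradient can be written by hand. The task data, learning rates, and function names (`loss_grad`, `loss_hess`, `maml_step`) are illustrative assumptions, not the authors' code.

```python
def loss_grad(theta, xs, ys):
    """Gradient of the mean squared error of y = theta * x."""
    return sum(2 * x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)

def loss_hess(xs):
    """Second derivative of the same loss (constant in theta)."""
    return sum(2 * x * x for x in xs) / len(xs)

def maml_step(theta, tasks, inner_lr, outer_lr):
    """One meta-update: adapt per task, then differentiate through the adaptation."""
    meta_grad = 0.0
    for train_x, train_y, test_x, test_y in tasks:
        # Inner loop: one gradient step on the task's training data.
        adapted = theta - inner_lr * loss_grad(theta, train_x, train_y)
        # Chain rule: d(adapted)/d(theta) for this scalar model.
        d_adapted = 1.0 - inner_lr * loss_hess(train_x)
        # Outer gradient of the post-adaptation loss on held-out data.
        meta_grad += loss_grad(adapted, test_x, test_y) * d_adapted
    return theta - outer_lr * meta_grad / len(tasks)

xs = [1.0, 2.0]
# Two hypothetical tasks: regress onto slope 1 and onto slope 3.
tasks = [(xs, [s * x for x in xs], xs, [s * x for x in xs]) for s in (1.0, 3.0)]
theta = 0.0
for _ in range(100):
    theta = maml_step(theta, tasks, inner_lr=0.05, outer_lr=0.1)
```

In this symmetric toy setting the meta-learned `theta` settles midway between the two task slopes, the starting point from which a single gradient step does best on average across both tasks.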

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1703.034

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)


Freeway Merging in Congested Traffic based on Multipolicy Decision Making with Passive Actor Critic

Nishi, Tomoki, Doshi, Prashant, Prokhorov, Danil

arXiv.org Artificial Intelligence, Jul-14-2017

Freeway merging in congested traffic is a significant challenge toward fully automated driving. Merging vehicles need to decide not only how to merge into a spot, but also where to merge. We present a method for freeway merging based on multi-policy decision making with a reinforcement learning method called passive actor-critic (pAC), which learns with less knowledge of the system and without active exploration. The method selects a merging spot candidate by using the state value learned with pAC. We evaluate our method using real traffic data. Our experiments show that pAC achieves a 92% success rate in merging onto a freeway, which is comparable to human decision making.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIV.2019.2904417

1707.04489

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)


Distral: Robust Multitask Reinforcement Learning

Teh, Yee Whye, Bapst, Victor, Czarnecki, Wojciech Marian, Quan, John, Kirkpatrick, James, Hadsell, Raia, Heess, Nicolas, Pascanu, Razvan

arXiv.org Machine Learning, Jul-13-2017

Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
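The notion of a shared policy acting as the centroid of the task policies can be illustrated for a single state with discrete actions. The sketch assumes closeness is measured by the KL divergence from each task policy to the shared policy, in which case the minimising shared policy is the arithmetic mean of the task policies. The paper's full objective is richer than this, and the names `kl`, `centroid` and `total_kl` are illustrative.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def centroid(policies):
    """Distribution minimising sum_i KL(policy_i || shared): the average."""
    n = len(policies)
    return [sum(column) / n for column in zip(*policies)]

def total_kl(policies, shared):
    """Sum of KL penalties keeping each task policy near the shared one."""
    return sum(kl(p, shared) for p in policies)

# Hypothetical action distributions of three task-specific policies.
task_policies = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
shared = centroid(task_policies)
uniform = [1 / 3, 1 / 3, 1 / 3]
```

Here `total_kl(task_policies, shared)` is no larger than the same quantity for any other candidate, such as `uniform`, which is what "centroid" means under this assumed distance.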

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1707.04175

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
