AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

Guu, Kelvin, Pasupat, Panupong, Liu, Evan Zheran, Liang, Percy

arXiv.org Machine Learning, Apr-25-2017

Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
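One way to see the connection between the two paradigms is to compare how each weights the log-probability gradient of a candidate program. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a binary reward (1 if a program outputs the correct result, else 0), uses the standard facts that the expected-reward gradient weights each program by p(z)·R(z) while the marginal-likelihood gradient uses the same weights renormalized to sum to one, and adds a hypothetical `beta` exponent as one possible way to spread weight more evenly across consistent programs. The function name and the toy numbers are invented for the example.

```python
def program_weights(probs, consistent, beta=1.0):
    """Gradient weights on grad log p(z) for each candidate program z.

    probs      -- model probability of each candidate program
    consistent -- True if the program outputs the correct result
    beta       -- hypothetical tempering exponent (assumption):
                  1.0 keeps the MML weights, 0.0 makes them uniform
                  over consistent programs
    """
    rewards = [1.0 if ok else 0.0 for ok in consistent]

    # Expected-reward (RL-style) weights: p(z) * R(z), not normalized.
    rl = [p * r for p, r in zip(probs, rewards)]

    mass = sum(rl)
    if mass == 0.0:
        # No consistent program was found, so there is no learning signal.
        zeros = [0.0] * len(probs)
        return rl, zeros, zeros

    # Marginal-likelihood (MML-style) weights: the same values, renormalized.
    mml = [w / mass for w in rl]

    # Tempered weights: raise to beta and renormalize (illustrative only).
    raw = [w ** beta if w > 0.0 else 0.0 for w in mml]
    norm = sum(raw)
    tempered = [w / norm for w in raw]
    return rl, mml, tempered


# Toy example: two consistent programs (one possibly spurious), one wrong.
probs = [0.6, 0.1, 0.3]
consistent = [True, True, False]
rl, mml, even = program_weights(probs, consistent, beta=0.0)
print(rl, mml, even)
```

With `beta=1.0` the already-likely program keeps most of the weight, which is how a spurious program found early can dominate; smaller values of `beta` give the less likely consistent program a larger share.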

machine learning, reinforcement learning, spurious program, (19 more...)

arXiv:1704.07926

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

AI learns to play video game from instructions in plain English

New Scientist, Apr-24-2017, 13:20:16 GMT

An AI has learned to tackle one of the toughest Atari videogames by taking instructions in plain English. The system, developed by a team at Stanford University in California, learned to play the game Montezuma's Revenge, in which players scour an Aztec temple for treasure. The game is challenging for AI to learn because it offers sparse rewards, requiring players to make several moves before earning any points. Most videogame-playing AIs use reinforcement learning to develop a strategy, relying on feedback like game points to tell them when they are playing well. To help their AI pick up game tactics quicker, the Stanford team gave their reinforcement learning system a helping hand in the form of natural language instructions, for example advising it to "climb up the ladder" or "get the key".

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Country: North America > United States > California (0.26)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Graying the black box: Understanding DQNs

Zahavy, Tom, Zrihem, Nir Ben, Mannor, Shie

arXiv.org Artificial Intelligence, Apr-24-2017

In recent years there has been growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining their success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari 2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv:1602.02658

Country: Asia > Middle East (0.28)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.93)
Transportation (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Multi-Objective Decision Making

Roijers, Diederik M., Whiteson, Shimon

Morgan & Claypool Publishers, Apr-20-2017

Many real-world decision problems have multiple objectives. For example, when choosing a medical treatment plan, we want to maximize the efficacy of the treatment, but also minimize the side effects. These objectives typically conflict, e.g., we can often increase the efficacy of the treatment, but at the cost of more severe side effects. In this book, we outline how to deal with multiple objectives in decision-theoretic planning and reinforcement learning algorithms. To illustrate this, we employ the popular problem classes of multi-objective Markov decision processes (MOMDPs) and multi-objective coordination graphs (MO-CoGs).
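The conflict between objectives can be made concrete with a small sketch. It is an illustration only, not taken from the book: each treatment plan is scored by a hypothetical pair (efficacy, negated side-effect severity) so that larger is better in both coordinates, the Pareto front keeps every plan that no other plan beats on all objectives, and a linear weighting then picks one plan from that trade-off. The plan values, function names, and weights are all made up for the example.

```python
def dominates(a, b):
    """True if value vector a is at least as good as b everywhere
    and strictly better in at least one objective."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the value vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_by_weights(points, weights):
    """Pick the vector with the highest weighted sum (linear scalarization)."""
    return max(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical plans: (efficacy, -side_effect_severity), larger is better.
plans = [(0.9, -0.7), (0.6, -0.2), (0.5, -0.5), (0.3, -0.1)]

front = pareto_front(plans)                       # (0.5, -0.5) is dominated
efficacy_only = best_by_weights(front, (1.0, 0.0))
balanced = best_by_weights(front, (0.5, 0.5))
print(front, efficacy_only, balanced)
```

Different weight vectors select different plans from the same front, which is why a single scalar reward is often not enough when the preferred trade-off is unknown in advance.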

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

Europe (0.37)
North America > United States > Texas (0.16)

Industry: Health & Medicine (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.36)

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

Liu, Bo, Lyu, Daoming, Dong, Wen, Biaz, Saad

arXiv.org Machine Learning, Apr-19-2017

Temporal difference (TD) learning and residual gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal with respect to approximating the true value function V. Two novel algorithms are proposed to approximate the true value function V. This paper makes the following contributions:

- A batch algorithm that can help find the approximate optimal off-policy prediction of the true value function V.
- A near-optimal algorithm with linear computational cost per step that can learn from a collection of off-policy samples.
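For orientation, the two baseline updates named in the abstract can be sketched as follows. This is a minimal illustration of the standard textbook forms, not of the paper's proposed algorithms: it assumes a linear value estimate V(s) ≈ θ·φ(s), a single on-policy transition, and no importance weighting for off-policy data. The function and parameter names are chosen for the example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_error(theta, phi_s, phi_next, reward, gamma):
    """delta = r + gamma * V(s') - V(s), with V(s) = theta . phi(s)."""
    return reward + gamma * dot(theta, phi_next) - dot(theta, phi_s)


def td0_update(theta, phi_s, phi_next, reward, gamma, alpha):
    """TD(0): semi-gradient step that treats the bootstrapped target as fixed."""
    delta = td_error(theta, phi_s, phi_next, reward, gamma)
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]


def residual_gradient_update(theta, phi_s, phi_next, reward, gamma, alpha):
    """Residual gradient: full gradient step on 0.5 * delta**2,
    so the next-state features also enter the update."""
    delta = td_error(theta, phi_s, phi_next, reward, gamma)
    return [t + alpha * delta * (f - gamma * fn)
            for t, f, fn in zip(theta, phi_s, phi_next)]


# One toy transition with one-hot features for two states.
theta = [0.0, 0.0]
phi_s, phi_next = [1.0, 0.0], [0.0, 1.0]
print(td0_update(theta, phi_s, phi_next, reward=1.0, gamma=0.9, alpha=0.1))
print(residual_gradient_update(theta, phi_s, phi_next,
                               reward=1.0, gamma=0.9, alpha=0.1))
```

The only difference between the two functions is the direction of the step: TD(0) moves along φ(s), while the residual gradient update moves along φ(s) − γφ(s'), which is what makes the two methods minimize different objectives.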

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv:1704.05147

Country: North America > United States (0.46)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Q-Learning For Self-Driving Cars – Josh Patterson – Medium

#artificialintelligence, Apr-18-2017, 02:16:24 GMT

Recently, I was fortunate enough to be awarded a Data61 summer research scholarship from the CSIRO. This post is the second of a three-part series detailing what I learned, the conclusions I came to, and some mistakes I made along the way. My chosen topic was Deep Q-Learning For Self-Driving Cars. This installment outlines my implementation of Deep Q-Learning to navigate a straight stretch of simulated highway. The end goal of the project is to train a model well enough to control an RC car and then, if all goes well, something larger.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Industry:

Transportation > Passenger (0.61)
Transportation > Ground > Road (0.61)
Information Technology > Robotics & Automation (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Understanding Negations in Information Processing: Learning from Replicating Human Behavior

Pröllochs, Nicolas, Feuerriegel, Stefan, Neumann, Dirk

arXiv.org Machine Learning, Apr-18-2017

Information systems experience an ever-growing volume of unstructured data, particularly in the form of textual materials. This represents a rich source of information from which one can create value for people, organizations and businesses. For instance, recommender systems can benefit from automatically understanding preferences based on user reviews or social media. However, it is difficult for computer programs to correctly infer meaning from narrative content. One major challenge is negations that invert the interpretation of words and sentences. As a remedy, this paper proposes a novel learning strategy to detect negations: we apply reinforcement learning to find a policy that replicates the human perception of negations based on an exogenous response, such as a user rating for reviews. Our method yields several benefits, as it eliminates the former need for expensive and subjective manual labeling in an intermediate stage. Moreover, the inferred policy can be used to derive statistical inferences and implications regarding how humans process and act on negations.

machine learning, natural language, reinforcement learning, (17 more...)

arXiv:1704.05356

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Banking & Finance (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

Rajendran, Janarthanan, Lakshminarayanan, Aravind S., Khapra, Mitesh M., Prasanna, P, Ravindran, Balaraman

arXiv.org Artificial Intelligence, Apr-17-2017

Transferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from these source tasks. Our model is generic enough to effect transfer of either policies or value functions. Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.

machine learning, reinforcement learning, target task, (18 more...)

arXiv:1510.02879

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Artificial Intelligence will Speak Its Own Language -- Soon

#artificialintelligence, Apr-9-2017, 13:15:43 GMT

The article describes a system that invents a language tied to its perception of the world. In sum, the post outlines possibilities that research on artificial languages might open up. At a minimum, such a language will resemble the signaling languages typical of animals. Later, these languages will evolve into more complex technologies. The language is not necessarily made of spoken sounds; it is more of an inner process.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Industry: Leisure & Entertainment > Games (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Dissecting Reinforcement Learning-Part.1

#artificialintelligence, Apr-8-2017, 20:09:06 GMT

Premise: This post is an introduction to reinforcement learning, meant as a starting point for a reader who already has some machine learning background and is comfortable with a little math and Python. When I study a new algorithm I always want to understand the underlying mechanisms, so it is always useful to implement the algorithm from scratch in a programming language. I followed this approach in this post, which is long to read but worthwhile. When I started to study reinforcement learning I did not find any good online resource that explained from the basics what reinforcement learning really is. Most of the (very good) blogs out there focus on modern approaches (Deep Reinforcement Learning) and introduce the Bellman equation without a satisfying explanation. I turned my attention to books and found Artificial Intelligence: A Modern Approach by Russell and Norvig. This post is based on chapter 17 of the second edition and can be considered an extended review of that chapter. I will use the same mathematical notation as the authors, so you can use the book to cover any missing parts, or vice versa.
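As a rough illustration of the kind of from-scratch implementation the post describes, the Bellman equation can be turned into value iteration in a few lines. This sketch is not taken from the post: it assumes a state-based reward R(s), a transition table T mapping each (state, action) pair to a distribution over next states, and a made-up two-state problem. All names are chosen for the example.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    """Repeat the Bellman backup
        U(s) = R(s) + gamma * max_a sum_s' T(s, a, s') * U(s')
    until the largest change in any state utility falls below tol."""
    U = {s: 0.0 for s in states}
    while True:
        new_U = {}
        delta = 0.0
        for s in states:
            # Expected utility of the best action available in state s.
            best = max(
                sum(p * U[s2] for s2, p in T[(s, a)].items())
                for a in actions
            )
            new_U[s] = R[s] + gamma * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < tol:
            return U


# Hypothetical two-state problem: "b" pays a reward of 1, "a" pays nothing.
states = ["a", "b"]
actions = ["stay", "go"]
T = {
    ("a", "stay"): {"a": 1.0},
    ("a", "go"): {"b": 1.0},
    ("b", "stay"): {"b": 1.0},
    ("b", "go"): {"a": 1.0},
}
R = {"a": 0.0, "b": 1.0}

print(value_iteration(states, actions, T, R, gamma=0.5))
```

For this toy problem the fixed point can be checked by hand: staying in "b" gives U(b) = 1 + 0.5·U(b), so U(b) = 2, and moving from "a" to "b" gives U(a) = 0.5·U(b) = 1.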

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: North America > United States > California > Los Angeles County > Santa Monica (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
