

Reinforcement Learning


From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

arXiv.org Machine Learning

Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
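To make the RL/MML connection concrete, here is a minimal sketch that assumes a small, explicitly enumerated set of candidate programs with model probabilities `probs` and a binary reward marking which programs output the correct result. In both paradigms the gradient is a weighted sum of per-program log-likelihood gradients, and the difference lies only in the weights. RL (expected reward) weights each consistent program by p(z). MML weights it by p(z) renormalized over the consistent programs. The third function, which raises probabilities to a power `beta` before renormalizing, is our own illustration of "spreading probability more evenly" and should be read as an assumption, not as the paper's exact update.

```python
from typing import List


def rl_weights(probs: List[float], rewards: List[float]) -> List[float]:
    """Expected-reward (RL) weights: q(z) = p(z) * R(z)."""
    return [p * r for p, r in zip(probs, rewards)]


def mml_weights(probs: List[float], rewards: List[float]) -> List[float]:
    """MML weights: p(z) renormalized over programs that output the correct result."""
    total = sum(p for p, r in zip(probs, rewards) if r > 0)
    return [p / total if r > 0 else 0.0 for p, r in zip(probs, rewards)]


def flattened_weights(probs: List[float], rewards: List[float], beta: float) -> List[float]:
    """Illustrative (assumed) weights: p(z)**beta renormalized over consistent programs.

    beta = 1 reproduces MML; beta = 0 gives every consistent program equal credit.
    """
    powered = [p ** beta if r > 0 else 0.0 for p, r in zip(probs, rewards)]
    total = sum(powered)
    return [w / total for w in powered]


# Hypothetical example: 4 candidate programs; the first 3 execute to the correct answer,
# but any of them could be spurious.
probs = [0.6, 0.1, 0.05, 0.25]
rewards = [1.0, 1.0, 1.0, 0.0]

print("RL  :", [round(w, 3) for w in rl_weights(probs, rewards)])
print("MML :", [round(w, 3) for w in mml_weights(probs, rewards)])
print("flat:", [round(w, 3) for w in flattened_weights(probs, rewards, beta=0.0)])
```

Under RL and MML the program the model already favors receives most of the gradient, which is exactly how a spurious program can become entrenched; flattening the weights gives the other consistent programs a larger share.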


AI learns to play video game from instructions in plain English

New Scientist

An AI has learned to tackle one of the toughest Atari videogames by taking instructions in plain English. The system, developed by a team at Stanford University in California, learned to play the game Montezuma's Revenge, in which players scour an Aztec temple for treasure. The game is challenging for AI to learn because it offers sparse rewards, requiring players to make several moves before earning any points. Most videogame-playing AIs use reinforcement learning to develop a strategy, relying on feedback like game points to tell them when they are playing well. To help their AI pick up game tactics quicker, the Stanford team gave their reinforcement learning system a helping hand in the form of natural language instructions, for example advising it to "climb up the ladder" or "get the key".
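One simple way to give an agent "a helping hand" from instructions is reward shaping: add a bonus whenever the agent completes the next instruction, so it gets feedback long before the game awards points. The sketch below assumes a symbolic game state and hand-written completion checks (`Instruction`, `InstructionShaper` and the state keys are all hypothetical); it illustrates the general idea and is not the Stanford team's system, which would have had to judge instruction completion from raw game frames.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical symbolic view of the game state.
State = Dict[str, bool]


@dataclass
class Instruction:
    text: str
    # Hypothetical completion check; a real system would need to learn this.
    is_satisfied: Callable[[State], bool]


class InstructionShaper:
    """Adds a bonus each time the agent completes the next instruction in sequence."""

    def __init__(self, instructions: List[Instruction], bonus: float = 1.0) -> None:
        self.instructions = instructions
        self.bonus = bonus
        self.current = 0  # index of the instruction the agent is working on

    def reward(self, env_reward: float, state: State) -> float:
        if self.current < len(self.instructions) and self.instructions[self.current].is_satisfied(state):
            self.current += 1  # move on to the next instruction
            return env_reward + self.bonus
        return env_reward


instructions = [
    Instruction("climb up the ladder", lambda s: s["at_ladder_top"]),
    Instruction("get the key", lambda s: s["has_key"]),
]
shaper = InstructionShaper(instructions, bonus=0.5)
print(shaper.reward(0.0, {"at_ladder_top": False, "has_key": False}))  # 0.0
print(shaper.reward(0.0, {"at_ladder_top": True, "has_key": False}))   # 0.5
print(shaper.reward(100.0, {"at_ladder_top": True, "has_key": True}))  # 100.5
```

The shaped reward is what the reinforcement learner would maximize in place of the raw game score.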


Graying the black box: Understanding DQNs

arXiv.org Artificial Intelligence

In recent years, there has been growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools, we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, which explains their success. In addition, we are able to understand and describe the policies learned by DQNs for three different Atari 2600 games and suggest ways to interpret, debug, and optimize deep neural networks in reinforcement learning.
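A rough way to see "aggregating the state space" in practice is to cluster the feature vectors a DQN produces for visited states and then count how a trajectory moves between clusters, which yields the skeleton of an aggregated, SAMDP-like transition model. The sketch below swaps in plain k-means and uses hypothetical two-dimensional feature vectors; it is not the paper's SAMDP learning algorithm.

```python
import math
from typing import List, Sequence


def kmeans(features: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means over feature vectors; returns one cluster id per state.

    Centroids are initialized from the first k points so the result is deterministic.
    """
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each state goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(f, centroids[c])) for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_transitions(labels: List[int], k: int) -> List[List[int]]:
    """Count transitions between different clusters along a trajectory."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        if a != b:
            counts[a][b] += 1
    return counts


# Hypothetical last-layer features for six consecutive states of one episode.
trajectory = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = kmeans(trajectory, k=2)
print(labels)                          # [0, 0, 0, 1, 1, 1]
print(cluster_transitions(labels, 2))  # [[0, 1], [0, 0]]
```

Each cluster plays the role of an abstract state, and the inter-cluster counts hint at sub-goal boundaries where the agent moves from one region of behavior to the next.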


Multi-Objective Decision Making

Morgan & Claypool Publishers

Many real-world decision problems have multiple objectives. For example, when choosing a medical treatment plan, we want to maximize the efficacy of the treatment, but also minimize the side effects. These objectives typically conflict, e.g., we can often increase the efficacy of the treatment, but at the cost of more severe side effects. In this book, we outline how to deal with multiple objectives in decision-theoretic planning and reinforcement learning algorithms. To illustrate this, we employ the popular problem classes of multi-objective Markov decision processes (MOMDPs) and multi-objective coordination graphs (MO-CoGs).
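The medical example can be made concrete with vector-valued outcomes. The sketch below, using hypothetical treatment plans scored as (efficacy, -side effects), shows two basic multi-objective operations: filtering out options that are Pareto-dominated, and choosing among the rest with a linear scalarization. Linear scalarization is only one common way to express preferences; the weights here are an assumption for illustration, and the book covers more than this.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one.

    All objectives are maximized, so costs such as side effects are entered as negatives.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the options not dominated by any other option."""
    return [n for n, v in values.items() if not any(dominates(u, v) for u in values.values())]


def scalarize(v: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: a weighted sum of the objectives."""
    return sum(w * x for w, x in zip(weights, v))


# Hypothetical treatment plans scored as (efficacy, -side_effects).
plans = {"A": (0.9, -0.7), "B": (0.6, -0.2), "C": (0.5, -0.4), "D": (0.3, -0.1)}
front = pareto_front(plans)
print(front)  # ['A', 'B', 'D']; C is dominated by B
best = max(front, key=lambda n: scalarize(plans[n], (0.5, 0.5)))
print(best)  # B
```

Plans A and D illustrate the conflict in the text: A is the most effective and D has the mildest side effects, yet neither dominates the other, so which one is "best" depends on how the objectives are weighed.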


O$^2$TD: (Near)-Optimal Off-Policy TD Learning

arXiv.org Machine Learning

Temporal difference (TD) learning and residual gradient methods are the most widely used TD-based learning algorithms; however, it has been shown that neither of their objective functions is optimal with respect to approximating the true value function V. Two novel algorithms are proposed to approximate the true value function V. This paper makes the following contributions:

- A batch algorithm that can help find the approximate optimal off-policy prediction of the true value function V.
- A near-optimal algorithm with linear computational cost per step that can learn from a collection of off-policy samples.
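For reference, the two baselines named above differ in a single term when V is linear in features, V(s) = θᵀφ(s). The sketch below shows their textbook on-policy updates; it does not reproduce the O²TD algorithms, whose details are not given here, and the feature vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def td_error(theta: Vector, phi: Vector, reward: float, phi_next: Vector, gamma: float) -> float:
    """delta = r + gamma * V(s') - V(s) with linear V(s) = theta . phi(s)."""
    return reward + gamma * dot(theta, phi_next) - dot(theta, phi)


def td0_update(theta: Vector, phi: Vector, reward: float, phi_next: Vector,
               gamma: float, alpha: float) -> Vector:
    """Semi-gradient TD(0): treats the target as fixed, so only phi(s) appears."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [t + alpha * delta * f for t, f in zip(theta, phi)]


def residual_gradient_update(theta: Vector, phi: Vector, reward: float, phi_next: Vector,
                             gamma: float, alpha: float) -> Vector:
    """Residual gradient: full gradient descent on 0.5 * delta**2, so phi(s') also appears."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [t + alpha * delta * (f - gamma * fn) for t, f, fn in zip(theta, phi, phi_next)]


theta = [0.0, 0.0]
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
print(td0_update(theta, phi, 1.0, phi_next, gamma=0.9, alpha=0.1))
print(residual_gradient_update(theta, phi, 1.0, phi_next, gamma=0.9, alpha=0.1))
```

TD(0) only raises the estimate of the current state, while the residual gradient also lowers the successor's estimate; the two updates therefore converge to different solutions, and with stochastic transitions the single-sample residual gradient is biased unless two independent successor samples are available.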


Deep Q-Learning For Self-Driving Cars – Josh Patterson – Medium

#artificialintelligence

Recently, I was fortunate enough to be awarded a Data61 summer research scholarship from the CSIRO. This post is the second of a three-part series detailing what I learned, the conclusions I came to, and some mistakes I made along the way. My chosen topic was Deep Q-Learning For Self-Driving Cars. This installment outlines my implementation of Deep Q-Learning to navigate a straight stretch of simulated highway. The end goal of the project is to train a model well enough to control an RC car and then, if all goes well, something larger.
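At its core, Deep Q-Learning regresses Q(s, a) toward the Bellman target y = r + γ max_a' Q(s', a') while choosing actions ε-greedily. The sketch below shows just those pieces with plain lists standing in for network outputs; in a full DQN the next-state values would come from a separate target network, and the highway action names are hypothetical. It is not the author's implementation.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward: float, next_q_values: Sequence[float], gamma: float, done: bool) -> float:
    """Bellman target y = r + gamma * max_a' Q(s', a'); no bootstrapping past a terminal state."""
    return reward if done else reward + gamma * max(next_q_values)


def squared_td_loss(q_sa: float, target: float) -> float:
    """Loss minimized for the chosen action (many DQN variants use a Huber loss instead)."""
    return 0.5 * (target - q_sa) ** 2


# Hypothetical discrete actions for a straight highway.
ACTIONS = ["keep_lane", "change_left", "change_right"]
rng = random.Random(0)
next_q = [1.0, 2.0, 0.5]
y = q_target(reward=1.0, next_q_values=next_q, gamma=0.99, done=False)
print(round(y, 2))  # 2.98
print(ACTIONS[epsilon_greedy(next_q, epsilon=0.0, rng=rng)])  # change_left
```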


Understanding Negations in Information Processing: Learning from Replicating Human Behavior

arXiv.org Machine Learning

Information systems experience an ever-growing volume of unstructured data, particularly in the form of textual materials. This represents a rich source of information from which one can create value for people, organizations and businesses. For instance, recommender systems can benefit from automatically understanding preferences based on user reviews or social media. However, it is difficult for computer programs to correctly infer meaning from narrative content. One major challenge is negations that invert the interpretation of words and sentences. As a remedy, this paper proposes a novel learning strategy to detect negations: we apply reinforcement learning to find a policy that replicates the human perception of negations based on an exogenous response, such as a user rating for reviews. Our method yields several benefits, as it eliminates the former need for expensive and subjective manual labeling in an intermediate stage. Moreover, the inferred policy can be used to derive statistical inferences and implications regarding how humans process and act on negations.
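The key idea is that no word-level negation labels are needed: a policy marks each word as negated or not, and the only feedback is how well the resulting document score matches an exogenous signal such as the user's rating. The sketch below shows that reward signal with a hypothetical toy lexicon and a hand-written baseline policy; the reinforcement learner would search for a policy that earns higher reward across many reviews. The lexicon, scoring rule, and reward shape are our assumptions, not the paper's.

```python
from typing import Dict, List

# Hypothetical toy polarity lexicon.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 1.0, "bad": -1.0, "boring": -1.0}


def document_score(tokens: List[str], negated: List[bool]) -> float:
    """Sum word polarities, flipping the sign of every token the policy marks as negated."""
    return sum(LEXICON.get(t, 0.0) * (-1.0 if neg else 1.0) for t, neg in zip(tokens, negated))


def reward(tokens: List[str], negated: List[bool], rating: float) -> float:
    """Exogenous reward: 1 minus the gap between the document score and the user's rating."""
    return 1.0 - abs(document_score(tokens, negated) - rating)


def window_policy(tokens: List[str], window: int = 2) -> List[bool]:
    """Hand-written baseline: mark up to `window` tokens after 'not' as negated."""
    negated = [False] * len(tokens)
    for i, t in enumerate(tokens):
        if t == "not":
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                negated[j] = True
    return negated


tokens = "the plot was not good".split()
rating = -1.0  # hypothetical normalized rating: the reviewer disliked the film
print(reward(tokens, [False] * len(tokens), rating))  # -1.0
print(reward(tokens, window_policy(tokens), rating))  # 1.0
```

Ignoring the negation reads "good" as praise and is penalized; marking it as negated matches the rating and is rewarded, which is the kind of signal a learned policy would exploit.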


Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

arXiv.org Artificial Intelligence

Transferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from these source tasks. Our model is generic enough to effect transfer of either policies or value functions. Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.
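The selective-transfer idea can be pictured as a state-dependent, softmax-weighted mixture of the source solutions. The sketch below mixes action distributions from two hypothetical source policies and a base policy trained on the target task itself; giving the base policy a slot is how we read the mechanism for avoiding negative transfer. In A2T the attention logits would come from a learned network conditioned on the state, whereas here they are fixed numbers for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend_policies(attention_logits: Sequence[float],
                    policies: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Mix action distributions: pi(a|s) = sum_i w_i(s) * pi_i(a|s).

    Each policy sums to 1 and the softmax weights sum to 1, so the mixture
    is also a valid distribution over actions.
    """
    weights = softmax(attention_logits)
    n_actions = len(policies[0])
    mixed = [sum(w * p[a] for w, p in zip(weights, policies)) for a in range(n_actions)]
    return mixed, weights


# Hypothetical state: two source-task policies plus a base policy for the target task.
source_1 = [0.9, 0.1]
source_2 = [0.2, 0.8]
base = [0.5, 0.5]
mixed, weights = attend_policies([2.0, 0.0, 0.0], [source_1, source_2, base])
print([round(w, 3) for w in weights])
print([round(p, 3) for p in mixed])
```

If a source task is unhelpful in some region of the state space, the attention can shift weight to the base policy there, so a bad source does not have to drag the learner down.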


Artificial Intelligence will Speak Its Own Language -- Soon

#artificialintelligence

The article is about a system that invents a language tied to its perception of the world. In sum, the post explores possibilities that research on artificial languages might open up. At a minimum, the language will resemble the signaling systems typical of animals; later, such languages will evolve into more complex technologies. The language does not necessarily consist of spoken sounds; rather, it is more of an inner process.


Dissecting Reinforcement Learning-Part.1

#artificialintelligence

Premise: This post is an introduction to reinforcement learning, meant as a starting point for readers who already have some machine learning background and are comfortable with a little math and Python. When I study a new algorithm, I always want to understand the underlying mechanisms, so it is always useful to implement the algorithm from scratch in a programming language. I followed this approach in this post, which may be long to read but is worth it. When I started studying reinforcement learning, I could not find any good online resource that explained from the basics what reinforcement learning really is. Most of the (very good) blogs out there focus on modern approaches (Deep Reinforcement Learning) and introduce the Bellman equation without a satisfying explanation. I turned my attention to books and found Artificial Intelligence: A Modern Approach by Russell and Norvig. This post is based on chapter 17 of the second edition and can be considered an extended review of that chapter. I will use the same mathematical notation as the authors, so you can use the book to fill in any missing parts, or vice versa.
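As a taste of where this leads, the Bellman equation can be written as U(s) = R(s) + γ max_a Σ_s' P(s'|s,a) U(s'), and value iteration simply applies it as an update until the utilities stop changing. The sketch below runs it on a hypothetical three-state MDP in which the "right" action succeeds with probability 0.8; the variable names are our own and are not taken from the post or the book. It stops when the largest change falls below ε(1−γ)/γ, which keeps the remaining error below ε.

```python
from typing import Dict, List, Set, Tuple

# state -> action -> list of (probability, next_state)
Model = Dict[str, Dict[str, List[Tuple[float, str]]]]


def expected_utility(outcomes: List[Tuple[float, str]], U: Dict[str, float]) -> float:
    return sum(p * U[s2] for p, s2 in outcomes)


def value_iteration(R: Dict[str, float], T: Model, terminals: Set[str],
                    gamma: float, eps: float = 1e-6) -> Dict[str, float]:
    """Repeat the Bellman update U(s) <- R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')."""
    U = {s: 0.0 for s in R}
    while True:
        new_U: Dict[str, float] = {}
        delta = 0.0
        for s in R:
            if s in terminals:
                new_U[s] = R[s]  # a terminal state's utility is just its reward
            else:
                new_U[s] = R[s] + gamma * max(expected_utility(o, U) for o in T[s].values())
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < eps * (1 - gamma) / gamma:
            return U


def greedy_policy(U: Dict[str, float], T: Model, terminals: Set[str]) -> Dict[str, str]:
    """Pick, in each state, the action with the highest expected utility."""
    return {s: max(T[s], key=lambda a: expected_utility(T[s][a], U)) for s in T if s not in terminals}


# Hypothetical MDP: a small living cost in A and B, a +1 reward at GOAL.
R = {"A": -0.04, "B": -0.04, "GOAL": 1.0}
T: Model = {
    "A": {"right": [(0.8, "B"), (0.2, "A")], "stay": [(1.0, "A")]},
    "B": {"right": [(0.8, "GOAL"), (0.2, "B")], "stay": [(1.0, "B")]},
}
U = value_iteration(R, T, terminals={"GOAL"}, gamma=0.9)
print({s: round(u, 3) for s, u in U.items()})
print(greedy_policy(U, T, {"GOAL"}))
```

Because the living cost is small and the goal is worth +1, the greedy policy moves right in both non-terminal states.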