Goto

Collaborating Authors

 Reinforcement Learning


Intelligent Roundabout Insertion using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

The study and development of autonomous vehicles have seen an increasing interest in recent years, becoming hot topics in both academia and industry. One of the main reasearch areas in this field is related to control systems, in particular planning and decision-making problems. The basic approaches for scheduling high-level maneuver execution modules are based on the concepts of time-to-collision (van der Horst and Hogema, 1994) and headway control (Hatipoglu et al., 1996). In order to add interpretation capabilities to the system, several approaches model the driving decision-making problem as a Partially Observable Markov Decision Process (POMDP, (Spaan, 2012)), as in (Liu et al., 2015) for urban scenarios and in (Song et al., 2016) for intersection handling. A further extension is proposed in (Bandyopadhyay et al., 2012) where a Mixed Observability Markov Decision Process (MOMDP) (Ong et al., 2010) is used to model uncertainties in agents intentions. However, since vehicles are assumed to behave in a deterministic way, the aforementioned approaches handle many situations with excessive prudence and would not be able to enter in a busy roundabout.


Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

arXiv.org Machine Learning

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear.


Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

arXiv.org Machine Learning

Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert's demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seeks to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans' strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy's objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task.


Reasoning on Knowledge Graphs with Debate Dynamics

arXiv.org Machine Learning

We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.


2019 in Review: 10 AI Papers That Made an Impact

#artificialintelligence

The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of paper. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year's 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total. As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero).


Options of Interest: Temporal Abstraction with Interest Functions

arXiv.org Machine Learning

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.


Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies

arXiv.org Machine Learning

We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph which describes a set of subtasks and their dependencies that are unknown to the agent. The agent needs to quickly adapt to the task over few episodes during adaptation phase to maximize the return in the test phase. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference(MSGI), which infers the latent parameter of the task by interacting with the environment and maximizes the return given the latent parameter. To facilitate learning, we adopt an intrinsic reward inspired by upper confidence bound (UCB) that encourages efficient exploration. Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter, and to adapt more efficiently than existing meta RL and hierarchical RL methods.


Humble Book Bundle: Python & Machine Learning by Packt

#artificialintelligence

Whether you're a Python developer new to machine learning or want to deepen your knowledge of the latest developments, our latest ebook bundles from Packt is perfect for you! Get titles like Python Machine Learning, Reinforcement Learning Algorithms with Python, and Machine Learning Projects for Mobile Applications. Plus, your purchase will support Innocent Lives Foundation! Normally, the total cost for the ebooks in this bundle is as much as $1,051. Here at Humble Bundle, you choose the price and increase your contribution to upgrade your bundle! This bundle has a minimum $1 purchase.


Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

arXiv.org Machine Learning

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent receives also rewards that create suboptimal modes of the objective function, it will likely prematurely stop exploring. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. Results also suggest that our approach scales gracefully with the size of the environment. Source code is available at https://github.com/sparisi/visit-value-explore


Towards Neural-Guided Program Synthesis for Linear Temporal Logic Specifications

arXiv.org Artificial Intelligence

Synthesizing a program that realizes a logical specification is a classical problem in computer science. We examine a particular type of program synthesis, where the objective is to synthesize a strategy that reacts to a potentially adversarial environment while ensuring that all executions satisfy a Linear Temporal Logic (LTL) specification. Unfortunately, exact methods to solve so-called LTL synthesis via logical inference do not scale. In this work, we cast LTL synthesis as an optimization problem. We employ a neural network to learn a Q-function that is then used to guide search, and to construct programs that are subsequently verified for correctness. Our method is unique in combining search with deep learning to realize LTL synthesis. In our experiments the learned Q-function provides effective guidance for synthesis problems with relatively small specifications.