 Reinforcement Learning


Lunar Landings from Demonstrations – Towards Data Science

#artificialintelligence

Deep reinforcement learning algorithms have achieved remarkable results on a number of problems once thought to be unsolvable without the aid of human intuition and creativity. RL agents can learn to master tasks like chess and retro video games without any prior instruction, often surpassing the performance of even the greatest human experts. But these methods are sample-inefficient and rely on learning from hundreds or even thousands of complete failures before any progress is made. That's a luxury we can afford when the task is simple or can be simulated, like an Atari screen or chess board, but it is at least partially responsible for RL's relatively short list of real-world applications. For example, it would be incredibly dangerous, expensive, and time-consuming to let a self-driving algorithm learn by smashing a real car into a real wall for the 1,000 iterations it might take to figure out what the brakes do, or to learn to land a rocket by crashing the first 500 of them.


Weekly Machine Learning Opensource Roundup – Aug. 30, 2018

#artificialintelligence

Dopamine: Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Simple Baselines for Human Pose Estimation and Tracking: This project is an official implementation of the Microsoft ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking".


Directed Exploration in PAC Model-Free Reinforcement Learning

arXiv.org Machine Learning

We study an exploration method for model-free RL that generalizes the counter-based exploration bonus methods and takes into account the long-term exploratory value of actions rather than a single-step look-ahead. We propose a model-free RL method that modifies Delayed Q-learning and utilizes the long-term exploration bonus with provable efficiency. We show that our proposed method finds a near-optimal policy in polynomial time (PAC-MDP), and also provide experimental evidence that our proposed algorithm is an efficient exploration method.
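For context, the single-step baseline that this line of work generalizes can be sketched as follows. This is a minimal illustration of a classic count-based bonus added to a tabular Q-learning update, not the paper's long-term bonus or its Delayed Q-learning variant. The names `CountBonus` and `q_update`, the bonus form `beta / sqrt(n)`, and all parameter values are assumptions made for the example.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical single-step bonus: beta / sqrt(n(s, a))."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count per (state, action)

    def visit(self, state, action):
        # Increment the visit count, then return the bonus for this visit.
        self.counts[(state, action)] += 1
        return self.beta / math.sqrt(self.counts[(state, action)])


def q_update(q, bonus, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step with the bonus added to the reward."""
    b = bonus.visit(s, a)
    best_next = max(q[(s_next, a2)] for a2 in actions)
    target = r + b + gamma * best_next
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)  # Q-table initialised to zero
bonus = CountBonus(beta=1.0)
first = q_update(q, bonus, s=0, a=0, r=0.0, s_next=1, actions=[0, 1])
```

Because the bonus shrinks as a state-action pair is revisited, rarely tried actions look temporarily more attractive. The bonus only reflects the novelty of the immediate step, which is the limitation the abstract describes.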


Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision

arXiv.org Artificial Intelligence

Learning to drive faithfully in highly stochastic urban settings remains an open problem. To that end, we propose a Multi-task Learning from Demonstration (MT-LfD) framework which uses supervised auxiliary task prediction to guide the main task of predicting the driving commands. Our framework involves an end-to-end trainable network for imitating the expert demonstrator's driving commands. The network intermediately predicts visual affordances and action primitives through direct supervision, which provides the aforementioned auxiliary supervised guidance. We demonstrate that such joint learning and supervised guidance facilitates hierarchical task decomposition, helping the agent learn faster and achieve better driving performance, and increases the transparency of the otherwise black-box end-to-end network. We run our experiments to validate the MT-LfD framework in CARLA, an open-source urban driving simulator. We introduce multiple non-player agents in CARLA and induce temporal noise in them for realistic stochasticity.


Multi-Hop Knowledge Graph Reasoning with Reward Shaping

arXiv.org Artificial Intelligence

Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable to or better than embedding-based models.
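The two ideas in this abstract can be illustrated with a rough sketch. It assumes the embedding model is available as a scoring function that returns a value in [0, 1], and that edge masking means dropping each outgoing edge independently with a fixed probability. The function names, the `soft_score` callback, and the toy data are hypothetical and are not taken from the paper's code.

```python
import random


def shaped_reward(reached, known_answers, soft_score):
    """Full reward for an observed answer, otherwise a soft estimate.

    soft_score stands in for a pretrained one-hop embedding model
    (assumed to return a plausibility value in [0, 1]).
    """
    if reached in known_answers:
        return 1.0
    return soft_score(reached)


def mask_edges(edges, drop_rate, rng):
    """Randomly hide outgoing edges to push the agent onto varied paths."""
    if not edges:
        return []
    kept = [e for e in edges if rng.random() >= drop_rate]
    # Always leave at least one edge so the agent can still move.
    return kept or [rng.choice(edges)]


# Toy usage with made-up entities and scores.
toy_scores = {"lyon": 0.3}
score = lambda entity: toy_scores.get(entity, 0.0)
rng = random.Random(0)

hit = shaped_reward("paris", {"paris"}, score)
soft = shaped_reward("lyon", {"paris"}, score)
visible = mask_edges(["e1", "e2", "e3", "e4"], drop_rate=0.5, rng=rng)
```

With a plain 0/1 reward, `lyon` would score zero even if the fact is true but missing from the graph. The soft estimate gives such false negatives partial credit.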


Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information

arXiv.org Artificial Intelligence

We introduce a new virtual environment for simulating a card game known as "Big 2". This is a four-player game of imperfect information with a relatively complicated action space (players may play 1, 2, 3, 4 or 5 card combinations from an initial starting hand of 13 cards). As such, it poses a challenge for many current reinforcement learning methods. We then use the recently proposed "Proximal Policy Optimization" algorithm to train a deep neural network to play the game, learning purely via self-play, and find that it is able to reach a level that outperforms amateur human players after only a relatively short amount of training time and without needing to search a tree of future game states.
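As a reminder of what Proximal Policy Optimization optimizes, here is a minimal per-sample sketch of its clipped surrogate objective in plain Python. It is not the training code used for Big 2. The function name and the default `eps=0.2` are assumptions for the example, and a real implementation would average this quantity over a batch and add value and entropy terms.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# The new policy doubles the probability of a good action (A > 0):
# the objective is capped at (1 + eps) * A, so there is no gain from
# moving further away from the old policy.
capped = ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0)

# The new policy doubles the probability of a bad action (A < 0):
# the penalty is not clipped, so the update is still discouraged in full.
penalised = ppo_clip_objective(math.log(2.0), 0.0, advantage=-1.0)
```

Taking the minimum makes the objective a pessimistic bound: clipping removes the incentive for large policy changes that would improve the surrogate, but never hides a change that makes it worse.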


ExpIt-OOS: Towards Learning from Planning in Imperfect Information Games

arXiv.org Artificial Intelligence

The current state of the art in playing many important perfect information games, including Chess and Go, combines planning and deep reinforcement learning with self-play. We extend this approach to imperfect information games and present ExIt-OOS, a novel approach to playing imperfect information games within the Expert Iteration framework and inspired by AlphaZero. We use Online Outcome Sampling, an online search algorithm for imperfect information games, in place of MCTS. While training online, our neural strategy is used to improve the accuracy of playouts in OOS, allowing a learning and planning feedback loop for imperfect information games.


Tutorial: Double Deep Q-Learning with Dueling Network Architecture

#artificialintelligence

If you are as fascinated by Deep Q-Learning as I am but never had the time to understand or implement it, this is for you. In one Jupyter notebook I will 1) briefly explain how Reinforcement Learning differs from Supervised Learning, 2) discuss the theory behind Deep Q-Networks (DQN) by telling you where to find the respective explanations in the papers and what they mean, and 3) show how to implement the components needed to make it work in Python and TensorFlow. In 2013 a London-based startup called DeepMind published a groundbreaking paper called Playing Atari with Deep Reinforcement Learning on arXiv. The authors presented a variant of Reinforcement Learning called Deep Q-Learning that is able to successfully learn control policies for different Atari 2600 games, receiving only screen pixels as input and a reward when the game score changes. This is an astonishing result because previously "AIs" used to be limited to one single game, for instance chess, whereas in this case the types and contents of the games in the Arcade Learning Environment vary significantly and yet no adjustment of the architecture, learning algorithm or hyperparameters is needed. No wonder DeepMind was bought by Google for 500 million dollars. The company has since been one of the leading institutions advancing Deep Learning research, and a later article discussing DQN has been published in Nature.
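The two ideas named in the tutorial's title can be sketched without any deep learning library. The snippet below is a plain-Python illustration, not the notebook's TensorFlow code. It assumes the Q-values for the next state have already been computed by an online network and a target network, and the function names and numbers are made up for the example.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for a single transition.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


q_values = dueling_q(value=1.0, advantages=[1.0, 2.0, 3.0])

# The online net prefers action 1; the target net's estimate for action 1
# is used, even though the target net itself rates action 0 higher.
target = double_dqn_target(
    reward=1.0, done=False,
    q_online_next=[0.1, 0.9], q_target_next=[5.0, 2.0], gamma=0.5,
)
```

Subtracting the mean advantage keeps the split between the value and advantage streams identifiable. Decoupling action selection from action evaluation is what distinguishes the Double DQN target from the original DQN target, which takes the maximum over the target network's own estimates.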


Learning a Policy for Opportunistic Active Learning

arXiv.org Artificial Intelligence

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.


A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems

arXiv.org Machine Learning

Search-oriented conversational systems rely on information needs expressed in natural language (NL). We focus here on the understanding of NL expressions for building keyword-based queries. We propose a reinforcement-learning-driven translation model framework able to 1) learn the translation from NL expressions to queries in a supervised way, and 2) overcome the lack of large-scale datasets by framing the translation model as a word selection approach and injecting relevance feedback into the learning process. Experiments are carried out on two TREC datasets and outline the effectiveness of our approach.