Reinforcement Learning
Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation
Sweke, Ryan, Kesselring, Markus S., van Nieuwenburg, Evert P. L., Eisert, Jens
In order to implement large scale quantum computations it is necessary to be able to store and manipulate quantum information in a manner that is robust to the unavoidable errors introduced through interaction of the physical qubits with a noisy environment. The known strategy for achieving such robustness is to encode a single logical qubit into the state of many physical qubits, via a quantum error correcting code, from which it is possible to actively diagnose and correct errors that may occur [1, 2]. While many quantum error correcting codes exist, topological quantum codes [1-8], in which only local operations are required to diagnose and correct errors, are of particular interest as a result of their experimental feasibility [9-15]. In particular, the surface code has emerged as an especially promising candidate for large-scale fault-tolerant quantum computation, due to the combination of its comparatively low overhead and locality requirements, coupled with the availability of convenient strategies for the implementation of all required logical gates [16, 17]. In fact, current road maps towards the realization of robust quantum computing have identified surface code based approaches as the most feasible methodology for achieving this goal [18].
Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation
Kahn, Gregory, Villaflor, Adam, Abbeel, Pieter, Levine, Sergey
A general-purpose intelligent robot must be able to learn autonomously and be able to accomplish multiple tasks in order to be deployed in the real world. However, standard reinforcement learning approaches learn separate task-specific policies and assume the reward function for each task is known a priori. We propose a framework that learns event cues from off-policy data, and can flexibly combine these event cues at test time to accomplish different tasks. These event cue labels are not assumed to be known a priori, but are instead labeled using learned models, such as computer vision detectors, and then `backed up' in time using an action-conditioned predictive model. We show that a simulated robotic car and a real-world RC car can gather data and train fully autonomously without any human-provided labels beyond those needed to train the detectors, and then at test-time be able to accomplish a variety of different tasks. Videos of the experiments and code can be found at https://github.com/gkahn13/CAPs
Binary Matrix Guessing Problem
We introduce the Binary Matrix Guessing Problem and provide two algorithms to solve this problem. The first algorithm we introduce is Elementwise Probing Algorithm (EPA) which is very fast under a score which utilizes Frobenius Distance. The second algorithm is Additive Reinforcement Learning Algorithm which combines ideas from perceptron algorithm and reinforcement learning algorithm. This algorithm is significantly slower compared to first one, but less restrictive and generalizes better. We compare computational performance of both algorithms and provide numerical results. reason for withdrawal: Paper will be rewritten with experiments replicated on verified and validated hardware and software.
Deep Reinforcement Learning
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn. After that, we discuss RL applications, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
Successor Uncertainties: exploration and uncertainty in temporal difference learning
Janz, David, Hron, Jiri, Hernรกndez-Lobato, Josรฉ Miguel, Hofmann, Katja, Tschiatschek, Sebastian
We consider the problem of balancing exploration and exploitation in sequential decision making problems. To explore efficiently, it is vital to consider the uncertainty over all consequences of a decision, and not just those that follow immediately; the uncertainties involved need to be propagated according to the dynamics of the problem. To this end, we develop Successor Uncertainties, a probabilistic model for the state-action value function of a Markov Decision Process that propagates uncertainties in a coherent and scalable way. We relate our approach to other classical and contemporary methods for exploration and present an empirical analysis.
Using Deep Reinforcement Learning for the Continuous Control of Robotic Arms
Deep reinforcement learning enables algorithms to learn complex behavior, deal with continuous action spaces and find good strategies in environments with high dimensional state spaces. With deep reinforcement learning being an active area of research and many concurrent inventions, we decided to focus on a relatively simple robotic task to evaluate a set of ideas that might help to solve recent reinforcement learning problems. We test a newly created combination of two commonly used reinforcement learning methods, whether it is able to learn more effectively than a baseline. We also compare different ideas to preprocess information before it is fed to the reinforcement learning algorithm. The goal of this strategy is to reduce training time and eventually help the algorithm to converge. The concluding evaluation proves the general applicability of the described concepts by testing them using a simulated environment. These concepts might be reused for future experiments.
Deep Imitative Models for Flexible Inference, Planning, and Control
Rhinehart, Nicholas, McAllister, Rowan, Levine, Sergey
Imitation learning provides an appealing framework for autonomous control: in many tasks, demonstrations of preferred behavior can be readily obtained from human experts, removing the need for costly and potentially dangerous online data collection in the real world. However, policies learned with imitation learning have limited flexibility to accommodate varied goals at test time. Model-based reinforcement learning (MBRL) offers considerably more flexibility, since a predictive model learned from data can be used to achieve various goals at test time. However, MBRL suffers from two shortcomings. First, the predictive model does not help to choose desired or safe outcomes -- it reasons only about what is possible, not what is preferred. Second, MBRL typically requires additional online data collection to ensure that the model is accurate in those situations that are actually encountered when attempting to achieve test time goals. Collecting this data with a partially trained model can be dangerous and time-consuming. In this paper, we aim to combine the benefits of imitation learning and MBRL, and propose imitative models: probabilistic predictive models able to plan expert-like trajectories to achieve arbitrary goals. We find this method substantially outperforms both direct imitation and MBRL in a simulated autonomous driving task, and can be learned efficiently from a fixed set of expert demonstrations without additional online data collection. We also show our model can flexibly incorporate user-supplied costs as test-time, can plan to sequences of goals, and can even perform well with imprecise goals, including goals on the wrong side of the road.
Optimizing Agent Behavior over Long Time Scales by Transporting Value
Hung, Chia-Chun, Lillicrap, Timothy, Abramson, Josh, Wu, Yan, Mirza, Mehdi, Carnevale, Federico, Ahuja, Arun, Wayne, Greg
Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit assignment; in artificial intelligence (AI) this is the question of how to evaluate the utility of the actions within a long-duration behavioral sequence leading to success or failure in a task. Existing approaches to shorter-term credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a new paradigm for reinforcement learning where agents use recall of specific memories to credit actions from the past, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire computational models in neuroscience, psychology, and behavioral economics.
Visual Semantic Navigation using Scene Priors
Yang, Wei, Wang, Xiaolong, Farhadi, Ali, Gupta, Abhinav, Mottaghi, Roozbeh
Figure 1: Our goal is to use scene priors to improve navigation in unseen scenes and towards novel objects. The most likely locations are shown with the orange box. How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we search cabinets near the coffee machine and for fruits we try the fridge. In this work, we focus on incorporating semantic priors in the task of semantic navigation. We propose to use Graph Convolutional Networks for incorporating the prior knowledge into a deep reinforcement learning framework. The agent uses the features from the knowledge graph to predict the actions. For evaluation, we use the AI2-THOR framework. Our experiments show how semantic knowledge improves performance significantly. More importantly, we show improvement in generalization to unseen scenes and/or objects.
Assessing the Potential of Classical Q-learning in General Game Playing
Wang, Hui, Emmerich, Michael, Plaat, Aske
After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interests in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee $\&$ Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex)\footnote{source code: https://github.com/wh1992v/ggp-rl}, to allow comparison to Banerjee et al.. We find that Q-learning converges to a high win rate in GGP. For the $\epsilon$-greedy strategy, we propose a first enhancement, the dynamic $\epsilon$ algorithm. In addition, inspired by (Gelly $\&$ Silver, ICML 2007) we combine online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show, if augmented by appropriate enhancements, that classical table-based Q-learning can perform well in small games.