Reinforcement Learning
TF-Agents: A Flexible Reinforcement Learning Library for TensorFlow
Reinforcement learning has become a major focus for the large technology companies, and none of them wants to fall behind. OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms. These implementations make it easier for the research community to replicate, refine, and identify new ideas, and they provide good baselines to build research on top of. They abstract away the details of the RL algorithms, so using one of these techniques is as easy as calling a single class and passing it essentials such as the environment name and batch size. This makes experimentation much easier and implementation simpler for people new to the field.
Transferable Force-Torque Dynamics Model for Peg-in-hole Task
Ding, Junfeng, Wang, Chen, Lu, Cewu
We present a learning-based force-torque dynamics model to achieve model-based control for the contact-rich peg-in-hole task using force-only inputs. Learning the force-torque dynamics is challenging because of the ambiguity of the low-dimensional 6-d force signal and the requirement of excessive training data. To tackle these problems, we propose a multi-pose force-torque state representation, based on which a dynamics model is learned with data generated in a sample-efficient offline fashion. In addition, by training the dynamics model with peg-and-holes of various shapes, scales, and elasticities, the model can quickly transfer to new peg-and-holes after a small number of trials. Extensive experiments show that our dynamics model can adapt to unseen peg-and-holes with 70% fewer samples than learning from scratch. Along with the learned dynamics, model predictive control and model-based reinforcement learning policies achieve over 80% insertion success rate. Our video is available at https://youtu.be/ZAqldpVZgm4.
Reinforcement learning applications provide focused models
A common measure of machine intelligence is challenging AI to play complex games against humans. The first AI programs tackled checkers and progressed to beat human players at chess, Go and a wide range of multiplayer games. The thinking behind reinforcement learning (RL) is that if a computer can outwit humans by thinking, planning ahead and predicting human behavior, then the machines have the capacity to learn anything. Now, researchers are still studying how computers learn through iteration and trial and error. One of the simplest goal-driven problems that computers were first tasked with was trying to find the right path through a maze.
Deep Reinforcement Learning with TensorFlow 2.0
In this tutorial, I will showcase the upcoming TensorFlow 2.0 features through the lens of deep reinforcement learning (DRL) by implementing an advantage actor-critic (A2C) agent to solve the classic CartPole-v0 environment. While the goal is to showcase TensorFlow 2.0, I will do my best to make the DRL aspect approachable as well, including a brief overview of the field. In fact, since the main focus of the 2.0 release is making developers' lives easier, it's a great time to get into DRL with TensorFlow -- our full agent source is under 150 lines! The code is available as a notebook here and online on Google Colab here. As TensorFlow 2.0 is still in an experimental stage, I recommend installing it in a separate (virtual) environment.
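As a rough illustration of the quantity an advantage actor-critic agent trains on, the sketch below computes discounted returns and advantages for one batch of transitions. It is plain Python rather than the tutorial's TensorFlow 2.0 code, and the function names, the `bootstrap_value` argument, and the default `gamma` are assumptions made for this example.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Discounted return for every step of a rollout.

    bootstrap_value is the critic's estimate for the state that follows
    the last transition; it is ignored once an episode has ended.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        # A finished episode contributes no future reward.
        not_done = 0.0 if dones[t] else 1.0
        running = rewards[t] + gamma * running * not_done
        returns[t] = running
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage = observed return minus the critic's value estimate."""
    return [ret - val for ret, val in zip(returns, values)]
```

In an A2C agent the advantages typically weight the policy-gradient loss for the actor, while the returns serve as regression targets for the critic.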
Stochastic learning control of inhomogeneous quantum ensembles
Turinici, Gabriel
IUF - Institut Universitaire de France; CEREMADE, Université Paris Dauphine - PSL Research University. Oct 2019.
Abstract: In quantum control, the robustness with respect to uncertainties in the system's parameters or driving field characteristics is of paramount importance and has been studied theoretically, numerically and experimentally. We test in this paper stochastic search procedures (Stochastic gradient descent and the Adam algorithm) that sample, at each iteration, from the distribution of the parameter uncertainty, as opposed to previous approaches that use a fixed grid. We show that both algorithms behave well with respect to benchmarks and discuss their relative merits. In addition the methodology allows to address high dimensional parameter uncertainty; we implement numerically, with good results, a 3D and a 6D case.
1 Introduction: Quantum control is a promising technology with many applications ranging from NMR [12] to quantum computing [15] and laser control of quantum dynamics [7]. The controlling field encounters many molecules which although identical in nature may interact differently with the incoming field because of e.g., different Larmor frequencies or rf attenuation factors (in NMR spin control or quantum computing, see [19, 29, 35, 22, 13, 17]), different spatial profile (see [24]) or other parameters (see [36, 8, 10]). For obvious practical reasons, it is of paramount importance to ensure that the control quality is
arXiv:1906.02991v3
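The idea of sampling the uncertain parameter afresh at each iteration, instead of averaging over a fixed grid, can be sketched on a toy problem. The example below is not the paper's quantum system: the scalar control `u`, the quadratic loss `(u * theta - 1)^2`, the Uniform(0.8, 1.2) distribution, and all function names are assumptions chosen only so that the optimum is known in closed form.

```python
import random


def sample_theta(rng: random.Random) -> float:
    """Draw one value of the uncertain parameter (assumed Uniform(0.8, 1.2))."""
    return rng.uniform(0.8, 1.2)


def stochastic_search(steps: int = 5000, lr: float = 0.02, seed: int = 0) -> float:
    """Stochastic gradient descent on E_theta[(u * theta - 1)^2].

    Each iteration uses a single fresh sample of theta rather than an
    average over a predefined grid of theta values.
    """
    rng = random.Random(seed)
    u = 0.0
    for _ in range(steps):
        theta = sample_theta(rng)
        grad = 2.0 * theta * (u * theta - 1.0)  # d/du of (u*theta - 1)^2
        u -= lr * grad
    return u


def expected_optimum() -> float:
    """Closed-form minimiser E[theta] / E[theta^2] for Uniform(0.8, 1.2)."""
    low, high = 0.8, 1.2
    mean = (low + high) / 2.0
    second_moment = (high**3 - low**3) / (3.0 * (high - low))
    return mean / second_moment
```

With a constant step size the iterate keeps fluctuating around the optimum; a decaying step size or an adaptive method such as Adam would reduce that noise.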
Induction of Subgoal Automata for Reinforcement Learning
Furelos-Blanco, Daniel, Law, Mark, Russo, Alessandra, Broda, Krysia, Jonsson, Anders
Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA (Induction of Subgoal Automata) in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.
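To make the notion of an automaton with subgoal transitions concrete, here is a minimal sketch of how such an automaton could be represented and run over an observation trace. The state names, the event names (`coffee`, `decoration`, `office`), and the `run` helper are hypothetical, and the inductive logic programming step that learns the automaton is not shown.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Events = FrozenSet[str]
Condition = Callable[[Events], bool]
Transitions = Dict[str, List[Tuple[Condition, str]]]

# Each transition is guarded by a propositional formula over observable
# events, written here as a predicate on the set of events seen at a step.
TRANSITIONS: Transitions = {
    "u0": [(lambda e: "coffee" in e and "decoration" not in e, "u1")],
    "u1": [(lambda e: "office" in e, "u_acc")],
}


def run(transitions: Transitions, trace: Sequence[Events], start: str = "u0") -> str:
    """Return the automaton state reached after processing the trace."""
    state = start
    for events in trace:
        for condition, next_state in transitions.get(state, []):
            if condition(events):
                state = next_state
                break  # take the first transition whose formula holds
    return state
```

Under these assumptions, a trace that reaches the task goal without ending in `u_acc` would count as not recognized and would trigger another round of automaton learning.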
Quadratic Q-network for Learning Continuous Control for Autonomous Vehicles
Wang, Pin, Li, Hanhan, Chan, Ching-Yao
Reinforcement Learning algorithms have recently been proposed to learn time-sequential control policies in the field of autonomous driving. Direct applications of Reinforcement Learning algorithms with a discrete action space yield unsatisfactory results at the operational level of driving, where continuous control actions are actually required. In addition, the design of neural networks often fails to incorporate the domain knowledge of the target problem, such as the classical control theories in our case. In this paper, we propose a hybrid model that combines Q-learning and a classic PID (Proportional-Integral-Derivative) controller for handling continuous vehicle control problems in dynamic driving environments. In particular, instead of using a big neural network as the Q-function approximation, we design a Quadratic Q-function over actions with multiple simple neural networks for finding optimal values within a continuous space. We also build an action network based on the domain knowledge of the control mechanism of a PID controller to guide the agent to explore optimal actions more efficiently. We test our proposed approach in simulation under two common but challenging driving situations, the lane change scenario and the ramp merge scenario. Results show that the autonomous vehicle agent can successfully learn a smooth and efficient driving behavior in both situations.
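Why a Q-function that is quadratic in the action suits continuous control can be seen in a small sketch: the maximising action is available in closed form. The parameterisation below, Q(s, a) = V(s) - p(s) * (a - mu(s))^2 with a scalar action, is one common quadratic form and is an assumption here, not necessarily the exact form used in the paper. The three fields stand in for the outputs that small networks would produce for a given state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadraticQ:
    """Q(s, a) = peak_value - curvature * (a - peak_action)^2 for one state."""

    peak_action: float  # mu(s): action at which Q is largest
    peak_value: float   # V(s): value of Q at that action
    curvature: float    # p(s) > 0: how quickly Q falls off away from mu(s)

    def value(self, action: float) -> float:
        return self.peak_value - self.curvature * (action - self.peak_action) ** 2

    def best_action(self, low: float, high: float) -> float:
        """Maximiser of Q over [low, high].

        Q is concave in the action, so the constrained maximiser is the
        unconstrained peak clipped to the allowed range.
        """
        return min(max(self.peak_action, low), high)
```

For example, `QuadraticQ(peak_action=0.3, peak_value=2.0, curvature=4.0).best_action(-1.0, 1.0)` returns the peak action itself, with no search over a discretised action set.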
Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation
In this paper, we describe the physics-based environment of the NeurIPS 2019 Learning to Move - Walk Around challenge and present our solution to this competition, which scored 1303.727 mean reward points and took 3rd place. Our method combines recent advances from both continuous- and discrete-action space reinforcement learning, such as Soft Actor-Critic and Recurrent Experience Replay in Distributed Reinforcement Learning. We trained our agent in two stages: to move somewhere in the first stage and to follow the target velocity field in the second stage. We also introduce a novel Q-function split technique, which we believe facilitates the task of training an agent, allows pretraining the critic and reusing it for solving harder problems, and mitigates reward shaping design efforts.