Reinforcement Learning
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
Guo, Tszhang, Chang, Shiyu, Yu, Mo, Bai, Kun
Recently, Reinforcement Learning (RL) approaches have demonstrated advanced performance in image captioning by directly optimizing the metric used for testing. However, this shaped reward introduces learning biases, which reduce the readability of the generated text. In addition, the large sample space makes training unstable and slow. To alleviate these issues, we propose a simple, coherent solution that constrains the action space using an n-gram language prior. Quantitative and qualitative evaluations on benchmarks show that RL with this simple add-on module performs favorably against its counterpart in terms of both readability and speed of convergence. Human evaluation results show that the captions generated by our model are more human-readable and graceful. The implementation will become publicly available upon acceptance of the paper.
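The abstract does not say exactly how the prior is applied. A minimal sketch, assuming a word-level n-gram table built from training captions is used to mask the policy's next-word distribution, so that during RL sampling the captioner only chooses among words the prior has seen after the current context. The function names, the trigram order and the fallback to the full vocabulary for unseen contexts are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def build_ngram_prior(captions, n=3):
    """Map each (n-1)-token context to the set of words observed after it."""
    prior = defaultdict(set)
    for caption in captions:
        tokens = ["<s>"] * (n - 1) + caption.split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            prior[context].add(tokens[i])
    return prior


def allowed_actions(prior, prefix, vocab, n=3):
    """Restrict the next-word action space to words allowed by the n-gram prior."""
    padded = ["<s>"] * (n - 1) + list(prefix)
    context = tuple(padded[len(padded) - (n - 1):])
    allowed = prior.get(context)  # .get() does not insert into the defaultdict
    # Assumption: fall back to the full vocabulary for contexts never seen.
    return sorted(allowed) if allowed else sorted(vocab)


def sample_masked(logits, vocab, allowed, rng=random):
    """Sample a word from a softmax restricted to the allowed words."""
    idx = [vocab.index(w) for w in allowed]
    m = max(logits[i] for i in idx)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in idx]
    return rng.choices([vocab[i] for i in idx], weights=weights, k=1)[0]


captions = ["a dog runs on grass", "a dog sits on a couch", "a cat sits on a mat"]
vocab = sorted({w for c in captions for w in c.split()} | {"</s>"})
prior = build_ngram_prior(captions)
print(allowed_actions(prior, ["a", "dog"], vocab))  # only "runs" or "sits" may follow
```

Masking shrinks the effective action space at every decoding step, which is one plausible reading of why the abstract reports faster, more stable training.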
Unity tweaks AI training tools, makes bid for academic respect
Unity Technologies on Monday released version 0.5 of its ML-Agents toolkit to make its Unity 3D game development platform better suited for developing and training autonomous agent code via machine learning. The toolkit, initially rolled out in beta a year ago, gains a few improvements in version 0.5. There's a wrapper for Gym (a toolkit for developing and testing reinforcement learning algorithms), support for letting agents make multiple action selections at once and for preventing agents from taking certain actions, and a refurbished set of environments called Marathon Environments. In these virtual spaces, AI researchers can teach software agents to perform certain tasks by rewarding them for correct actions. This sort of reinforcement learning can be limited to digital environments like video games or mapped to software-driven machines in the real world. Through its latest code update, Unity is making the case for Unity 3D as a key tool for AI research, a goal that company code boffins describe in a preprint paper titled "Unity: A General Platform for Intelligent Agents."
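Two of the listed features, choosing several actions at once and forbidding certain actions, are commonly implemented as branched discrete actions with a per-branch mask. The sketch below is a generic illustration of that idea, not the ML-Agents API; the function names and the example branches are hypothetical.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over one action branch; masked-out actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_actions(branch_logits, branch_masks, rng=random):
    """Pick one discrete action per branch, so the agent acts on several
    branches (e.g. 'move' and 'jump') in a single step."""
    actions = []
    for logits, mask in zip(branch_logits, branch_masks):
        probs = masked_softmax(logits, mask)
        actions.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return actions


# Hypothetical branches: movement (left/stay/right) and jump (no/yes).
# Movement is masked so only "stay" is allowed this step.
print(select_actions([[0.2, 0.1, 0.9], [1.0, 0.0]],
                     [[False, True, False], [True, True]]))
```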
Combined Reinforcement Learning via Abstract Representations
François-Lavet, Vincent, Bengio, Yoshua, Precup, Doina, Pineau, Joelle
In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.
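The core structure described here, one shared encoder feeding both a model-free value head and a model-based latent transition/reward model, can be sketched at inference time as follows. This is a minimal illustration under stated assumptions: the weights are random and untrained, and the residual transition and the depth-limited lookahead bootstrapped with Q-values are illustrative choices, not the paper's exact training or planning procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS = 8, 3, 2

# Hypothetical, untrained parameters for each module.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))                         # shared encoder
W_trans = 0.1 * rng.normal(size=(N_ACTIONS, LATENT_DIM, LATENT_DIM))   # per-action latent model
w_rew = rng.normal(size=(N_ACTIONS, LATENT_DIM))                       # per-action reward head
W_q = rng.normal(size=(N_ACTIONS, LATENT_DIM))                         # model-free Q head


def encode(obs):
    """Map a high-dimensional observation to a low-dimensional abstract state."""
    return np.tanh(W_enc @ obs)


def transition(z, a):
    """Predict the next abstract state as a residual update of the current one."""
    return z + np.tanh(W_trans[a] @ z)


def reward(z, a):
    return float(w_rew[a] @ z)


def q_values(z):
    return W_q @ z


def plan_value(z, depth, gamma=0.9):
    """Depth-limited lookahead in latent space, bootstrapped with Q at the leaves."""
    if depth == 0:
        return float(np.max(q_values(z)))
    return max(reward(z, a) + gamma * plan_value(transition(z, a), depth - 1, gamma)
               for a in range(N_ACTIONS))


def act(obs, depth=2, gamma=0.9):
    """Choose an action by planning in the small latent space."""
    z = encode(obs)
    if depth == 0:
        return int(np.argmax(q_values(z)))
    scores = [reward(z, a) + gamma * plan_value(transition(z, a), depth - 1, gamma)
              for a in range(N_ACTIONS)]
    return int(np.argmax(scores))


print(act(np.ones(OBS_DIM)))
```

Because the search branches over a 3-dimensional latent rather than the raw observation, each planning step stays cheap, which is the efficiency argument the abstract makes.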
Sim-to-Real Transfer Learning using Robustified Controllers in Robotic Tasks involving Complex Dynamics
van Baar, Jeroen, Sullivan, Alan, Cordorel, Radu, Jha, Devesh, Romeres, Diego, Nikovski, Daniel
Learning robot tasks or controllers using deep reinforcement learning has proven effective in simulation. Learning in simulation has several advantages. For example, one can fully control the simulated environment, including halting motions while performing computations. Another advantage when robots are involved is that the amount of time a robot is occupied learning a task, rather than being productive, can be reduced by transferring the learned task to the real robot. Transfer learning requires some amount of fine-tuning on the real robot. For tasks that involve complex (non-linear) dynamics, the fine-tuning itself may take a substantial amount of time. To reduce the amount of fine-tuning, we propose to learn robustified controllers in simulation. Robustified controllers are learned by exploiting the ability to change simulation parameters (both appearance and dynamics) for successive training episodes. An additional benefit of this approach is that it removes the need to determine the simulator's physics parameters precisely, which is a non-trivial task. We demonstrate our proposed approach on a real setup in which a robot aims to solve a maze puzzle, which involves complex dynamics due to static friction and potentially large accelerations. We show that the amount of fine-tuning in transfer learning for a robustified controller is substantially reduced compared to a non-robustified controller.
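Changing simulation parameters between episodes amounts to drawing a fresh set of dynamics and appearance parameters before each rollout. A minimal sketch, assuming uniform sampling; the parameter names, ranges and the `run_episode` callback are hypothetical placeholders, not values from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    static_friction: float
    ball_mass: float
    light_intensity: float
    texture_id: int


# Hypothetical ranges; real ones would be chosen around the physical setup.
RANGES = {
    "static_friction": (0.05, 0.3),
    "ball_mass": (0.005, 0.02),      # kg
    "light_intensity": (0.5, 1.5),
}
N_TEXTURES = 10


def sample_params(rng):
    """Draw fresh dynamics and appearance parameters for one episode."""
    return SimParams(
        static_friction=rng.uniform(*RANGES["static_friction"]),
        ball_mass=rng.uniform(*RANGES["ball_mass"]),
        light_intensity=rng.uniform(*RANGES["light_intensity"]),
        texture_id=rng.randrange(N_TEXTURES),
    )


def train(num_episodes, run_episode, seed=0):
    """Randomize the simulator before every training episode and collect returns."""
    rng = random.Random(seed)
    returns = []
    for _ in range(num_episodes):
        params = sample_params(rng)
        returns.append(run_episode(params))  # run_episode: hypothetical rollout function
    return returns
```

A controller that performs well across the whole range no longer depends on any single, precisely identified friction value, which is why less fine-tuning is expected on the real robot.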
Reinforcement Learning in Topology-based Representation for Human Body Movement with Whole Arm Manipulation
Yuan, Weihao, Hang, Kaiyu, Song, Haoran, Kragic, Danica, Wang, Michael Y., Stork, Johannes A.
Moving a human body or a large and bulky object can require the strength of whole arm manipulation (WAM). This type of manipulation places the load on the robot's arms and relies on global properties of the interaction to succeed, rather than on local contacts such as grasping or non-prehensile pushing. In this paper, we learn to generate motions that enable WAM for holding and transporting humans in certain rescue or patient care scenarios. We model the task as a reinforcement learning problem in order to provide a behavior that can directly respond to external perturbation and human motion. For this, we represent global properties of the robot-human interaction with topology-based coordinates that are computed from arm and torso positions. These coordinates also allow transferring the learned policy to other body shapes and sizes. For training and evaluation, we simulate a dynamic sea rescue scenario and show in quantitative experiments that the policy can solve unseen scenarios with differently shaped humans, floating humans, or perception noise. Our qualitative experiments show that transporting after holding is achieved, and we demonstrate that the policy can be directly transferred to a real-world setting.
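The abstract does not define its topology-based coordinates. One standard topology-based quantity for how two chains wrap around each other is the Gauss linking integral (writhe) between two curves; the sketch below computes its discrete midpoint approximation for polylines such as an arm and a torso. Treat it as an illustration of the kind of global, shape-independent feature involved, not as the paper's exact coordinates.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gauss_linking(curve_a, curve_b, closed=True):
    """Midpoint approximation of the Gauss linking integral
    (1/4pi) sum_i sum_j (r_i - r_j) . (dr_i x dr_j) / |r_i - r_j|^3.

    For closed, disjoint curves this approaches the integer linking number;
    for open chains (e.g. arm and torso segments) it is a real-valued writhe.
    """
    def segments(curve):
        pts = list(curve) + ([curve[0]] if closed else [])
        return [(tuple((p[k] + q[k]) / 2 for k in range(3)), _sub(q, p))
                for p, q in zip(pts, pts[1:])]

    total = 0.0
    for mid_i, d_i in segments(curve_a):
        for mid_j, d_j in segments(curve_b):
            r = _sub(mid_i, mid_j)
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(d_i, d_j)) / dist ** 3
    return total / (4 * math.pi)


# Two interlocked unit circles (a Hopf link) should give a value close to +/-1.
N = 200
ring_a = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N), 0.0) for k in range(N)]
ring_b = [(1 + math.cos(2 * math.pi * k / N), 0.0, math.sin(2 * math.pi * k / N)) for k in range(N)]
print(round(abs(gauss_linking(ring_a, ring_b)), 2))
```

Because the value depends on how the curves wind around each other rather than on their exact lengths, features of this kind can stay meaningful across different body shapes and sizes.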
Multi-task Deep Reinforcement Learning with PopArt
Hessel, Matteo, Soyer, Hubert, Espeholt, Lasse, Czarnecki, Wojciech, Schmitt, Simon, van Hasselt, Hado
The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
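The mechanism that equalizes each task's impact is PopArt normalization: value targets are normalized with running statistics ("ART"), and the last linear layer is rescaled whenever those statistics change so that unnormalized predictions are preserved ("POP"). A minimal scalar sketch of one value head follows; in the multi-task setting one would keep a separate set of statistics per task. The class layout, the default `beta` and the variance floor are illustrative assumptions.

```python
import math


class PopArt:
    """Scalar PopArt value head: normalize targets while preserving outputs."""

    def __init__(self, n_features, beta=3e-4):
        self.w = [0.0] * n_features  # last-layer weights (normalized space)
        self.b = 0.0
        self.mu, self.nu = 0.0, 1.0  # running first and second moments of targets
        self.beta = beta

    @property
    def sigma(self):
        # Floor the variance to avoid dividing by zero (assumed value).
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def value(self, h):
        """Unnormalized value prediction: sigma * (w.h + b) + mu."""
        return self.sigma * self.normalized(h) + self.mu

    def update_stats(self, target):
        """ART: update the statistics. POP: rescale w and b so value() is unchanged."""
        mu_old, sigma_old = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        sigma_new = self.sigma
        self.w = [wi * sigma_old / sigma_new for wi in self.w]
        self.b = (sigma_old * self.b + mu_old - self.mu) / sigma_new

    def normalize_target(self, target):
        """Target used for the gradient update, on a task-independent scale."""
        return (target - self.mu) / self.sigma
```

Since every task's targets are mapped to roughly unit scale before computing gradients, a game with large or dense rewards no longer dominates the shared network's updates.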
Re-purposing Compact Neuronal Circuit Policies to Govern Reinforcement Learning Tasks
Hasani, Ramin M., Lechner, Mathias, Amini, Alexander, Rus, Daniela, Grosu, Radu
We propose an effective method for creating interpretable control agents by re-purposing the function of a biological neural circuit model to govern simulated and real-world reinforcement learning (RL) test-beds. Inspired by the structure of the nervous system of the soil-worm C. elegans, we introduce Neuronal Circuit Policies (NCPs) as a novel recurrent neural network instance with liquid time-constants, universal approximation capabilities and interpretable dynamics. We theoretically show that they can approximate any finite simulation time of a given continuous n-dimensional dynamical system, with n output units and some hidden units. We model instances of the policies and learn their synaptic and neuronal parameters to control standard RL tasks, and demonstrate their application for autonomous parking of a real rover robot on a predefined trajectory. For reconfiguration of the purpose of the neural circuit, we adopt a search-based RL algorithm. We show that our neuronal circuit policies perform as well as deep neural network policies, with the advantage of realizing interpretable dynamics at the cell level. We theoretically find bounds for the time-varying dynamics of the circuits and introduce a novel way to reason about networks' dynamics.
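The "liquid time-constant" property comes from neurons whose synaptic conductances depend on presynaptic activity, so the effective time constant of each neuron changes with its input. The sketch below integrates a generic conductance-based circuit with sigmoidal synapses by explicit Euler steps; the equations follow a common form of such models, but the parameter names, values and solver are assumptions and may differ from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ncp_step(v, inputs, synapses, params, dt=0.01):
    """One explicit-Euler step for a small conductance-based circuit.

    v        : membrane potentials, one per neuron
    inputs   : external input currents, one per neuron
    synapses : list of (pre, post, w, e_rev, mu, gamma)
    params   : per-neuron dicts with keys C_m, g_leak, E_leak
    """
    dv = [p["g_leak"] * (p["E_leak"] - v[i]) + inputs[i] for i, p in enumerate(params)]
    for pre, post, w, e_rev, mu, gamma in synapses:
        # Conductance depends on the presynaptic potential, so the postsynaptic
        # neuron's effective time constant varies with its input ("liquid").
        g = w * sigmoid(gamma * (v[pre] - mu))
        dv[post] += g * (e_rev - v[post])
    return [v[i] + dt * dv[i] / params[i]["C_m"] for i in range(len(v))]


# Hypothetical two-neuron circuit: neuron 0 excites neuron 1.
params = [{"C_m": 1.0, "g_leak": 1.0, "E_leak": -0.5} for _ in range(2)]
synapses = [(0, 1, 2.0, 1.0, 0.0, 5.0)]
v = [-0.5, -0.5]
for _ in range(2000):
    v = ncp_step(v, [1.0, 0.0], synapses, params)
print([round(x, 3) for x in v])
```

Every parameter (leak conductance, synaptic weight, reversal potential) has a direct physical meaning, which is what makes cell-level interpretation of the learned dynamics possible.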
Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning
Caselles-Dupré, Hugo, Annabi, Louis, Hagen, Oksana, Garcia-Ortiz, Michael, Filliat, David
Flatland is a simple, lightweight environment for fast prototyping and testing of reinforcement learning agents. It is less complex than similar 3D platforms (e.g. DeepMind Lab or VizDoom), but emulates physical properties of the real world, such as continuity, multi-modal, partially observable states with a first-person view, and coherent physics. We propose to use it as an intermediary benchmark for problems related to Lifelong Learning. Flatland is highly customizable and offers a wide range of task difficulty to extensively evaluate the properties of artificial agents. We experiment with three reinforcement learning baseline agents and show that they can rapidly solve a navigation task in Flatland. A video of an agent acting in Flatland is available here: https://youtu.be/I5y6Y2ZypdA.
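A first-person view in a 2-D world is typically a 1-D image obtained by casting rays from the agent across its field of view. The abstract does not describe Flatland's observation model, so the sketch below is a hypothetical illustration of the idea, with circular objects and a depth-only observation; the function name and defaults are assumptions.

```python
import math


def raycast(agent_xy, heading, circles, n_rays=5, fov=math.pi / 2, max_dist=20.0):
    """1-D first-person depth image for an agent in a 2-D world of circles.

    circles: list of (cx, cy, radius). Rays that hit nothing report max_dist.
    """
    ax, ay = agent_xy
    step = fov / (n_rays - 1) if n_rays > 1 else 0.0
    start = heading - fov / 2 if n_rays > 1 else heading
    depths = []
    for k in range(n_rays):
        angle = start + k * step  # rays spread evenly across the field of view
        dx, dy = math.cos(angle), math.sin(angle)
        nearest = max_dist
        for cx, cy, r in circles:
            # Solve |a + t*d - c|^2 = r^2 for t >= 0, with d a unit vector.
            fx, fy = ax - cx, ay - cy
            b = fx * dx + fy * dy
            c = fx * fx + fy * fy - r * r
            disc = b * b - c
            if disc < 0:
                continue  # ray misses this circle
            t = -b - math.sqrt(disc)
            if t < 0:
                t = -b + math.sqrt(disc)  # agent is inside the circle
            if 0 <= t < nearest:
                nearest = t
        depths.append(nearest)
    return depths


print(raycast((0.0, 0.0), 0.0, [(5.0, 0.0, 1.0)]))  # only the central ray hits
```

Because each step only needs a few ray-circle intersections rather than 3-D rendering, such an observation is cheap to compute, in line with the environment's lightweight goal.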
Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning
Wang, Weixun, Jin, Junqi, Hao, Jianye, Chen, Chunjie, Yu, Chuan, Zhang, Weinan, Wang, Jun, Wang, Yixi, Li, Han, Xu, Jian, Gai, Kun
For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can dynamically change over time based on their relative scores with recommendation products. In order to maintain user retention and long-term revenue, two types of constraints need to be met in exposure: query-level and day-level constraints. We model this problem as a constrained Markov decision process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning framework to decouple the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate the policy training process. Experimental results on real-world datasets show that our method can improve advertising revenue while satisfying different levels of constraints. In addition, the proposed constrained hindsight experience replay mechanism significantly improves the training speed and the stability of policy performance.
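The abstract does not detail the constrained hindsight experience replay mechanism. The sketch below shows a generic HER-style relabeling in which the constraint level plays the role of the goal: an episode that violated its target exposure ratio is stored a second time as if it had been run under the ratio it actually achieved, turning it into a constraint-satisfying sample. The `Transition` fields, the penalty-based reward and the episode-level achieved ratio are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: int
    revenue: float
    constraint: float  # target exposure ratio the policy was conditioned on


def reward(revenue, achieved, constraint, penalty=10.0):
    """Revenue minus a penalty for exceeding the exposure constraint (hypothetical shaping)."""
    return revenue - penalty * max(0.0, achieved - constraint)


def relabel_episode(episode, achieved_ratio):
    """Return the original samples plus hindsight copies whose constraint is
    replaced by the exposure ratio the episode actually achieved."""
    original = [(t, reward(t.revenue, achieved_ratio, t.constraint)) for t in episode]
    hindsight = [(replace(t, constraint=achieved_ratio),
                  reward(t.revenue, achieved_ratio, achieved_ratio))
                 for t in episode]
    return original + hindsight


episode = [Transition((0,), 1, 2.0, 0.1), Transition((1,), 0, 1.0, 0.1)]
for transition, r in relabel_episode(episode, achieved_ratio=0.3):
    print(transition.constraint, round(r, 3))
```

Relabeled copies give a constraint-conditioned policy useful learning signal even from episodes that missed their original target, which is the usual explanation for HER's faster, more stable training.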
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
Wolf, Peter, Kurzer, Karl, Wingert, Tobias, Kuhnt, Florian, Zöllner, J. Marius
Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment. Therefore, it is hard to model solely on the basis of expert knowledge. In this work we use Deep Reinforcement Learning to learn maneuver decisions based on a compact semantic state representation. This ensures a consistent model of the environment across scenarios as well as a behavior adaptation function, enabling online changes of desired behaviors without retraining. The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors, superimposed by a relational semantic scene description. The state and the reward are extended by a behavior adaptation function and a parameterization, respectively. With little expert knowledge and a set of mid-level actions, the agent is shown to adhere to traffic rules and to learn to drive safely in a variety of situations. While sensors are improving at a staggering pace and actuators as well as control theory are well up to par with the challenging task of autonomous driving, it is yet to be seen how a robot can devise decisions that navigate it safely in a heterogeneous environment that is partially made up of humans who do not always make rational decisions or have known cost functions.
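Extending both the state and the reward with the same behavior parameters is what allows behavior to change online without retraining: the policy is trained across many parameter values and then simply conditioned on new ones. A minimal sketch, assuming a single hypothetical "desired speed" parameter and a simple speed-tracking reward; the paper's actual parameters and reward terms are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class BehaviorParams:
    desired_speed: float       # m/s, hypothetical behavior parameter
    speed_weight: float = 1.0  # hypothetical reward weight


def build_state(semantic_features, behavior):
    """Append the behavior parameter to the compact semantic state so that a
    single policy can be conditioned on it."""
    return list(semantic_features) + [behavior.desired_speed]


def reward(ego_speed, collided, behavior):
    """Reward parameterized by the same behavior parameters as the state."""
    if collided:
        return -10.0
    deviation = abs(ego_speed - behavior.desired_speed) / max(behavior.desired_speed, 1e-6)
    return -behavior.speed_weight * deviation


cautious, sporty = BehaviorParams(15.0), BehaviorParams(30.0)
# The same observed speed is rewarded differently under each behavior setting.
print(reward(20.0, False, cautious), reward(20.0, False, sporty))
```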