Reinforcement Learning
Towards Deployment of Deep-Reinforcement-Learning-Based Obstacle Avoidance into Conventional Autonomous Navigation Systems
Kästner, Linh, Buiyan, Teham, Zhao, Xinlin, Jiao, Lei, Shen, Zhengcheng, Lambrecht, Jens
Abstract--Recently, mobile robots have become important tools in various industries, especially in logistics. Deep reinforcement learning emerged as an alternative planning method to replace overly conservative approaches and promises more efficient and flexible navigation. However, deep reinforcement learning approaches are not suitable for long-range navigation due to their proneness to local minima and lack of long-term memory, which hinders their widespread integration into industrial applications of mobile robotics. Therefore, a framework for training and testing deep reinforcement learning algorithms along with classic approaches is presented. Navigating dynamic environments with multiple static and dynamic obstacles such as humans, forklifts, or robots is essential in the operation of mobile robots. However, a main bottleneck of deep reinforcement learning is its limitation to local navigation, because it is myopic and lacks long-term memory. Efforts to integrate recurrent networks to mitigate this issue result in tedious training and limited payoff.
A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control
Gharaee, Zahra, Holmquist, Karl, He, Linbo, Felsberg, Michael
In this paper, we present a state-of-the-art reinforcement learning method for autonomous driving. Our approach employs temporal difference learning in a Bayesian framework to learn vehicle control signals from sensor data. The agent has access to images from a forward-facing camera, which are preprocessed to generate semantic segmentation maps. We trained our system using both ground truth and estimated semantic segmentation input. Based on our observations from a large set of experiments, we conclude that training the system on ground truth input data leads to better performance than training the system on estimated input, even if estimated input is used for evaluation. The system is trained and evaluated in a realistic simulated urban environment using the CARLA simulator. The simulator also contains a benchmark that allows for comparison with other systems and methods. The required training time of the system is shown to be lower, and the performance on the benchmark superior, compared to competing approaches.
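For readers unfamiliar with temporal difference learning, the following is a minimal tabular TD(0) sketch. It is not the Bayesian method of the paper: the state names, the reward, and the parameters `alpha` and `gamma` are assumptions chosen only to illustrate how a value estimate is moved toward a bootstrapped target.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    # A terminal transition has no successor value to bootstrap from.
    bootstrap = 0.0 if terminal else values.get(next_state, 0.0)
    td_error = reward + gamma * bootstrap - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


# Hypothetical two-step episode: "start" -> "road" -> terminal, reward 1 at the end.
values = {}
for _ in range(200):
    td0_update(values, "start", 0.0, "road")
    td0_update(values, "road", 1.0, None, terminal=True)
```

After repeated episodes the estimate for "road" approaches the final reward and the estimate for "start" approaches that reward discounted by `gamma`.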
A Reinforcement Learning Environment For Job-Shop Scheduling
Tassel, Pierre, Gebser, Martin, Schekotihin, Konstantin
Scheduling is a fundamental task occurring in various automated systems applications, e.g., optimal schedules for machines in a job shop allow for a reduction of production costs and waste. Nevertheless, finding such schedules is often intractable and cannot be achieved by Combinatorial Optimization Problem (COP) methods within a given time limit. Recent advances of Deep Reinforcement Learning (DRL) in learning complex behavior enable new COP application possibilities. This paper presents an efficient DRL environment for Job-Shop Scheduling, an important problem in the field. Furthermore, we design a meaningful and compact state representation as well as a novel, simple dense reward function, closely related to the sparse make-span minimization criterion used by COP methods. We demonstrate that our approach significantly outperforms existing DRL methods on classic benchmark instances, coming close to state-of-the-art COP approaches.
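To make the make-span criterion concrete, here is a small sketch that computes the make-span of a job-shop schedule from a dispatch order. It illustrates only the sparse objective, not the paper's environment, state representation, or dense reward; the instance format (each job as an ordered list of `(machine, duration)` pairs) and the function name `makespan` are assumptions made for this example.

```python
def makespan(jobs, dispatch_order):
    """Schedule operations in the given order and return the latest completion time."""
    next_op = [0] * len(jobs)      # index of the next unscheduled operation per job
    job_ready = [0] * len(jobs)    # time at which each job's previous operation ends
    machine_ready = {}             # time at which each machine becomes free
    for job in dispatch_order:
        machine, duration = jobs[job][next_op[job]]
        # An operation starts once both its job and its machine are available.
        start = max(job_ready[job], machine_ready.get(machine, 0))
        end = start + duration
        job_ready[job] = end
        machine_ready[machine] = end
        next_op[job] += 1
    return max(job_ready)


# Hypothetical 2-job, 2-machine instance.
jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 for 3 time units, then machine 1 for 2
    [(1, 2), (0, 4)],  # job 1: machine 1 for 2 time units, then machine 0 for 4
]
interleaved = makespan(jobs, [0, 1, 0, 1])  # 7
sequential = makespan(jobs, [0, 0, 1, 1])   # 11
```

Interleaving the two jobs keeps both machines busy and yields the shorter schedule, which is the kind of difference a scheduling agent has to learn to exploit.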
Support-Target Protocol for Meta-Learning
Lu, Su, Ye, Han-Jia, Zhan, De-Chuan
The support/query (S/Q) training protocol is widely used in meta-learning. The S/Q protocol trains a task-specific model on S and then evaluates it on Q to optimize the meta-model using the query loss, which depends on the size and quality of Q. In this paper, we study a new S/T protocol for meta-learning. Assuming that we have access to the theoretically optimal model T for a task, we can directly match the task-specific model trained on S to T. The S/T protocol offers a more accurate evaluation since it does not rely on possibly biased and noisy query instances. There are two challenges in putting the S/T protocol into practice. Firstly, we have to determine how to match the task-specific model to T. To this end, we minimize the discrepancy between them on a fictitious dataset generated by adversarial learning, and distill the prediction ability of T to the task-specific model. Secondly, we usually do not have ready-made optimal models. As an alternative, we construct surrogate target models by fine-tuning the globally pre-trained meta-model on local tasks, maintaining both efficiency and veracity.
Connecting Deep-Reinforcement-Learning-based Obstacle Avoidance with Conventional Global Planners using Waypoint Generators
Kästner, Linh, Buiyan, Teham, Zhao, Xinlin, Shen, Zhengcheng, Marx, Cornelius, Lambrecht, Jens
Abstract--Deep Reinforcement Learning has emerged as an efficient dynamic obstacle avoidance method in highly dynamic environments. It has the potential to replace overly conservative or inefficient navigation approaches. Therefore, we integrate different waypoint generators into existing navigation systems and compare the joint system against traditional ones. We found increased performance in terms of safety, efficiency, and path smoothness, especially in highly dynamic environments. Conventional navigation systems combine a global planner, for example one based on rapidly-exploring random tree (RRT) search, with a local planner, which executes the global plan considering local observations and unknown obstacles. In contrast to existing works, the intermediate planner should generate waypoints more dynamically.
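As an illustration of what a waypoint generator does between a global planner and a local planner, the sketch below picks the next waypoint from a global path using a fixed lookahead distance. This is a simple, commonly used heuristic and not one of the generators evaluated in the paper; the function name `next_waypoint`, the `lookahead` parameter, and the example path are assumptions.

```python
import math


def next_waypoint(global_path, robot_xy, lookahead=2.0):
    """Return the first path point, from the closest one onward, at least `lookahead` away."""
    # Find the index of the path point closest to the robot's current position.
    closest = min(range(len(global_path)),
                  key=lambda i: math.dist(global_path[i], robot_xy))
    # Walk forward along the path until a point is far enough ahead.
    for point in global_path[closest:]:
        if math.dist(point, robot_xy) >= lookahead:
            return point
    # Close to the goal: fall back to the final path point.
    return global_path[-1]


# Hypothetical straight global path with 1 m spacing.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
waypoint = next_waypoint(path, (0.9, 0.1))
```

A local planner, for example a learned obstacle avoidance policy, would then steer toward `waypoint` while reacting to obstacles that the global path did not account for.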
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
Huang, Zhiao, Hu, Yuanming, Du, Tao, Zhou, Siyuan, Su, Hao, Tenenbaum, Joshua B., Gan, Chuang
Simulated virtual environments serve as one of the main driving forces behind developing and evaluating skill learning algorithms. However, existing environments typically only simulate rigid body physics. Additionally, the simulation process usually does not provide gradients that might be useful for planning and control optimizations. We introduce a new differentiable physics benchmark called PlasticineLab, which includes a diverse collection of soft body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into a desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many underexplored challenges to robotic agents. We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark. Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning. We expect that PlasticineLab will encourage the development of novel algorithms that combine differentiable physics and RL for more complex physics-based skill learning tasks. Virtual environments, such as Arcade Learning Environment (ALE) (Bellemare et al., 2013), MuJoCo (Todorov et al., 2012), and OpenAI Gym (Brockman et al., 2016), have significantly benefited the development and evaluation of learning algorithms for intelligent agent control and planning. However, existing virtual environments for skill learning typically involve rigid-body dynamics only.
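The idea of optimizing an open-loop control sequence with simulator gradients can be sketched on a toy problem. The example below does not use PlasticineLab or DiffTaichi: it assumes a one-dimensional point whose position is advanced by a velocity control at each step, and it writes the gradient of the final-position loss by hand, standing in for what a differentiable physics engine would supply automatically. All names and parameter values are hypothetical.

```python
def rollout(x0, controls, dt=0.1):
    """Toy dynamics: each control is a velocity applied for one time step."""
    x = x0
    for u in controls:
        x += u * dt
    return x


def optimize_controls(x0, target, horizon=10, dt=0.1, lr=2.0, iterations=30):
    """Gradient descent on an open-loop control sequence for loss = (x_T - target)^2."""
    controls = [0.0] * horizon
    for _ in range(iterations):
        error = rollout(x0, controls, dt) - target
        # d(loss)/d(u_t) = 2 * error * dt for every step of this linear system.
        grad = 2.0 * error * dt
        controls = [u - lr * grad for u in controls]
    return controls


controls = optimize_controls(x0=0.0, target=1.0)
final_position = rollout(0.0, controls)
```

In this setting each iteration shrinks the remaining error by a constant factor, so a few tens of iterations suffice. Multi-stage tasks do not have such a simple loss landscape, which is consistent with the limitation the abstract reports for gradient-based methods.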
Improving Robustness of Deep Reinforcement Learning Agents: Environment Attacks based on Critic Networks
Schott, Lucas, Césaire, Manon, Hajri, Hatem, Lamprier, Sylvain
To improve the policy robustness of deep reinforcement learning agents, a line of recent work focuses on producing disturbances of the environment. Existing approaches in the literature for generating meaningful disturbances of the environment are adversarial reinforcement learning methods. These methods frame the problem as a two-player game between the protagonist agent, which learns to perform a task in an environment, and the adversary agent, which learns to disturb the protagonist via modifications of the considered environment. Both protagonist and adversary are trained with deep reinforcement learning algorithms. Alternatively, we propose in this paper to build on gradient-based adversarial attacks, which are usually used for classification tasks, and apply them to the critic network of the protagonist to identify efficient disturbances of the environment. Rather than learning an attacker policy, which usually proves to be very complex and unstable, we leverage the knowledge of the critic network of the protagonist to dynamically increase the difficulty of the task at each step of the learning process. We show that our method, while being faster and lighter, leads to significantly better improvements in policy robustness than existing methods in the literature.
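A rough sketch of a gradient-based attack on a critic is given below. It uses a fast-gradient-sign step, one common attack from the classification literature, which is an assumption here since the abstract does not name a specific attack. The critic is a hand-written toy function rather than a neural network, its gradient is estimated by finite differences, and the environment parameters (`friction`, `mass`) are hypothetical.

```python
def numerical_gradient(critic, params, h=1e-5):
    """Central-difference estimate of the critic's gradient w.r.t. each parameter."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2 * h))
    return grad


def sign(value):
    return (value > 0) - (value < 0)


def sign_gradient_attack(critic, params, epsilon=0.1):
    """Shift each parameter by epsilon in the direction that lowers the critic's value."""
    grad = numerical_gradient(critic, params)
    return [p - epsilon * sign(g) for p, g in zip(params, grad)]


def toy_critic(params):
    """Hypothetical value estimate: higher friction helps, higher mass hurts."""
    friction, mass = params
    return 10.0 * friction - 2.0 * mass ** 2


nominal = [0.5, 1.0]
disturbed = sign_gradient_attack(toy_critic, nominal)
```

The disturbed parameters give a lower critic value than the nominal ones, so training on them presents the protagonist with a harder variant of the task without learning a separate adversary policy.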
The Emergence of Abstract and Episodic Neurons in Episodic Meta-RL
AlKhamissi, Badr, ElNokrashy, Muhammad, Spranger, Michael
In this work, we analyze the reinstatement mechanism introduced by Ritter et al. (2018) to reveal two classes of neurons that emerge in the agent's working memory (an epLSTM cell) when trained using episodic meta-RL on an episodic variant of the Harlow visual fixation task. Specifically, Abstract neurons encode knowledge shared across tasks, while Episodic neurons carry information relevant for a specific episode's task.
Deep Interpretable Models of Theory of Mind For Human-Agent Teaming
Oguntola, Ini, Hughes, Dana, Sycara, Katia
When developing AI systems that interact with humans, it is essential to design both a system that can understand humans, and a system that humans can understand. Most deep network based agent-modeling approaches are 1) not interpretable and 2) only model external behavior, ignoring internal mental states, which potentially limits their capability for assistance, interventions, discovering false beliefs, etc. To this end, we develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft, and show that incorporating interpretability can significantly increase predictive performance under the right conditions.
Data-Driven Simulation of Ride-Hailing Services using Imitation and Reinforcement Learning
Jayasinghe, Haritha, Jayatilaka, Tarindu, Gunawardena, Ravin, Thayasivam, Uthayasanker
The rapid growth of ride-hailing platforms has created a highly competitive market where businesses struggle to make profits, creating a need for better operational strategies. However, real-world experiments are risky and expensive for these platforms as they deal with millions of users daily. Thus, a need arises for a simulated environment where they can predict users' reactions to changes in platform-specific parameters such as trip fares and incentives. Building such a simulation is challenging, as these platforms exist within dynamic environments where thousands of users regularly interact with one another. This paper presents a framework to mimic and predict the behavior of users, specifically drivers, in ride-hailing services. We use a data-driven hybrid approach that combines reinforcement learning and imitation learning. First, the agent utilizes behavioral cloning to mimic driver behavior using a real-world data set. Next, reinforcement learning is applied on top of the pre-trained agents in a simulated environment, to allow them to adapt to changes in the platform. Our framework provides an ideal playground for ride-hailing platforms to experiment with platform-specific parameters to predict drivers' behavioral patterns.