Reinforcement Learning
What Is Reinforcement Learning?
Put simply, reinforcement learning is a machine learning technique that involves training an artificial intelligence agent through the repetition of actions and associated rewards. A reinforcement learning agent experiments in an environment, taking actions and being rewarded when the correct actions are taken. Over time, the agent learns to take the actions that will maximize its reward. That's a quick definition of reinforcement learning, but taking a closer look at the concepts behind reinforcement learning will help you gain a better, more intuitive understanding of it. The term "reinforcement learning" is adapted from the concept of reinforcement in psychology.
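The act, observe reward, adjust loop described above can be made concrete with a small sketch. The five-cell corridor environment, the choice of tabular Q-learning with epsilon-greedy exploration, and all names and hyperparameters below are illustrative assumptions, not something the text prescribes.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit: best known action, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy action in each non-goal cell after training
greedy_policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
                 for s in range(N_STATES - 1)]
```

After training, `greedy_policy` holds the action the agent prefers in each cell; in this toy setting the rewarded behavior is to step right toward the goal.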
Learning Resilient Behaviors for Navigation Under Uncertainty Environments
Fan, Tingxiang, Long, Pinxin, Liu, Wenxi, Pan, Jia, Yang, Ruigang, Manocha, Dinesh
Deep reinforcement learning has great potential to acquire complex, adaptive behaviors for autonomous agents automatically. However, the underlying neural network policies have not been widely deployed in real-world applications, especially in safety-critical tasks (e.g., autonomous driving). One of the reasons is that the learned policy cannot perform flexible and resilient behaviors, as traditional methods can, to adapt to diverse environments. In this paper, we consider the problem of a mobile robot learning adaptive and resilient behaviors for navigating in unseen, uncertain environments while avoiding collisions. We present a novel approach for uncertainty-aware navigation by introducing an uncertainty-aware predictor to model the environmental uncertainty, and we propose a novel uncertainty-aware navigation network to learn resilient behaviors in previously unknown environments. To train the proposed uncertainty-aware network more stably and efficiently, we present the temperature decay training paradigm, which balances exploration and exploitation during the training process. Our experimental evaluation demonstrates that our approach can learn resilient behaviors in diverse environments and generate adaptive trajectories according to environmental uncertainties. Videos of the experiments are available at https://sites.google.com/view/resilient-nav/ . With the recent progress of machine learning techniques, deep reinforcement learning has been seen as a promising technique for autonomous systems to learn intelligent and complex behaviors in manipulation and motion planning tasks [1]-[3].
DeepMNavigate: Deep Reinforced Multi-Robot Navigation Unifying Local & Global Collision Avoidance
Tan, Qingyang, Fan, Tingxiang, Pan, Jia, Manocha, Dinesh
We present a novel algorithm (DeepMNavigate) for global multi-agent navigation in dense scenarios using deep reinforcement learning. Our approach uses local and global information for each robot based on motion information maps. We use a three-layer CNN that takes these maps as input and generates a suitable action to drive each robot to its goal position. Our approach is general, learns an optimal policy using a multi-scenario, multi-state training algorithm, and can directly handle raw sensor measurements for local observations. We demonstrate the performance on complex, dense benchmarks with narrow passages in environments with tens of agents. We highlight the algorithm's benefits over prior learning methods and geometric decentralized algorithms in complex scenarios.
Reverse Experience Replay
The goal of this environment is to drive up the mountain. However, the car's engine is not strong enough to simply accelerate and scale the mountain. At every frame, the agent receives a reward of -1. Therefore, the dependencies between Q-values are strong. Considering these conditions, the reverse-order update is useful here. All results are the average of 3 learning and test iterations. Deep Q-Learning Network with Reverse Experience Replay shows competitive results against Double DQN with Experience Replay and vanilla DQN with Experience Replay (Figure 5). Double DQN achieves the smallest results because of the Target-Network update (some transitions were sampled before the Target-Network update, and the old max Q-value was used).
Figure 5: Performance of the DQN RER, DDQN ER, and DQN ER algorithms in the Mountain Car Problem (the mean of the test results of 3 different learning processes from 3 different seeds).
Table 1 presents the details of the Mountain Car experiment (NN structure, training and testing hyperparameters).
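To see why update order matters when every step carries a -1 reward, here is a minimal tabular sketch. It uses TD(0) state-value updates on a hypothetical four-step episode, which is a deliberate simplification of the DQN setup described above; the function name, episode, and parameters are assumptions made for illustration.

```python
def td_sweep(transitions, alpha=1.0, gamma=1.0):
    """Apply one TD(0) update per transition, in the order given."""
    values = {}
    for state, reward, next_state, done in transitions:
        # Terminal transitions do not bootstrap from the next state
        bootstrap = 0.0 if done else values.get(next_state, 0.0)
        old = values.get(state, 0.0)
        values[state] = old + alpha * (reward + gamma * bootstrap - old)
    return values

# Hypothetical episode: states 0 -> 1 -> 2 -> 3 -> goal, each step costs -1
episode = [(s, -1.0, s + 1, s + 1 == 4) for s in range(4)]

forward_values = td_sweep(episode)                   # replay in recorded order
reverse_values = td_sweep(list(reversed(episode)))   # replay from the last step back
```

In one forward sweep every state only sees the not-yet-updated value of its successor, so each ends at -1. In one reverse sweep the terminal information propagates all the way back, giving -4, -3, -2, -1 for states 0 to 3, which are the true returns of this episode.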
How can AI Automate End-to-End Data Science?
Aggarwal, Charu, Bouneffouf, Djallel, Samulowitz, Horst, Buesser, Beat, Hoang, Thanh, Khurana, Udayan, Liu, Sijia, Pedapati, Tejaswini, Ram, Parikshit, Rawat, Ambrish, Wistuba, Martin, Gray, Alexander
Data science is labor-intensive, and human experts are scarce but heavily involved in every aspect of it. This makes data science time-consuming and restricted to experts, with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed at that goal and is emerging as an important research and business topic. We introduce and define the AutoDS challenge, followed by a proposal of a general AutoDS framework that covers existing approaches but also provides guidance for the development of new methods. We categorize and review the existing literature from multiple aspects of the problem setup and employed techniques. Then we provide several views on how AI could succeed in automating end-to-end AutoDS. We hope this survey can serve as an insightful guideline for the AutoDS field and provide inspiration for future research.
State2vec: Off-Policy Successor Features Approximators
Madjiheurem, Sephora, Toni, Laura
A major challenge in reinforcement learning (RL) is the design of agents that are able to generalize across tasks that share common dynamics. A viable solution is meta-reinforcement learning, which identifies common structures among past tasks to be then generalized to new tasks (meta-test). In meta-training, the RL agent learns state representations that encode prior information from a set of tasks, used to generalize the value function approximation. This has been proposed in the literature as successor representation approximators. While promising, these methods do not generalize well across optimal policies, leading to sampling-inefficiency during meta-test phases. In this paper, we propose state2vec, an efficient and low-complexity framework for learning successor features which (i) generalize across policies, (ii) ensure sample-efficiency during meta-test. We extend the well known node2vec framework to learn state embeddings that account for the discounted future state transitions in RL. The proposed off-policy state2vec captures the geometry of the underlying state space, making good basis functions for linear value function approximation.
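The link between discounted future state transitions and linear value approximation can be illustrated with the classic tabular successor representation. This is not the state2vec method itself; the three-state chain, reward vector, and function name below are assumptions chosen only to make the idea concrete. For a fixed policy with transition matrix P, the successor representation is M = I + γP + γ²P² + ..., and the value function is the linear map V = M r.

```python
def successor_representation(P, gamma, iterations=100):
    """Iterate M <- I + gamma * P @ M, which converges to sum_k gamma^k P^k."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        M = [[identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M

# Hypothetical chain 0 -> 1 -> 2, where state 2 loops onto itself
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
rewards = [0.0, 0.0, 1.0]   # reward collected in each state
gamma = 0.5

M = successor_representation(P, gamma)
# Each row of M acts as a feature vector; values are linear in the rewards
values = [sum(M[i][j] * rewards[j] for j in range(3)) for i in range(3)]
```

Here `values` satisfies the Bellman equation V = r + γPV for the chain, so changing only `rewards` yields a new value function without recomputing `M`.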
Learning Humanoid Robot Running Skills through Proximal Policy Optimization
Melo, Luckeciano C., Maximo, Marcos R. O. A.
At the current level of evolution of Soccer 3D, motion control is a key factor in a team's performance. Recent work takes advantage of model-free approaches based on Machine Learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and, therefore, opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot's dynamics. Our results outperformed the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The methodology also demonstrated improved sample efficiency, being able to learn how to run in just a few hours. We report our results, analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present key factors that led us to improve on previous results and share some ideas for future work.
Faster and Safer Training by Embedding High-Level Knowledge into Deep Reinforcement Learning
Zhang, Haodi, Gao, Zihang, Zhou, Yi, Zhang, Hao, Wu, Kaishun, Lin, Fangzhen
Deep reinforcement learning has been successfully used in many dynamic decision-making domains, especially those with very large state spaces. However, it is also well known that deep reinforcement learning can be very slow and resource-intensive. The resulting system is often brittle and difficult to explain. In this paper, we attempt to address some of these problems by proposing a framework of Rule-interposing Learning (RIL) that embeds high-level rules into deep reinforcement learning. With some good rules, this framework can not only accelerate the learning process but also keep it away from catastrophic explorations, thus making the system relatively stable even during the very early stage of training. Moreover, given that the rules are high-level and easy to interpret, they can be easily maintained, updated, and shared with other similar tasks.
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul
In this framework, the robot learning problem corresponds to an RL problem that aims at obtaining a policy $\pi: S \times G \to A$ such that the expected discounted sum of rewards is maximized for any given goal. When the reward function is sparse, as assumed here, this RL problem is particularly hard to solve. In particular, we consider here reward functions that are described as follows: $R(s, a, s', g) = \mathbb{1}[d(s', g) \le \epsilon_R] - 1$, where $\mathbb{1}$ is the indicator function, $d$ is a distance, and $\epsilon_R > 0$ is a fixed threshold. To tackle this issue, Andrychowicz et al. [2017] proposed HER, which is based on the following principle: any trajectory that failed to reach its goal still carries useful information; it has at least reached the states of its trajectory path. Using this natural and powerful idea, memory replay can be augmented with the failed trajectories by changing their goals in hindsight.
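The hindsight idea can be sketched in a few lines, taking the sparse reward to be 0 when the reached state lies within the threshold of the goal and -1 otherwise. The transition layout, function names, threshold value, and the choice to relabel with the final reached state are all assumptions for illustration, not the authors' implementation.

```python
import math

def sparse_reward(next_state, goal, threshold=0.05):
    """Return 0.0 when next_state is within threshold of goal, else -1.0."""
    return 0.0 if math.dist(next_state, goal) <= threshold else -1.0

def relabel_with_final_state(trajectory, threshold=0.05):
    """Replace the goal of every transition with the last state actually reached."""
    new_goal = trajectory[-1][2]
    return [
        (s, a, s_next, new_goal, sparse_reward(s_next, new_goal, threshold))
        for (s, a, s_next, _old_goal) in trajectory
    ]

# Hypothetical failed trajectory: the agent aims for (1.0, 0.0) but stops at (0.2, 0.0)
goal = (1.0, 0.0)
trajectory = [
    ((0.0, 0.0), "right", (0.1, 0.0), goal),
    ((0.1, 0.0), "right", (0.2, 0.0), goal),
]

original_rewards = [sparse_reward(s_next, g) for (_s, _a, s_next, g) in trajectory]
relabeled = relabel_with_final_state(trajectory)
```

Under the original goal every transition earns -1, so the trajectory carries no learning signal. After relabeling, the last transition reaches its (new) goal and earns 0, which is the extra information that hindsight replay adds to the buffer.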
Fundamentals of Reinforcement Learning: The K-bandit Problem, Illustrated
Welcome to GradientCrescent's special series on reinforcement learning. This series will serve to introduce some of the fundamental concepts in reinforcement learning using digestible examples, primarily obtained from the "Reinforcement Learning" text by Sutton et al. Note that code in this series will be kept to a minimum; readers interested in implementations are directed to the official course, or our GitHub. The secondary purpose of this series is to reinforce (pun intended) my own learning in the field. Reinforcement learning has quickly captured the imagination of the general public, with organisations such as DeepMind achieving success in games such as Go, Starcraft, and Quake III, along with more practical achievements such as disease detection and self-mapping.