Reinforcement Learning
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning
Wu, Yuankai, Tan, Huachun, Ran, Bin
Variable speed limits (VSL) control is a flexible way to improve traffic condition,increase safety and reduce emission. There is an emerging trend of using reinforcement learning technique for VSL control and recent studies have shown promising results. Currently, deep learning is enabling reinforcement learning to develope autonomous control agents for problems that were previously intractable. In this paper, we propose a more effective deep reinforcement learning (DRL) model for differential variable speed limits (DVSL) control, in which the dynamic and different speed limits among lanes can be imposed. The proposed DRL models use a novel actor-critic architecture which can learn a large number of discrete speed limits in a continues action space. Different reward signals, e.g. total travel time, bottleneck speed, emergency braking, and vehicular emission are used to train the DVSL controller, and comparison between these reward signals are conducted. We test proposed DRL baased DVSL controllers on a simulated freeway recurrent bottleneck. Results show that the efficiency, safety and emissions can be improved by the proposed method. We also show some interesting findings through the visulization of the control policies generated from DRL models.
Neural Modular Control for Embodied Question Answering
Das, Abhishek, Gkioxari, Georgia, Lee, Stefan, Parikh, Devi, Batra, Dhruv
We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales, where the higher-level master policy proposes subgoals to be executed by specialized sub-policies. Our choice of subgoals is compositional and semantic, i.e. they can be sequentially combined in arbitrary orderings, and assume human-interpretable descriptions (e.g. 'exit room', 'find kitchen', 'find refrigerator', etc.). We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning. Independent reinforcement learning at each level of hierarchy enables sub-policies to adapt to consequences of their actions and recover from errors. Subsequent joint hierarchical training enables the master policy to adapt to the sub-policies.
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Rafati, Jacob, Noelle, David C.
Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be had by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences of the agent. When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse delayed feedback: a variant of the rooms environment and the ATARI 2600 game called Montezuma's Revenge.
Loss Functions for Multiset Prediction
Welleck, Sean, Yao, Zixin, Gai, Yu, Mao, Jialin, Zhang, Zheng, Cho, Kyunghyun
We study the problem of multiset prediction. The goal of multiset prediction is to train a predictor that maps an input to a multiset consisting of multiple items. Unlike existing problems in supervised learning, such as classification, ranking and sequence generation, there is no known order among items in a target multiset, and each item in the multiset may appear more than once, making this problem extremely challenging. In this paper, we propose a novel multiset loss function by viewing this problem from the perspective of sequential decision making. The proposed multiset loss function is empirically evaluated on two families of datasets, one synthetic and the other real, with varying levels of difficulty, against various baseline loss functions including reinforcement learning, sequence, and aggregated distribution matching loss functions. The experiments reveal the effectiveness of the proposed loss function over the others.
Machine learning explained: Understanding supervised, unsupervised, and reinforcement learning
Once we start delving into the concepts behind Artificial Intelligence (AI) and Machine Learning (ML), we come across copious amounts of jargon related to this field of study. Understanding this jargon and how it can have an impact on the study related to ML goes a long way in comprehending the study that has been conducted by researchers and data scientists to get AI to the state it now is. In this article, I will be providing you with a comprehensive definition of supervised, unsupervised and reinforcement learning in the broader field of Machine Learning. You must have encountered these terms while hovering over articles pertaining to the progress made in AI and the role played by ML in propelling this success forward. Understanding these concepts is a given fact, and should not be compromised at any cost.
Meta-modeling game for deriving theoretical-consistent, micro-structural-based traction-separation laws via deep reinforcement learning
This paper presents a new meta-modeling framework to employ deep reinforcement learning (DRL) to generate mechanical constitutive models for interfaces. The constitutive models are conceptualized as information flow in directed graphs. The process of writing constitutive models are simplified as a sequence of forming graph edges with the goal of maximizing the model score (a function of accuracy, robustness and forward prediction quality). Thus meta-modeling can be formulated as a Markov decision process with well-defined states, actions, rules, objective functions, and rewards. By using neural networks to estimate policies and state values, the computer agent is able to efficiently self-improve the constitutive model it generated through self-playing, in the same way AlphaGo Zero (the algorithm that outplayed the world champion in the game of Go)improves its gameplay. Our numerical examples show that this automated meta-modeling framework not only produces models which outperform existing cohesive models on benchmark traction-separation data but is also capable of detecting hidden mechanisms among micro-structural features and incorporating them in constitutive models to improve the forward prediction accuracy, which are difficult tasks to do manually.
CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
Colas, Cรฉdric, Fournier, Pierre, Sigaud, Olivier, Oudeyer, Pierre-Yves
In open-ended and changing environments, agents face a wide range of potential tasks that may or may not come with associated reward functions. Such autonomous learning agents must be able to generate their own tasks through a process of intrinsically motivated exploration, some of which might prove easy, others impossible. For this reason, they should be able to actively select which task to practice at any given moment, to maximize their overall mastery on the set of learnable tasks. This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a unique policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
Inverse reinforcement learning for video games
Tucker, Aaron, Gleave, Adam, Russell, Stuart
Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function. It is often easier to provide demonstrations of a target behavior than to design a reward function describing that behavior. Inverse reinforcement learning (IRL) algorithms can infer a reward from demonstrations in low-dimensional continuous control environments, but there has been little work on applying IRL to high-dimensional video games. In our CNN-AIRL baseline, we modify the state-of-the-art adversarial IRL (AIRL) algorithm to use CNNs for the generator and discriminator. To stabilize training, we normalize the reward and increase the size of the discriminator training dataset. We additionally learn a low-dimensional state representation using a novel autoencoder architecture tuned for video game environments. This embedding is used as input to the reward network, improving the sample efficiency of expert demonstrations. Our method achieves high-level performance on the simple Catcher video game, substantially outperforming the CNN-AIRL baseline. We also score points on the Enduro Atari racing game, but do not match expert performance, highlighting the need for further work.
Learning Negotiating Behavior Between Cars in Intersections using Deep Q-Learning
Tram, Tommy, Jansson, Anton, Grรถnberg, Robin, Ali, Mohammad, Sjรถberg, Jonas
This paper concerns automated vehicles negotiating with other vehicles, typically human driven, in crossings with the goal to find a decision algorithm by learning typical behaviors of other vehicles. The vehicle observes distance and speed of vehicles on the intersecting road and use a policy that adapts its speed along its pre-defined trajectory to pass the crossing efficiently. Deep Q-learning is used on simulated traffic with different predefined driver behaviors and intentions. The results show a policy that is able to cross the intersection avoiding collision with other vehicles 98% of the time, while at the same time not being too passive. Moreover, inferring information over time is important to distinguish between different intentions and is shown by comparing the collision rate between a Deep Recurrent Q-Network at 0.85% and a Deep Q-learning at 1.75%.
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Lee, Michelle A., Zhu, Yuke, Srinivasan, Krishnan, Shah, Parth, Savarese, Silvio, Fei-Fei, Li, Garg, Animesh, Bohg, Jeannette
Abstract-- Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is nontrivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. Results for simulated and real robot experiments are presented. Even in routine tasks such as putting a car key in the ignition, humans effortlessly combine our senses of vision and touch to complete the task. Visual feedback provides information about semantic and geometric object properties for accurate reaching or grasp pre-shaping. Haptic feedback provides information about the current contact conditions between object and environment for accurate localization and control even under occlusions.