Reinforcement Learning


Machine Learning to Help Optimize Traffic and Reduce Pollution

#artificialintelligence

Applying artificial intelligence to self-driving cars to smooth traffic, reduce fuel consumption, and improve air quality predictions may sound like the stuff of science fiction, but researchers at the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have launched two research projects to do just that. In collaboration with UC Berkeley, Berkeley Lab scientists are using deep reinforcement learning, a computational tool for training controllers, to make transportation more sustainable. One project uses deep reinforcement learning to train autonomous vehicles to drive in ways that simultaneously improve traffic flow and reduce energy consumption. A second uses deep learning algorithms to analyze satellite images, combined with traffic information from cell phones and data already being collected by environmental sensors, to improve air quality predictions. "Thirty percent of energy use in the U.S. is to transport people and goods, and this energy consumption contributes to air pollution, including approximately half of all nitrogen oxide emissions (a precursor to particulate matter and ozone) and black carbon (soot) emissions," said Tom Kirchstetter, director of Berkeley Lab's Energy Analysis and Environmental Impacts Division, an adjunct professor at UC Berkeley, and a member of the research team.


Machine Learning Explained: Understanding Supervised, Unsupervised, and Reinforcement Learning (Analytikus - Simplifying Data)

#artificialintelligence

Supervised vs Reinforcement Learning: In supervised learning, an external supervisor with sufficient knowledge of the environment shares that knowledge with the agent (in the form of labeled examples) so that it can form a better understanding and complete the task. However, in problems where the agent can perform many different kinds of subtasks on its own to achieve an overall objective, a supervisor is unnecessary and impractical. Take the example of a chess game, where the player can choose among tens of thousands of possible moves on the way to the ultimate objective. Creating a knowledge base for this purpose would be extremely complicated. In such tasks, it is therefore essential that the computer learn how to manage on its own: it is more feasible and appropriate for the machine to learn from its own experience. Once the machine has started learning from its own experience, it can use the knowledge gained to choose better moves in the future.
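To make "learning from its own experience" concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-cell corridor. The agent is never told which action is correct; it only receives a reward of 1 when it reaches the rightmost cell. The environment, reward, and hyperparameters are illustrative assumptions, not taken from the article.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # move left, move right

def step(state, action):
    """Toy environment: reward 1 only when the goal cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Best-valued action, breaking ties at random so early exploration is unbiased."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, occasionally explore.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # No supervisor labels: the update uses only the observed reward and next state.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Greedy policy for every non-goal cell after training.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right from every non-goal cell, even though no example of a "correct" move was ever provided.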


Exploration by Random Network Distillation

arXiv.org Artificial Intelligence

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed, randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard-exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and it occasionally completes the first level.
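The bonus described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the target "network" is a single tanh layer with fixed random weights, the predictor is linear and trained by plain gradient descent, the bonus is the mean squared prediction error, and the final weighted sum of extrinsic and intrinsic rewards (with made-up coefficients) only gestures at the paper's more flexible combination.

```python
import numpy as np

class RandomNetworkDistillation:
    """Minimal RND bonus: fixed random target, linear predictor (illustrative sizes)."""

    def __init__(self, obs_dim=8, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.w_target = rng.normal(size=(obs_dim, feat_dim))  # fixed: never updated
        self.w_pred = np.zeros((obs_dim, feat_dim))           # trained to imitate the target
        self.lr = lr

    def _error(self, obs):
        # Predictor output minus target features (tanh layer on fixed random weights).
        return obs @ self.w_pred - np.tanh(obs @ self.w_target)

    def bonus(self, obs):
        """Intrinsic reward: mean squared prediction error for one observation."""
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        """One gradient step on the bonus, so often-seen observations stop being rewarding."""
        err = self._error(obs)
        self.w_pred -= self.lr * np.outer(obs, err) * (2.0 / err.size)


rnd = RandomNetworkDistillation()
familiar = np.ones(8)                  # an observation the agent keeps revisiting
novel = np.linspace(-1.0, 1.0, 8)      # orthogonal to `familiar` (its entries sum to zero)

before = rnd.bonus(familiar)
for _ in range(200):
    rnd.update(familiar)
after = rnd.bonus(familiar)

# Hypothetical combination of reward streams; the coefficients are illustrative only.
extrinsic_reward = 0.0
total_reward = 2.0 * extrinsic_reward + 1.0 * rnd.bonus(novel)
```

The bonus for the repeatedly visited observation collapses toward zero, while the novel observation keeps a large bonus. Choosing it orthogonal to the familiar one makes that visible, because training on the familiar observation leaves its prediction unchanged.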


Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving

arXiv.org Artificial Intelligence

Abstract: This paper investigates vision-based autonomous driving with deep learning and reinforcement learning methods. Unlike end-to-end learning methods, our method breaks the vision-based lateral control system down into a perception module and a control module. The perception module, which is based on a multi-task learning neural network, first takes a driver-view image as its input and predicts the track features. The control module, which is based on reinforcement learning, then makes a control decision based on these features. To improve data efficiency, we propose visual TORCS (VTORCS), a deep reinforcement learning environment based on the open racing car simulator (TORCS). Using the provided functions, one can train an agent with images or various physical sensor measurements as input, or evaluate a perception algorithm on this simulator. The trained reinforcement learning controller outperforms the linear quadratic regulator (LQR) controller and the model predictive control (MPC) controller on different tracks. The experiments demonstrate that the perception module shows promising performance and that the controller is capable of driving the vehicle well along the track center with visual input.

In recent years, artificial intelligence (AI) has flourished in many fields such as autonomous driving [1] [2], games [3] [4], and engineering applications [5] [6]. As one of the most popular topics, autonomous driving has drawn great attention from both the academic and industrial communities and is thought to be the next revolution in intelligent transportation systems. An autonomous driving system mainly consists of four modules: an environment perception module, a trajectory planning module, a control module, and an actuator mechanism module. The initial perception methods [7] [8] are based on expensive LIDARs, which usually cost tens of thousands of dollars. The high cost limits their large-scale application to ordinary vehicles. Recently, more attention has been paid to image-based methods [9], whose core sensor, the camera, is relatively cheap and already installed on most vehicles. Some of these perception methods have been developed into products [10] [11]. In this paper, we focus on the lateral control problem based on the image captured by the onboard camera.
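The perception/control split can be sketched as two functions with a narrow interface. Everything here is a hypothetical stand-in: the perception module is stubbed to return two track features (lateral offset from the track center and heading error), the control module is a fixed linear policy standing in for the learned reinforcement learning controller, and a simple kinematic bicycle model on a straight track replaces the TORCS simulator.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackFeatures:
    lateral_offset: float  # metres from the track center (positive = left of center)
    heading_error: float   # radians between car heading and track direction (positive = left)

def perception_module(true_offset, true_heading):
    """Stub: a real perception network would predict these features from a driver-view image."""
    return TrackFeatures(true_offset, true_heading)

def control_module(features, k_offset=0.1, k_heading=1.0, max_steer=0.5):
    """Stand-in for the learned policy: features -> steering angle (radians, positive = left)."""
    steer = -(k_offset * features.lateral_offset + k_heading * features.heading_error)
    return max(-max_steer, min(max_steer, steer))

def simulate(offset=2.0, heading=0.0, speed=10.0, wheelbase=2.5, dt=0.05, steps=400):
    """Closed loop with a kinematic bicycle model; returns the final |lateral offset|."""
    for _ in range(steps):
        steer = control_module(perception_module(offset, heading))
        offset += speed * math.sin(heading) * dt
        heading += speed / wheelbase * math.tan(steer) * dt
    return abs(offset)
```

Because the two modules only share `TrackFeatures`, either side can be swapped (a trained perception network, or an RL policy in place of the linear gains) without touching the other. With the default speed and wheelbase, the linearized closed loop is y'' + 4y' + 4y = 0, so the car settles back onto the track center without oscillating.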


Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for the instability of off-policy learning is a discrepancy between the target ($\pi$) and behavior ($b$) policy distributions. This discrepancy can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter $\beta\in[0, 1]$ that controls smoothness. To cope with instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor), a value function (the critic) assessing the current policy ($\pi$), and a behavior policy. We use action values generated from the behavior policy, rather than from the target policy, to train our algorithm. We also use deep neural networks to train both the actor and the critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate better or comparable performance relative to several state-of-the-art RL baselines.
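How $\beta$ smooths the weights can be illustrated with a short function. This sketch assumes the relative density-ratio form $\pi / (\beta\pi + (1-\beta)b)$ from relative density-ratio estimation; the abstract itself only states that $\beta$ controls smoothness, and the probabilities below are hypothetical.

```python
import numpy as np

def relative_is_weight(pi_prob, b_prob, beta):
    """Relative importance weight pi / (beta * pi + (1 - beta) * b), elementwise.

    beta = 0 gives ordinary importance sampling (pi / b), beta = 1 gives a constant 1,
    and for beta > 0 the weight never exceeds 1 / beta.
    """
    pi_prob = np.asarray(pi_prob, dtype=float)
    b_prob = np.asarray(b_prob, dtype=float)
    return pi_prob / (beta * pi_prob + (1.0 - beta) * b_prob)

# Hypothetical probabilities of three sampled actions under target (pi) and behavior (b) policies.
pi_prob = np.array([0.9, 0.5, 0.1])
b_prob = np.array([0.05, 0.5, 0.6])
weights = {beta: relative_is_weight(pi_prob, b_prob, beta) for beta in (0.0, 0.5, 1.0)}
```

For the first action, ordinary IS ($\beta = 0$) gives a weight of 18, while $\beta = 0.5$ reduces it to about 1.89, under the bound $1/\beta = 2$. In an off-policy actor-critic update, a weight like this would typically multiply each sample's policy-gradient or critic term in place of the ordinary ratio $\pi/b$.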



Learning to Teach with Dynamic Loss Functions

arXiv.org Artificial Intelligence

Teaching is critical to human society: it is through teaching that prospective students are educated and human civilization is inherited and advanced. A good teacher not only provides students with qualified teaching materials (e.g., textbooks), but also sets up appropriate learning objectives (e.g., course projects and exams) that take each student's situation into account. When it comes to artificial intelligence, if we treat machine learning models as students, the loss functions they optimize act as perfect counterparts of the learning objectives set by the teacher. In this work, we explore the possibility of imitating human teaching behavior by dynamically and automatically outputting appropriate loss functions to train machine learning models. Unlike typical learning settings, in which the loss function of a machine learning model is predefined and fixed, in our framework the loss function of a machine learning model (which we call the student) is defined by another machine learning model (which we call the teacher). The ultimate goal of the teacher model is to cultivate the student so that it performs better on a development dataset. Toward that end, similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions that are used and optimized by its student model at different training stages. We develop an efficient learning method for the teacher model that makes gradient-based optimization possible, avoiding less effective approaches such as policy optimization. We name our method "learning to teach with dynamic loss functions" (L2T-DLF for short). Extensive experiments on real-world tasks, including image classification and neural machine translation, demonstrate that our method significantly improves the quality of various student models.
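What makes the teacher trainable by gradients is differentiating the student's development-set loss through the student's own training update. The sketch below shows that idea in its simplest form and does not use the paper's loss family: a hypothetical teacher outputs a single coefficient `lam` that shapes the student's training loss (squared error plus `lam * w**2`), the student takes one gradient step, and the teacher receives the exact gradient of the development loss with respect to `lam` via the chain rule. The data, learning rates, and loss form are illustrative assumptions.

```python
import numpy as np

def student_step(w, lam, x, y, lr):
    """One gradient step for the student on the teacher-defined (hypothetical) loss
    mean((w * x - y) ** 2) + lam * w ** 2."""
    grad_w = np.mean(2.0 * x * (w * x - y)) + 2.0 * lam * w
    return w - lr * grad_w

def dev_loss(w, x_dev, y_dev):
    """Student performance measured on the development set."""
    return float(np.mean((w * x_dev - y_dev) ** 2))

def teacher_hypergradient(w, lam, x, y, x_dev, y_dev, lr):
    """Exact d dev_loss(student_step(w, lam, ...)) / d lam via the chain rule."""
    w_next = student_step(w, lam, x, y, lr)
    d_dev_d_wnext = np.mean(2.0 * x_dev * (w_next * x_dev - y_dev))
    d_wnext_d_lam = -2.0 * lr * w      # only the lam * w**2 term depends on lam
    return float(d_dev_d_wnext * d_wnext_d_lam)

# One teacher update at one training stage (all values hypothetical).
x_train = np.array([1.0, 2.0, -1.0, 0.5])
y_train = 3.0 * x_train                 # training data suggests slope 3
x_dev = np.array([0.5, -2.0, 1.5])
y_dev = 2.0 * x_dev                     # development data prefers slope 2
w, lam, lr, teacher_lr = 2.5, 0.0, 0.05, 1.0

g = teacher_hypergradient(w, lam, x_train, y_train, x_dev, y_dev, lr)
lam -= teacher_lr * g                   # teacher adjusts the loss it hands the student
w = student_step(w, lam, x_train, y_train, lr)
```

Here `g` is negative, so the teacher raises `lam`, and the student's next step pulls its slope from 2.5 toward the value the development set prefers. Repeating this at every training stage gives a loss that changes as training progresses.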


Assessing Generalization in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of deep RL algorithms, including those that specifically tackle the problem of generalization.
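The in-distribution versus out-of-distribution distinction can be sketched as an evaluation loop over sampled environment parameters. The parameter, its ranges, and the `toy_return` function (standing in for rolling out a trained agent) are all hypothetical; they are not the benchmark's actual environments or ranges.

```python
import random
from statistics import mean

# Hypothetical ranges for one environment parameter (e.g. a mass multiplier).
TRAIN_RANGE = (0.75, 1.5)
OOD_RANGES = [(0.5, 0.75), (1.5, 2.0)]   # outside the training range on both sides

def sample_params(ranges, n, rng):
    """Draw n parameter values: pick a range uniformly, then a value uniformly inside it."""
    return [rng.uniform(*rng.choice(ranges)) for _ in range(n)]

def evaluate(episode_return, params):
    """Average return of a fixed trained agent across sampled environment variants."""
    return mean(episode_return(p) for p in params)

def toy_return(mass):
    """Stand-in for running a trained agent: performs best near the nominal value 1.0."""
    return max(0.0, 1.0 - abs(mass - 1.0))

rng = random.Random(0)
id_params = sample_params([TRAIN_RANGE], 1000, rng)
ood_params = sample_params(OOD_RANGES, 1000, rng)
in_dist = evaluate(toy_return, id_params)
out_dist = evaluate(toy_return, ood_params)
generalization_gap = in_dist - out_dist
```

Reporting both numbers, rather than only performance on the training range, is what exposes the sensitivity to test-time changes that the abstract describes.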


Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

arXiv.org Artificial Intelligence

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but it is known to suffer from excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly to the stationary state-visitation distributions to avoid the exploding variance faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a minimax loss function for the estimation problem and derive a closed-form solution for the case of a reproducing kernel Hilbert space (RKHS). We support our method with both theoretical and empirical analyses.
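The identity behind applying IS to stationary distributions is R_pi = E_{s ~ d_b, a ~ b}[ w(s) * pi(a|s)/b(a|s) * r(s, a) ] with w(s) = d_pi(s) / d_b(s). It needs only one per-step ratio, not a product of ratios that grows with the horizon. The sketch below checks this on a hypothetical 3-state, 2-action MDP in the average-reward setting, assuming w(s) is known exactly. The paper's actual contribution, estimating w from behavior data with a minimax loss, is not shown.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP: P[s, a, s'] and r[s, a].
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.5, 0.5, 0.0], [0.0, 0.3, 0.7]],
    [[0.7, 0.0, 0.3], [0.0, 0.2, 0.8]],
])
r = np.array([[0.0, 0.0], [0.0, 0.5], [1.0, 1.0]])
pi = np.array([[0.2, 0.8]] * 3)   # target policy pi(a|s)
b = np.array([[0.6, 0.4]] * 3)    # behavior policy b(a|s)

def stationary(policy):
    """Stationary state distribution d with d^T P_policy = d^T and sum(d) = 1."""
    p_pol = np.einsum("sa,sat->st", policy, P)
    n = p_pol.shape[0]
    a_mat = np.vstack([p_pol.T - np.eye(n), np.ones((1, n))])
    rhs = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(a_mat, rhs, rcond=None)[0]

d_pi, d_b = stationary(pi), stationary(b)
w = d_pi / d_b                     # state density ratio, assumed known exactly here

true_value = float(np.sum(d_pi[:, None] * pi * r))
# The same quantity written as an expectation under the behavior distribution.
ratio_value = float(np.sum(d_b[:, None] * b * w[:, None] * (pi / b) * r))

def behavior_sample_estimate(steps=50_000, seed=0):
    """Average of w(s) * pi(a|s)/b(a|s) * r(s, a) along one long behavior trajectory."""
    rng = np.random.default_rng(seed)
    s, total = 0, 0.0
    for _ in range(steps):
        a = rng.choice(2, p=b[s])
        total += w[s] * pi[s, a] / b[s, a] * r[s, a]
        s = rng.choice(3, p=P[s, a])
    return total / steps
```

Each term in the sample average uses a single ratio per step, so its variance does not compound with trajectory length the way a trajectory-wise IS weight does.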