 Reinforcement Learning


Learning Transferable Domain Priors for Safe Exploration in Reinforcement Learning

arXiv.org Artificial Intelligence

Prior access to domain knowledge could significantly improve the performance of a reinforcement learning agent. In particular, it could help agents avoid potentially catastrophic exploratory actions, which would otherwise have to be experienced during learning. In this work, we identify consistently undesirable actions in a set of previously learned tasks, and use pseudo-rewards associated with them to learn a prior policy. In addition to enabling safe exploratory behaviors in subsequent tasks in the domain, these priors are transferable to similar environments, and can be learned off-policy and in parallel with the learning of other tasks in the domain. We compare our approach to established, state-of-the-art algorithms in a grid-world navigation environment, and demonstrate that it exhibits a superior performance with respect to avoiding unsafe actions while learning to perform arbitrary tasks in the domain. We also present some theoretical analysis to support these results, and discuss the implications and some alternative formulations of this approach, which could also be useful to accelerate learning in certain scenarios.


A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots

arXiv.org Artificial Intelligence

As reinforcement learning (RL) achieves more success in solving complex tasks, more care is needed to ensure that RL research is reproducible and that algorithms herein can be compared easily and fairly with minimal bias. RL results are, however, notoriously hard to reproduce due to the algorithms' intrinsic variance, the environments' stochasticity, and numerous (potentially unreported) hyper-parameters. In this work we investigate the many issues leading to irreproducible research and how to manage those. We further show how to utilise a rigorous and standardised evaluation approach for easing the process of documentation, evaluation and fair comparison of different algorithms, where we emphasise the importance of choosing the right measurement metrics and conducting proper statistics on the results, for unbiased reporting of the results.
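As one illustration of "conducting proper statistics on the results", the sketch below reports the mean final return over several independent runs together with a percentile-bootstrap confidence interval. This is not taken from the survey: the function name `bootstrap_mean_ci`, the choice of the percentile bootstrap, and the per-run returns are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mean_ci(returns, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-run returns.

    Hypothetical helper for illustration; not the survey's own procedure.
    """
    rng = random.Random(seed)  # fixed seed so the report itself is reproducible
    n = len(returns)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(returns), lower, upper


# Made-up final returns from 10 independent training runs (different seeds).
runs = [212.0, 187.5, 240.1, 198.3, 175.9, 225.4, 201.7, 190.2, 233.8, 208.6]
mean, lower, upper = bootstrap_mean_ci(runs)
print(f"mean return = {mean:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
```

Reporting an interval over runs, rather than the single best run, makes the algorithm's intrinsic variance visible to the reader.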


What is Deep Reinforcement Learning? – Data Smarts

#artificialintelligence

Last week I talked about Reinforcement Learning: how it's being used in real-world applications today, some of its components, and the trade-offs we have to make when we program an agent to learn from its environment. You can check the post here. Today's post will be a short one, as we focus only on the "DEEP" part of Deep Reinforcement Learning. As you might have already guessed, Deep Reinforcement Learning is just a variant of Reinforcement Learning, so everything we learned in last Wednesday's post still holds and applies. However, I want to shed some light on the differences between DRL and traditional RL, so I think you'll find this article quite useful.


AWS DeepRacer - the fastest way to get rolling with machine learning

#artificialintelligence

AWS DeepRacer is a 1/18th scale race car which gives you an interesting and fun way to get started with reinforcement learning (RL). RL is an advanced machine learning (ML) technique which takes a very different approach to training models than other machine learning methods. Its super power is that it learns very complex behaviors without requiring any labeled training data, and can make short term decisions while optimizing for a longer term goal. With AWS DeepRacer, you now have a way to get hands-on with RL, experiment, and learn through autonomous driving. You can get started with the virtual car and tracks in the cloud-based 3D racing simulator, and for a real-world experience, you can deploy your trained models onto AWS DeepRacer and race your friends, or take part in the global AWS DeepRacer League.


Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

arXiv.org Artificial Intelligence

This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination. This framework relaxes assumptions in the literature by considering multiple origins and destinations, multiple access locations to the managed lane, en route diversion of travelers, partial observability of the sensor readings, and stochastic demand and observations. The problem is formulated as a partially observable Markov decision process (POMDP) and policy gradient methods are used to determine tolls as a function of real-time observations. Tolls are modeled as continuous and stochastic variables, and are determined using a feedforward neural network. The method is compared against a feedback control method used for dynamic pricing. We show that Deep-RL is effective in learning toll policies for maximizing revenue, minimizing total system travel time, and other joint weighted objectives, when tested on real-world transportation networks. The Deep-RL toll policies outperform the feedback control heuristic for the revenue maximization objective by generating revenues up to 9.5% higher than the heuristic and for the objective minimizing total system travel time (TSTT) by generating TSTT up to 10.4% lower than the heuristic. We also propose reward shaping methods for the POMDP to overcome the undesired behavior of toll policies, like the jam-and-harvest behavior of revenue-maximizing policies. Additionally, we test transferability of the algorithm trained on one set of inputs for new input distributions and offer recommendations on real-time implementations of Deep-RL algorithms. The source code for our experiments is available online at https://github.com/venktesh22/ExpressLanes_Deep-RL


Q-Learning Based Aerial Base Station Placement for Fairness Enhancement in Mobile Networks

arXiv.org Machine Learning

In this paper, we use an aerial base station (aerial-BS) to enhance fairness in a dynamic environment with user mobility. The problem of optimally placing the aerial-BS is a non-deterministic polynomial-time hard (NP-hard) problem. Moreover, the network topology is subject to continuous changes due to the user mobility. These issues intensify the quest to develop an adaptive and fast algorithm for 3D placement of the aerial-BS. To this end, we propose a method based on reinforcement learning to achieve these goals. Simulation results show that our method increases fairness among users in a reasonable computing time, while the solution is comparatively close to the optimal solution obtained by exhaustive search.
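For readers unfamiliar with the Q-learning named in the title, the following is a minimal tabular sketch on a toy placement problem. Everything specific here is an assumption made for illustration and is not the paper's setup: the 1D line of 7 candidate positions, the three move actions, the hypothetical user positions, and the reward (the negative distance to the worst-served user, used as a crude stand-in for a fairness measure).

```python
import random


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Standard Q-learning update toward the bootstrapped target."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])


def q_learning(n_positions, reward_fn, episodes=500, steps=20,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = (-1, 0, 1)  # move left, stay, move right (assumed action set)
    q = [[0.0] * len(actions) for _ in range(n_positions)]
    for _ in range(episodes):
        s = rng.randrange(n_positions)  # random initial placement
        for _ in range(steps):
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.randrange(len(actions))
            else:
                a = max(range(len(actions)), key=lambda i: q[s][i])
            # Clip the move so the station stays on the line.
            s_next = min(max(s + actions[a], 0), n_positions - 1)
            q_update(q, s, a, reward_fn(s_next), s_next, alpha, gamma)
            s = s_next
    return q


users = [1, 2, 5]  # hypothetical user positions on the line


def reward(pos):
    # Toy fairness proxy: penalize the distance to the worst-served user.
    return -max(abs(pos - u) for u in users)


q = q_learning(7, reward)
for pos, row in enumerate(q):
    print(pos, [round(v, 2) for v in row])
```

In a dynamic setting, `users` would change over time and the agent would keep updating the table, which is what makes an incremental method attractive compared with rerunning an exhaustive search.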


Reinforcement Learning and Video Games

arXiv.org Machine Learning

Reinforcement learning has achieved incredible results in game playing. An intelligent agent is created and trained with reinforcement learning algorithms to perform this task. At the Future of Go Summit 2017, AlphaGo, an AI player trained with deep reinforcement learning algorithms, won three games against the world's best human Go player. The success of reinforcement learning in this area shocked the world, and much research has since been launched in areas such as driverless cars. Deep learning methods such as convolutional neural networks contribute a lot to this, because these techniques solve the problem of handling high-dimensional input data and feature extraction. T-rex Runner is a dinosaur game from the offline mode of Google Chrome. The player's aim is to avoid all obstacles and score as high as possible, up to the limit of 99999. The obstacles move faster as time goes by, which makes it difficult to reach the highest score. The code of this project, which is written in Python, can be found in this link.


A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

arXiv.org Machine Learning

Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in reinforcement learning, this paper studies a class of biased stochastic approximation (SA) procedures under a mild "ergodic-like" assumption on the underlying stochastic noise sequence. Building upon a carefully designed multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (for control of the gradient bias), we prove a general result on the convergence of the iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel looking-ahead viewpoint renders finite-time analysis of biased SA algorithms under a large family of stochastic perturbations possible. For direct comparison with existing contributions, we also demonstrate these bounds by applying them to TD- and Q-learning with linear function approximation, under the practical Markov chain observation model. The resultant finite-time error bound for both the TD- as well as the Q-learning algorithms is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without making any modifications to the parameter updates) using even nonlinear function approximators; as well as for Markov chains ii) under general mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
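The setting analyzed in this abstract, TD learning with linear function approximation and a constant stepsize, can be sketched with the textbook TD(0) update, theta <- theta + alpha * (r + gamma * phi(s')^T theta - phi(s)^T theta) * phi(s). The two-state Markov chain, one-hot features, rewards, and parameter values below are assumptions chosen for illustration and do not come from the paper.

```python
import random


def td0_linear_update(theta, phi_s, phi_next, reward, alpha, gamma):
    """One TD(0) step for a linear value estimate V(s) = phi(s)^T theta."""
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v_s  # temporal-difference error
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]


def run_td0(steps=20000, alpha=0.05, gamma=0.9, seed=0):
    rng = random.Random(seed)
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features (toy choice)
    rewards = [0.0, 1.0]                 # reward received on leaving each state
    theta = [0.0, 0.0]
    s = 0
    for _ in range(steps):
        s_next = rng.randrange(2)  # next state drawn uniformly at random
        theta = td0_linear_update(
            theta, features[s], features[s_next], rewards[s], alpha, gamma
        )
        s = s_next
    return theta


# For this toy chain the true values are V(0) = 4.5 and V(1) = 5.5.
# With a constant stepsize the iterates keep fluctuating around them.
theta = run_td0()
print(theta)
```

Because the samples come from a single Markov chain trajectory rather than independent draws, the update direction is biased at each step, which is the kind of biased stochastic approximation the paper's bounds address.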


High efficiency RL agent

arXiv.org Artificial Intelligence

Nowadays, model-free algorithms achieve state-of-the-art performance on many RL problems, but their low efficiency limits their usage. We combine model-based RL, the soft actor-critic (SAC) framework, and curiosity, and propose an agent called RMC, which offers a promising way to achieve good performance while maintaining data efficiency. We surpass the performance of SAC and achieve state-of-the-art performance in both efficiency and stability. Meanwhile, we can solve POMDP problems and achieve great generalization from MDP to POMDP.


A Deep Learning Approach to Grasping the Invisible

arXiv.org Artificial Intelligence

Yang Yang 1, Hengyue Liang 2 and Changhyun Choi 2

Abstract: We introduce a new problem named "grasping the invisible", where a robot is tasked to grasp an initially invisible target object via a sequence of nonprehensile (e.g., pushing) and prehensile (e.g., grasping) actions. In this problem, nonprehensile actions are needed to search for the target and rearrange cluttered objects around it. We propose to solve the problem by formulating a deep reinforcement learning approach in an actor-critic format. A critic that maps both the visual observations and the target information to expected rewards of actions is learned via deep Q-learning for instance pushing and grasping. Two actors are proposed to take in the critic predictions and the domain knowledge for two subtasks: a Bayesian-based actor accounting for past experience performs explorational pushing to search for the target; once the target is found, a classifier-based actor coordinates the target-oriented pushing and grasping to grasp the target in clutter. The model is entirely self-supervised through the robot-environment interactions. Our system achieves 93% and 87% task success rate on the two subtasks in simulation and 85% task success rate in real robot experiments, which outperforms several baselines by large margins. Supplementary material is available at: https://sites.google.com/umn.edu/grasping-invisible.

Index Terms: Dexterous Manipulation, Deep Learning in Robotics and Automation, Computer Vision for Automation

I. INTRODUCTION

Imagine what happens when a young kid is looking for a specific toy block buried in clutter, as shown in Figure 1a. He or she may first push down the pile of the blocks and luckily spot the target block in clutter, then push around it to make a space for the fingers (we refer to this type of motion as "singulation" [1]) and finally grasp it. We have wondered if an intelligent agent can perform such a task.