Reinforcement Learning
Where to Look Next: Unsupervised Active Visual Exploration on 360{\deg} Input
Seifi, Soroush, Tuytelaars, Tinne
We address the problem of active visual exploration of large 360{\deg} inputs. In our setting an active agent with a limited camera bandwidth explores its 360{\deg} environment by changing its viewing direction at limited discrete time steps. As such, it observes the world as a sequence of narrow field-of-view 'glimpses', deciding for itself where to look next. Our proposed method exceeds previous works' performance by a significant margin without the need for deep reinforcement learning or training separate networks as sidekicks. A key component of our system are the spatial memory maps that make the system aware of the glimpses' orientations (locations in the 360{\deg} image). Further, we stress the advantages of retina-like glimpses when the agent's sensor bandwidth and time-steps are limited. Finally, we use our trained model to do classification of the whole scene using only the information observed in the glimpses.
Hands-On Meta Learning with Python: Meta learning using one-shot learning, MAML, Reptile, and Meta-SGD with TensorFlow: Sudharsan Ravichandiran: 9781789534207: Amazon.com: Books
Sudharsan Ravichandiran is a data scientist, researcher, artificial intelligence enthusiast, and YouTuber (search for Sudharsan reinforcement learning). He completed his bachelors in information technology at Anna University. His area of research focuses on practical implementations of deep learning and reinforcement learning, which includes natural language processing and computer vision. He is an open source contributor and loves answering questions on Stack Overflow. He also authored a best seller, Hands-On Reinforcement Learning with Python, published by Packt Publishing.
deepmind/bsuite
This library automates evaluation and analysis of any agent on these benchmarks. It serves to facilitate reproducible, and accessible, research on the core issues in RL, and ultimately the design of superior learning algorithms. Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of the experiments from a committee of prominent researchers. For a more comprehensive overview, see the accompanying paper. This means any experiment will automatically output data in the correct format for analysis using the notebook, without any constraints on the structure of agents or algorithms.
An introduction to reinforcement learning with AWS RoboMaker Amazon Web Services
Robotics often involves training complex sequences of behaviors. For example, consider a robot designed to follow or track another object. Although the goal is easy to describe (the closer the robot is to the object, the better), creating the logic that accomplishes the task is much more difficult. Reinforcement learning (RL), an emerging Machine Learning technique, can help develop solutions for exactly these kinds of problems. This post is an introduction to RL and it explains how we used AWS RoboMaker to develop an application that trains a TurtleBot Waffle Pi to track and move toward a TurtleBot Burger.
Faster saddle-point optimization for solving large-scale Markov decision processes
Bas-Serrano, Joan, Neu, Gergely
We consider the problem of computing optimal policies in average-r eward Markov decision processes. This classical problem can be formulated as a linear program dire ctly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the n umber of states. T o address this issue, recent work has considered a linearly relaxed version of the res ulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization pro blem by characterizing the conditions necessary for convergence to the optimal policy, and designing a n optimization algorithm enjoying fast convergence rates that are independent of the size of the state s pace.
Multi-task Learning and Catastrophic Forgetting in Continual Reinforcement Learning
Ribeiro, Joรฃo, Melo, Francisco S., Dias, Joรฃo
In this paper we investigate two hypothesis regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms, by more efficiently learning a new, similar task, that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain similar performance on the new task, as a similar algorithm without EWC, whilst being able to overcome catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic (GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-tasks GA3C versions, trained individually for each single-task, when evaluated on a new, third task, namely, Phoenix. We also show that, when training two trained multi-task GA3C algorithms on the third task, if one is augmented with EWC, it is not only able to achieve similar performance on the new task, but also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks.
AI Learns to Cheat at Hide and Seek #OpenAI #HideandSeek #MachineLearning #ArtificialIntelligence #ReinforcementLearning @OpenAI
OpenAI recently posted on Twitter about teaching computer agents'hide and seek'. We've observed AIs discovering complex tool use while competing in a simple game of hide-and-seek. They develop a series of six distinct strategies and counter strategies, ultimately using tools in the environment to break our simulated physics. In the simulations, seekers are incentivized to maintain line of sight of hiders and hiders are incentivized to avoid line of sight from seekers. The agents environments contain various shelters including cubicles, movable partitions, blocks and ramps. That said, there is no built-in incentive for agents to interact with objects around them.
Scaled Autonomy: Enabling Human Operators to Control Robot Fleets
Swamy, Gokul, Reddy, Siddharth, Levine, Sergey, Dragan, Anca D.
Autonomous robots often encounter challenging situations where their control policies fail and an expert human operator must briefly intervene, e.g., through teleoperation. In settings where multiple robots act in separate environments, a single human operator can manage a fleet of robots by identifying and teleoperating one robot at any given time. The key challenge is that users have limited attention: as the number of robots increases, users lose the ability to decide which robot requires teleoperation the most. Our goal is to automate this decision, thereby enabling users to supervise more robots than their attention would normally allow for. Our insight is that we can model the user's choice of which robot to control as an approximately optimal decision that maximizes the user's utility function. We learn a model of the user's preferences from observations of the user's choices in easy settings with a few robots, and use it in challenging settings with more robots to automatically identify which robot the user would most likely choose to control, if they were able to evaluate the states of all robots at all times. We run simulation experiments and a user study with twelve participants that show our method can be used to assist users in performing a navigation task and manipulator reaching task.
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Ladosz, Pawel, Ben-Iwhiwhu, Eseoghene, Hu, Yang, Ketz, Nicholas, Kolouri, Soheil, Krichmar, Jeffrey L., Pilly, Praveen, Soltoggio, Andrea
This paper introduces the modulated Hebbian plus Q network architecture (MOHQA) for solving challenging partially observable Markov decision processes (POMDPs) deep reinforcement learning problems with sparse rewards and confounding observations. The proposed architecture combines a deep Q-network (DQN), and a modulated Hebbian network with neural eligibility traces (MOHN). Bio-inspired neural traces are used to bridge temporal delays between actions and rewards. The purpose is to discover distal cause-effect relationships where confounding observations and sparse rewards cause standard RL algorithms to fail. Each of the two modules of the network (DQN and MOHN) is responsible for different aspects of learning. DQN learns low level features and control, while MOHN contributes to the high-level decisions by bridging rewards with past actions. The strength of the approach is to support a DQN standard framework when temporal difference errors are difficult to compute due to non-observable states. The system is tested on a set of generalized decision making problems encoded as decision tree graphs that deliver delayed rewards after key decision points and confounding observations. The simulations show that the proposed approach helps solve problems that are currently challenging for state-of-the-art deep reinforcement learning algorithms.
Leveraging Human Guidance for Deep Reinforcement Learning Tasks
Zhang, Ruohan, Torabi, Faraz, Guan, Lin, Ballard, Dana H., Stone, Peter
Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, where the agent learns to imitate human demonstrated decisions. However, human guidance is not limited to the demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumption, and implementation of each framework. We then discuss possible future research directions.