Reinforcement Learning
Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads
Belkhale, Suneel, Li, Rachel, Kahn, Gregory, McAllister, Rowan, Calandra, Roberto, Levine, Sergey
Transporting suspended payloads is challenging for autonomous aerial vehicles because the payload can cause significant and unpredictable changes to the robot's dynamics. These changes can lead to suboptimal flight performance or even catastrophic failure. Although adaptive control and learning-based methods can in principle adapt to changes in these hybrid robot-payload systems, rapid mid-flight adaptation to payloads that have a priori unknown physical properties remains an open problem. We propose a meta-learning approach that "learns how to learn" models of altered dynamics within seconds of post-connection flight data. Our experiments demonstrate that our online adaptation approach outperforms non-adaptive methods on a series of challenging suspended payload transportation tasks. Videos and other supplemental material are available on our website https://sites.google.com/view/meta-rl-for-flight
AutoEG: Automated Experience Grafting for Off-Policy Deep Reinforcement Learning
Lu, Keting, Zhang, Shiqi, Chen, Xiaoping
Deep reinforcement learning (RL) algorithms frequently require a prohibitive amount of interaction experience to ensure the quality of learned policies. This limitation arises partly because the agent cannot learn much from the many low-quality trials in the early learning phase, which results in a low learning rate. To address this limitation, this paper makes a twofold contribution. First, we develop an algorithm, called Experience Grafting (EG), that enables RL agents to reorganize segments of the few high-quality trajectories from the experience pool to generate many synthetic trajectories while retaining their quality. Second, building on EG, we further develop an AutoEG agent that automatically learns to adjust the grafting-based learning strategy. Results collected from a set of six robotic control environments show that, in comparison to a standard deep RL algorithm (DDPG), AutoEG increases the speed of the learning process by at least 30%.
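The abstract does not spell out how segments are recombined, so the following is only an illustrative sketch of the general idea of splicing trajectory segments, not the authors' EG algorithm. It assumes scalar states, a hypothetical tolerance `tol` for deciding that two states are close enough to join, and a hypothetical helper name `graft`.

```python
def graft(traj_a, traj_b, tol):
    """Splice a prefix of traj_a onto a suffix of traj_b.

    Each trajectory is a list of (state, action, reward) tuples with a
    scalar state (an assumption made to keep the sketch short). A graft
    is produced wherever a state in traj_a lies within `tol` of a state
    in traj_b.
    """
    grafted = []
    for i, (state_a, _, _) in enumerate(traj_a):
        for j, (state_b, _, _) in enumerate(traj_b):
            if abs(state_a - state_b) <= tol:
                # Follow traj_a up to (not including) step i, then
                # continue along traj_b from its matching step j.
                grafted.append(traj_a[:i] + traj_b[j:])
    return grafted


traj_a = [(0.0, "right", 0), (1.0, "right", 0), (2.0, "right", 1)]
traj_b = [(3.0, "left", 0), (1.05, "up", 0), (7.0, "up", 2)]
synthetic = graft(traj_a, traj_b, tol=0.1)
```

In this toy case the only pair of nearby states is 1.0 and 1.05, so a single synthetic trajectory is produced. A real implementation would need a distance measure over full state vectors and some check that the grafted trajectory keeps its quality.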
Tactical Decision-Making in Autonomous Driving by Reinforcement Learning with Uncertainty Estimation
Hoel, Carl-Johan, Wolff, Krister, Laine, Leo
Reinforcement learning (RL) can be used to create a tactical decision-making agent for autonomous driving. However, previous approaches only output decisions and do not provide information about the agent's confidence in the recommended actions. This paper investigates how a Bayesian RL technique, based on an ensemble of neural networks with additional randomized prior functions (RPF), can be used to estimate the uncertainty of decisions in autonomous driving. A method for classifying whether or not an action should be considered safe is also introduced. The performance of the ensemble RPF method is evaluated by training an agent on a highway driving scenario. It is shown that the trained agent can estimate the uncertainty of its decisions and indicate an unacceptable level when the agent faces a situation that is far from the training distribution. Furthermore, within the training distribution, the ensemble RPF agent outperforms a standard Deep Q-Network agent. In this study, the estimated uncertainty is used to choose safe actions in unknown situations. However, the uncertainty information could also be used to identify situations that should be added to the training process.
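As a rough illustration of how an ensemble can yield both a decision and a confidence estimate, the sketch below computes the spread of Q-values across ensemble members for each action and rejects actions whose spread is too large. The coefficient of variation as the spread measure, the threshold `cv_threshold`, the `fallback_action`, and all function names are assumptions made for this example; the networks and randomized prior functions themselves are not modelled.

```python
from statistics import mean, pstdev


def action_uncertainty(q_values_per_member):
    """Return (mean, coefficient of variation) for each action.

    q_values_per_member is a list with one entry per ensemble member;
    each entry is a list holding that member's Q-value for each action.
    """
    n_actions = len(q_values_per_member[0])
    stats = []
    for action in range(n_actions):
        samples = [member[action] for member in q_values_per_member]
        mu = mean(samples)
        sigma = pstdev(samples)
        # A zero mean makes the ratio undefined; treat it as maximally uncertain.
        cv = sigma / abs(mu) if mu != 0 else float("inf")
        stats.append((mu, cv))
    return stats


def choose_safe_action(q_values_per_member, cv_threshold, fallback_action):
    """Pick the highest-mean action whose uncertainty is acceptable."""
    stats = action_uncertainty(q_values_per_member)
    safe = [(mu, a) for a, (mu, cv) in enumerate(stats) if cv <= cv_threshold]
    if not safe:
        # No action is trusted: hand over to a predefined fallback.
        return fallback_action
    return max(safe)[1]


# Three ensemble members, two actions: the members agree on action 0
# and disagree strongly on action 1.
ensemble = [[1.0, 5.0], [1.1, -3.0], [0.9, 9.0]]
chosen = choose_safe_action(ensemble, cv_threshold=0.2, fallback_action=-1)
```

Here action 1 has the higher mean Q-value but is rejected because the members disagree, so action 0 is chosen. If every action exceeds the threshold, the fallback is returned instead.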
Cooperative Perception with Deep Reinforcement Learning for Connected Vehicles
Aoki, Shunsuke, Higuchi, Takamasa, Altintas, Onur
Sensor-based perception on vehicles is becoming prevalent and important for enhancing road safety. Autonomous driving systems use cameras, LiDAR, and radar to detect surrounding objects, while human-driven vehicles use them to assist the driver. However, environmental perception by individual vehicles is limited in coverage and/or detection accuracy. For example, a vehicle cannot detect objects occluded by other moving or static obstacles. In this paper, we present a cooperative perception scheme with deep reinforcement learning to enhance the detection accuracy for surrounding objects. By using deep reinforcement learning to select the data to transmit, our scheme mitigates the network load in vehicular communication networks and enhances communication reliability. To design, test, and verify the cooperative perception scheme, we develop a Cooperative & Intelligent Vehicle Simulation (CIVS) Platform, which integrates three software components: a traffic simulator, a vehicle simulator, and an object classifier. Our evaluation shows that our scheme decreases packet loss and thereby increases the detection accuracy by up to 12% compared to the baseline protocol.
Flexible and Efficient Long-Range Planning Through Curious Exploration
Curtis, Aidan, Xin, Minjian, Arumugam, Dilip, Feigelis, Kevin, Yamins, Daniel
Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions for the effects and preconditions for actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.
Combined Model for Partially-Observable and Non-Observable Task Switching: Solving Hierarchical Reinforcement Learning Problems Statically and Dynamically with Transfer Learning
Khan, Nibraas, Phillips, Joshua
An integral function of fully autonomous robots and humans is the ability to focus attention on a few relevant percepts to reach a certain goal while disregarding irrelevant percepts. Humans and animals rely on the interactions between the Pre-Frontal Cortex (PFC) and the Basal Ganglia (BG) to achieve this focus, which is called Working Memory (WM). The Working Memory Toolkit (WMtk) was developed based on a computational neuroscience model of this phenomenon with Temporal Difference (TD) Learning for autonomous systems. Recent adaptations of the toolkit either utilize Abstract Task Representations (ATRs) to solve Non-Observable (NO) tasks or store past input features to solve Partially-Observable (PO) tasks, but not both. We propose a new model, PONOWMtk, which combines both approaches, ATRs and input storage, with a static or dynamic number of ATRs. The results of our experiments show that PONOWMtk performs effectively for tasks that exhibit PO, NO, or both properties.
Artificial Intelligence: Reinforcement Learning in Python
Complete guide to Artificial Intelligence, prep for Deep Reinforcement Learning with Stock Trading Applications. Created by Lazy Programmer Inc. When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing: playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but it hasn't been until recently that we've been able to observe first hand the amazing results that are possible. In 2016 we saw Google's AlphaGo beat the world champion in Go.
Learning from humans: what is inverse reinforcement learning?
One of the goals of AI research is to teach machines how to do the same things people do, but better. In the early 2000s, this meant focusing on problems like flying helicopters and walking up flights of stairs. However, there's still a massive list of problems where humans outperform machines. Although we can no longer claim to beat machines at tasks like Go and image classification, we have a distinct advantage in solving problems that aren't as well-defined, like judging a well-executed backflip, cleaning a room while preventing accidents, and perhaps the most human problem of all: reasoning about people's values. Since all these tasks contain some degree of subjectivity, machines need information about the world as well as a way to learn about the people within it in order to solve these problems.
How to fix reinforcement learning
"Value functions are a core component of [RL] systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g." Here is a rigorous, mathematical formulation of RL that treats goals (the high-level objective of the skill to be learned, which should yield good rewards) as a fundamental and necessary input rather than something to be discovered from just the reward signal. The agent is told what it's supposed to do, just as is done in zero-shot learning and actual human learning. It has been 3 years since this was published, and how many papers have cited it since?
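To make the V(s, g; θ) signature concrete, here is a minimal sketch in which the state and the goal are each embedded by a linear map and the value is the dot product of the two embeddings. This factorised form, the parameter names `state_embed` and `goal_embed`, and the hand-picked weights are assumptions chosen for illustration; nothing here is trained, and it is not presented as the quoted paper's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def uvfa_value(state, goal, theta):
    """Evaluate V(s, g; theta) as the dot product of two embeddings."""
    phi = matvec(theta["state_embed"], state)  # embedding of the state
    psi = matvec(theta["goal_embed"], goal)    # embedding of the goal
    return sum(p * q for p, q in zip(phi, psi))


# Hypothetical parameters: identity embeddings in two dimensions.
theta = {
    "state_embed": [[1.0, 0.0], [0.0, 1.0]],
    "goal_embed": [[1.0, 0.0], [0.0, 1.0]],
}

state = [1.0, 0.0]
value_for_goal_a = uvfa_value(state, [1.0, 0.0], theta)
value_for_goal_b = uvfa_value(state, [0.0, 1.0], theta)
```

The same state receives a different value under each goal, which is the point of making g an explicit input: one set of parameters θ covers many goals instead of one value function per goal.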