 Reinforcement Learning


Deep Reinforcement Learning in HOL4

arXiv.org Artificial Intelligence

The paper describes an implementation of deep reinforcement learning through self-supervised learning within the proof assistant HOL4. A close interaction between the machine learning modules and the HOL4 library is achieved by the choice of tree neural networks (TNNs) as machine learning models and the internal use of HOL4 terms to represent tree structures of TNNs. Recursive improvement is possible when a given task is expressed as a search problem. In this case, a Monte Carlo Tree Search (MCTS) algorithm guided by a TNN can be used to explore the search space and produce better examples for training the next TNN. As an illustration, tasks over propositional and arithmetical terms, representative of fundamental theorem proving techniques, are specified and learned: truth estimation, end-to-end computation, term rewriting and term synthesis.


High-Confidence Policy Optimization: Reshaping Ambiguity Sets in Robust MDPs

arXiv.org Artificial Intelligence

Robust MDPs are a promising framework for computing robust policies in reinforcement learning. Ambiguity sets, which represent the plausible errors in transition probabilities, determine the trade-off between robustness and average-case performance. The standard practice of defining ambiguity sets using the $L_1$ norm leads, unfortunately, to loose and impractical guarantees. This paper describes new methods for optimizing the shape of ambiguity sets beyond the $L_1$ norm. We derive new high-confidence sampling bounds for weighted $L_1$ and weighted $L_\infty$ ambiguity sets and describe how to compute near-optimal weights from rough value function estimates. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees.
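
To make the notion of an ambiguity set concrete, the sketch below computes the worst-case expected value over a plain (unweighted) $L_1$ ball around a nominal transition distribution, i.e. the standard baseline the abstract refers to, not the weighted sets the paper proposes. The function name `worst_case_l1` and its parameters are assumptions made for illustration.

```python
def worst_case_l1(p_bar, values, budget):
    """Minimise sum(p[i] * values[i]) over distributions p with ||p - p_bar||_1 <= budget.

    Greedy solution: shift up to budget/2 probability mass onto the
    lowest-value state, taking it from the highest-value states first.
    """
    p = list(p_bar)
    lowest = min(range(len(values)), key=lambda i: values[i])
    # Moving mass m changes the L1 distance by 2*m, hence budget / 2.
    shift = min(budget / 2.0, 1.0 - p[lowest])
    p[lowest] += shift
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        if shift <= 0:
            break
        if i == lowest:
            continue
        take = min(p[i], shift)
        p[i] -= take
        shift -= take
    return p, sum(pi * vi for pi, vi in zip(p, values))


# Nominal distribution over three next states with values 1, 2 and 3.
p_worst, v_worst = worst_case_l1([0.5, 0.3, 0.2], [1.0, 2.0, 3.0], budget=0.4)
print(p_worst, v_worst)
```

A larger budget gives a more pessimistic value, which is the robustness versus average-case trade-off the abstract describes.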


Reinforcement Learning -- Generalisation on Continuing Tasks

#artificialintelligence

So far we have worked through many reinforcement learning examples, from on-policy to off-policy and from discrete to continuous state spaces. These examples differ in many ways, but they share at least one trait: they are all episodic. Each has a clear starting point and ending point, and whenever the agent reaches the goal it starts over, repeating until a set number of episodes is reached. In this article, we extend the idea to non-episodic tasks, which have no clear ending point, so the agent keeps acting in the environment indefinitely. The main concept applied to non-episodic tasks is the average reward. The average-reward setting applies to continuing problems, in which the interaction between agent and environment goes on forever, with no termination or start states.
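
As a minimal sketch of how average reward replaces discounting in a continuing task, the function below performs one tabular differential SARSA update, following the textbook average-reward formulation rather than any code from the article. The names `differential_sarsa_step`, `avg_reward`, `alpha` and `beta` are assumptions for illustration.

```python
from collections import defaultdict


def differential_sarsa_step(q, avg_reward, s, a, r, s_next, a_next,
                            alpha=0.1, beta=0.01):
    """One tabular differential SARSA update; returns the new average-reward estimate."""
    # The TD error compares the reward with the running average; there is no discount factor.
    delta = r - avg_reward + q[(s_next, a_next)] - q[(s, a)]
    avg_reward = avg_reward + beta * delta   # update the average-reward estimate
    q[(s, a)] += alpha * delta               # update the action-value estimate
    return avg_reward


q = defaultdict(float)  # action values, all starting at 0.0
avg = differential_sarsa_step(q, 0.0, "s0", "a0", 1.0, "s1", "a1")
print(avg, q[("s0", "a0")])
```

Because rewards are measured relative to their running average, the value estimates stay bounded even though the task never terminates.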



Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior

arXiv.org Machine Learning

We consider a novel application of inverse reinforcement learning which involves modeling, learning and predicting the commenting behavior of YouTube viewers. Each group of users is modeled as a rationally inattentive Bayesian agent. Our methodology integrates three key components. First, to identify distinct commenting patterns, we use deep embedded clustering to estimate framing information (essential extrinsic features) that clusters users into distinct groups. Second, we present an inverse reinforcement learning algorithm that uses Bayesian revealed preferences to test for rationality: does there exist a utility function that rationalizes the given data, and if yes, can it be used to predict future behavior? Finally, we impose behavioral economics constraints stemming from rational inattention to characterize the attention span of groups of users. The test imposes a Rényi mutual information cost constraint which impacts how the agent can select attention strategies to maximize their expected utility. After a careful analysis of a massive YouTube dataset, our surprising result is that in most YouTube user groups, the commenting behavior is consistent with optimizing a Bayesian utility with rationally inattentive constraints. The paper also highlights how the rational inattention model can accurately predict future commenting behavior. The massive YouTube dataset and analysis used in this paper are available on GitHub and completely reproducible.


MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding

arXiv.org Artificial Intelligence

Reinforcement learning is a promising approach to learning control policies for performing complex multi-agent robotics tasks. However, a policy learned in simulation often fails to guarantee even simple safety properties such as obstacle avoidance. To ensure safety, we propose multi-agent model predictive shielding (MAMPS), an algorithm that provably guarantees safety for an arbitrary learned policy. In particular, it operates by using the learned policy as often as possible, but instead uses a backup policy in cases where it cannot guarantee the safety of the learned policy. Using a multi-agent simulation environment, we show how MAMPS can achieve good performance while ensuring safety.
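
The shielding rule described above can be sketched as a small wrapper around two policies. This is an illustrative sketch only: the predicate `can_certify_safe` is a hypothetical stand-in for the safety check, whose actual construction is the substance of the paper and is left abstract here, and all names are assumptions.

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def shielded_action(state: S,
                    learned_policy: Callable[[S], A],
                    backup_policy: Callable[[S], A],
                    can_certify_safe: Callable[[S, A], bool]) -> A:
    """Use the learned policy when its action can be certified safe, else fall back."""
    proposed = learned_policy(state)
    if can_certify_safe(state, proposed):
        return proposed
    return backup_policy(state)


# Toy 1-D example (hypothetical): positions are integers and a wall sits at 10.
learned = lambda pos: 2                     # always move forward by 2
backup = lambda pos: 0                      # stop in place
safe = lambda pos, step: pos + step < 10    # stay strictly short of the wall

print(shielded_action(3, learned, backup, safe))  # 2: learned action is safe
print(shielded_action(9, learned, backup, safe))  # 0: falls back to the backup
```

The wrapper leaves the learned policy untouched, which is why the guarantee can hold for an arbitrary learned policy: safety rests entirely on the check and the backup.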


Collision Avoidance in Pedestrian-Rich Environments with Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Collision avoidance algorithms are essential for safe and efficient robot operation among pedestrians. This work proposes using deep reinforcement learning (RL) as a framework to model the complex interactions and cooperation with nearby, decision-making agents (e.g., pedestrians, other robots). Existing RL-based works assume homogeneity of agent policies, use specific motion models over short timescales, or lack a mechanism to consider measurements taken with a large (possibly varying) number of nearby agents. Therefore, this work develops an algorithm that learns collision avoidance among a variety of types of non-communicating, dynamic agents without assuming they follow any particular behavior rules. It extends our previous work by introducing a strategy using Long Short-Term Memory (LSTM) that enables the algorithm to use observations of an arbitrary number of other agents, instead of a small, fixed number of neighbors. The proposed algorithm is shown to outperform a classical collision avoidance algorithm and another deep RL-based algorithm, and to scale with the number of agents better (fewer collisions, shorter time to goal) than our previously published learning-based approach. Analysis of the LSTM provides insights into how observations of nearby agents affect the hidden state and quantifies the performance impact of various agent ordering heuristics. The learned policy generalizes to several applications beyond the training scenarios: formation control (arrangement into letters), an implementation on a fleet of four multirotors, and an implementation on a fully autonomous robotic vehicle capable of traveling at human walking speed among pedestrians.


Giving purpose to AI: Deep reinforcement learning

#artificialintelligence

Yet many of the applications we've seen are single-event driven. Some examples: Is the image shown that of a cat? Given a word, translate it into English. Execute a given command, such as "Turn on the Light." Deep learning techniques have been responsible for many AI applications like these, but fundamentally, deep learning is task-oriented.


Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

arXiv.org Artificial Intelligence

Tianhe Yu 1, Deirdre Quillen 2, Zhanpeng He 3, Ryan Julian 4, Karol Hausman 5, Chelsea Finn 1, Sergey Levine 2

Stanford University 1, UC Berkeley 2, Columbia University 3, University of Southern California 4, Robotics at Google 5

Abstract: Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.

Keywords: meta-learning, multi-task reinforcement learning, benchmarks

1 Introduction

While reinforcement learning (RL) has achieved some success in domains such as assembly [1], ping pong [2], in-hand manipulation [3], and hockey [4], state-of-the-art methods require substantially more experience than humans to acquire only one narrowly-defined skill. If we want robots to be broadly useful in realistic environments, we instead need algorithms that can learn a wide variety of skills reliably and efficiently.


Learning Q-network for Active Information Acquisition

arXiv.org Machine Learning

In this paper, we propose a novel Reinforcement Learning approach for solving the Active Information Acquisition problem, which requires an agent to choose a sequence of actions in order to acquire information about a process of interest using on-board sensors. The classic challenges in the information acquisition problem are the dependence of a planning algorithm on known models and the difficulty of computing information-theoretic cost functions over arbitrary distributions. In contrast, the proposed reinforcement learning framework does not require any knowledge of the models and alleviates these problems during an extended training stage. It results in policies that are efficient to execute online and applicable to real-time control of robotic systems. Furthermore, state-of-the-art planning methods are typically restricted to short horizons, which can leave them stuck in local minima. Reinforcement learning naturally handles the issue of planning horizon in information problems as it maximizes a discounted sum of rewards over a long finite or infinite time horizon. We discuss the potential benefits of the proposed framework and compare the performance of the novel algorithm to an existing information acquisition method for multi-target tracking scenarios.
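
For reference, the discounted sum of rewards mentioned above can be computed as in the short sketch below. It is a generic illustration, not the paper's Q-network; the function name `discounted_return` and the sample rewards are assumptions.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With `gamma` close to 1, rewards far in the future still count, which is how the objective covers long horizons without an explicit planning depth.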