AITopics

2005.00527

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceJul-8-2020, 17:13:31 GMT

Reverb: a framework for experience replay

The use of experience plays a key role in reinforcement learning (RL). How best to use this data is one of the central problems of this field. As RL agents have advanced over recent years, taking on bigger and more complex problems (Atari, Go, StarCraft, Dota), the generated data has grown in both size and complexity. To cope with this complexity many RL systems split the learning problem into two distinct parts: experience producers (actors) and experience consumers (learners) — allowing these different parts to run in parallel. Often a data storage system lies at the intersection between these two components. The question of how to efficiently store and transport the data is itself a challenging engineering problem.

deep learning, experience replay, reinforcement learning, (3 more...)

#artificialintelligence

Industry: Education > Focused Education > Special Education (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Gorodetskiy, Andrey, Shlychkova, Alexandra, Panov, Aleksandr I.

Delta Schema Network in Model-based Reinforcement Learning

This work is devoted to unresolved problems of Artificial General Intelligence - the inefficiency of transfer learning. One of the mechanisms that are used to solve this problem in the area of reinforcement learning is a model-based approach. In the paper we are expanding the schema networks method which allows to extract the logical relationships between objects and actions from the environment data. We present algorithms for training a Delta Schema Network (DSN), predicting future states of the environment and planning actions that will lead to positive reward. DSN shows strong performance of transfer learning on the classic Atari game environment.

machine learning, node, reinforcement learning, (18 more...)

2006.0995

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads

Wang, Siyu, Rong, Yi, Fan, Shiqing, Zheng, Zhen, Diao, LanSong, Long, Guoping, Yang, Jun, Liu, Xiaoyong, Lin, Wei

The last decade has witnessed growth in the computational requirements for training deep neural networks. Current approaches (e.g., data/model parallelism, pipeline parallelism) parallelize training tasks onto multiple devices. However, these approaches always rely on specific deep learning frameworks and requires elaborate manual design, which make it difficult to maintain and share between different type of models. In this paper, we propose Auto-MAP, a framework for exploring distributed execution plans for DNN workloads, which can automatically discovering fast parallelization strategies through reinforcement learning on IR level of deep learning models. Efficient exploration remains a major challenge for reinforcement learning. We leverage DQN with task-specific pruning strategies to help efficiently explore the search space including optimized strategies. Our evaluation shows that Auto-MAP can find the optimal solution in two hours, while achieving better throughput on several NLP and convolution models.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.04069

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report (1.00)
Workflow (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stooke, Adam, Achiam, Joshua, Abbeel, Pieter

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, leads to constraint-violating behavior during agent training. We address this shortcoming by proposing a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. We take a controls perspective, wherein the traditional Lagrange multiplier update behaves as \emph{integral} control; our terms introduce \emph{proportional} and \emph{derivative} control, achieving favorable learning dynamics through damping and predictive measures. We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller tuning by providing invariance to the relative numerical scales of reward and cost. Our extensive experiments demonstrate improved performance and hyperparameter robustness, while our algorithms remain nearly as simple to derive and implement as the traditional Lagrangian approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.03964

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Am I Building a White Box Agent or Interpreting a Black Box Agent?

Bewley, Tom

The rule extraction literature contains the notion of a fidelity-accuracy dilemma: when building an interpretable model of a black box function, optimising for fidelity is likely to reduce performance on the underlying task, and vice versa. I reassert the relevance of this dilemma for the modern field of explainable artificial intelligence, and highlight how it is compounded when the black box is an agent interacting with a dynamic environment. I then discuss two independent research directions - building white box agents and interpreting black box agents - which are both coherent and worthy of attention, but must not be conflated by researchers embarking on projects in the domain of agent interpretability.

black box, box agent, machine learning, (14 more...)

2007.01187

Country: Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Sun, Chuangchuang, Kim, Dong-Ki, How, Jonathan P.

Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer

This paper investigates reinforcement learning with constraints, which is indispensable in safety-critical environments. To drive the constraint violation monotonically decrease, the constraints are taken as Lyapunov functions, and new linear constraints are imposed on the updating dynamics of the policy parameters such that the original safety set is forward-invariant in expectation. As the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a neural network-based meta-optimizer to optimize the objective while satisfying such linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. Ultimately, the meta-optimizer trains the policy network to monotonically decrease the constraint violation and maximize the cumulative reward. Numerical results validate the theoretical findings.

constraint, machine learning, reinforcement learning, (14 more...)

2006.11419

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Bacci, Edoardo, Parker, David

Probabilistic Guarantees for Safe Deep Reinforcement Learning

Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safetycritical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning controllers in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on controllers trained for several benchmark control problems.

controller, machine learning, reinforcement learning, (20 more...)

2005.07073

Country: Europe > United Kingdom > England > West Midlands > Birmingham (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Spooner, Thomas, Savani, Rahul

A Natural Actor-Critic Algorithm with Downside Risk Constraints

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.04203

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJul-7-2020

Towards a practical measure of interference for reinforcement learning

Liu, Vincent, White, Adam, Yao, Hengshuai, White, Martha

Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for control in reinforcement learning. We systematically evaluate our new measures, by assessing correlation with several measures of learning performance, including stability, sample efficiency, and online and offline control performance across a variety of learning architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures. In particular we show that target network frequency is a dominating factor for interference, and that updates on the last layer result in significantly higher interference than updates internal to the network. This new measure can be expensive to compute; we conclude with motivation for an efficient proxy measure and empirically demonstrate it is correlated with our definition of interference.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.03807

Country:

North America > Canada > Alberta (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)