AITopics

2007.10129

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
(18 more...)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Networks (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Ramponi, Giorgia, Drappo, Gianluca, Restelli, Marcello

Inverse Reinforcement Learning from a Gradient-based Learner

arXiv.org Machine LearningJul-15-2020

Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her learning process. In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning. Our approach is based on the assumption that the observed agent is updating her policy parameters along the gradient direction. Then we extend our method to deal with the more realistic scenario where we only have access to a dataset of learning trajectories. For both settings, we provide theoretical insights into our algorithms' performance. Finally, we evaluate the approach in a simulated GridWorld environment and on the MuJoCo environments, comparing it with the state-of-the-art baseline.

learner, machine learning, reinforcement learning, (17 more...)

2007.07812

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry: Transportation (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hoppe, Sabrina, Toussaint, Marc

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

arXiv.org Machine LearningJul-15-2020

In state of the art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a finite number of transitions. We represent these transitions in a data graph and link its structure to soft divergence. By selecting a subgraph with a favorable structure, we construct a simplified Markov Decision Process for which exact Q-values can be computed efficiently as more data comes in. The subgraph and its associated Q-values can be represented as a QGraph. We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem. By using these lower bounds in temporal difference learning, our method QG-DDPG is less prone to soft divergence and exhibits increased sample efficiency while being more robust to hyperparameters. QGraphs also retain information from transitions that have already been overwritten in the replay memory, which can decrease the algorithm's sensitivity to the replay memory capacity.

machine learning, reinforcement learning, transition, (13 more...)

2007.07582

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Near-Optimal Reinforcement Learning with Self-Play

Bai, Yu, Jin, Chi, Yu, Tiancheng

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{\mathcal{O}}(S^2AB)$ steps of game playing, when only highlighting the dependency on $(S,A,B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$ and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. In addition, we present a computational hardness result for learning the best responses against a fixed opponent in Markov games---a learning objective different from finding the Nash equilibrium.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2006.12007

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhao, Ruihan, Abbeel, Pieter, Tiomkin, Stas

Efficient Online Estimation of Empowerment for Reinforcement Learning

Training artificial agents to acquire desired skills through model-free reinforcement learning (RL) depends heavily on domain-specific knowledge, and the ability to reset the system to desirable configurations for better reward signals. The former hinders generalization to new domains; the latter precludes training in real-life conditions because physical resets are not scalable. Recently, intrinsic motivation was proposed as an alternative objective to alleviate the first issue, but there has been no reasonable remedy for the second. In this work, we present an efficient online algorithm for a type of intrinsic motivation, known as empowerment, and address both limitations. Our method is distinguished by its significantly lower sample and computation complexity, along with improved training stability compared to the relevant state of the art. We achieve this superior efficiency by transforming the challenging empowerment computation into a convex optimization problem through neural networks. In simulations, our method manages to train policies with neither domain-specific knowledge nor manual intervention. To address the issue of resetting in RL, we further show that our approach boosts learning when there's no early termination. Our proposed method opens doors for studying intrinsic motivation for policy training and scaling up model-free RL training in real-life conditions.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2007.07356

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Ecoffet, Adrien, Lehman, Joel

Reinforcement Learning Under Moral Uncertainty

An ambitious goal for artificial intelligence is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed. While ethical agents could be trained through reinforcement, by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement (both societally and among moral philosophers) about the nature of morality and what ethical theory (if any) is objectively correct. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one's credence is split across several plausible ethical theories. Inspired by such work, this paper proposes a formalism that translates such insights to the field of reinforcement learning. Demonstrating the formalism's potential, we then train agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior from commitment to single theories. The overall aim is to draw productive connections from the fields of moral philosophy and machine ethics to that of machine learning, to inspire further research by highlighting a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.

deontology credence, machine learning, reinforcement learning, (17 more...)

2006.04734

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.65)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

Khadka, Shauharda, Aflalo, Estelle, Marder, Mattias, Ben-David, Avrech, Miret, Santiago, Tang, Hanlin, Mannor, Shie, Hazan, Tamir, Majumdar, Somdeb

As modern neural networks have grown to billions of parameters, meeting tight latency budgets has become increasingly challenging. Approaches like compression, sparsification and network pruning have proven effective to tackle this problem - but they rely on modifications of the underlying network. In this paper, we look at a complimentary approach of optimizing how tensors are mapped to on-chip memory in an inference accelerator while leaving the network parameters untouched. Since different memory components trade off capacity for bandwidth differently, a sub-optimal mapping can result in high latency. We introduce evolutionary graph reinforcement learning (EGRL) - a method combining graph neural networks, reinforcement learning (RL) and evolutionary search - that aims to find the optimal mapping to minimize latency. Furthermore, a set of fast, stateless policies guide the evolutionary search to improve sample-efficiency. We train and validate our approach directly on the Intel NNP-I chip for inference using a batch size of 1. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.

evolutionary algorithm, machine learning, workload, (18 more...)

2007.07298

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

arXiv.org Machine LearningJul-14-2020

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Oh, Youngmin, Lee, Kimin, Shin, Jinwoo, Yang, Eunho, Hwang, Sung Ju

Experience replay, which enables the agents to remember and reuse experience from the past, plays a significant role in the success of off-policy reinforcement learning (RL). To utilize the experience replay efficiently, experience transitions should be sampled with consideration of their significance, such that the known prioritized experience replay (PER) further allows to sample more important experience. Yet, the conventional PER may result in generating highly biased samples due to considering a single metric such as TD-error and computing the sampling rate independently for each experience. To tackle this issue, we propose a Neural Experience Replay Sampler (NERS), which adaptively evaluates the relative importance of a sampled transition by obtaining context from not only its (local) values that characterize itself such as TD-error or the raw features but also other (global) transitions. We validate our framework on multiple benchmark tasks for both continuous and discrete controls and show that the proposed framework significantly improves the performance of various off-policy RL methods. Further analysis confirms that the improvements indeed come from the use of diverse features and the consideration of the relative importance of experiences.

machine learning, reinforcement learning, transition, (17 more...)

2007.07358

Country:

Asia > South Korea (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tan, Kai Liang, Esfandiari, Yasaman, Lee, Xian Yeow, Aakanksha, null, Sarkar, Soumik

Robustifying Reinforcement Learning Agents via Action Space Adversarial Training

arXiv.org Machine LearningJul-14-2020

Adoption of machine learning (ML)-enabled cyber-physical systems (CPS) are becoming prevalent in various sectors of modern society such as transportation, industrial, and power grids. Recent studies in deep reinforcement learning (DRL) have demonstrated its benefits in a large variety of data-driven decisions and control applications. As reliance on ML-enabled systems grows, it is imperative to study the performance of these systems under malicious state and actuator attacks. Traditional control systems employ resilient/fault-tolerant controllers that counter these attacks by correcting the system via error observations. However, in some applications, a resilient controller may not be sufficient to avoid a catastrophic failure. Ideally, a robust approach is more useful in these scenarios where a system is inherently robust (by design) to adversarial attacks. While robust control has a long history of development, robust ML is an emerging research area that has already demonstrated its relevance and urgency. However, the majority of robust ML research has focused on perception tasks and not on decision and control tasks, although the ML (specifically RL) models used for control applications are equally vulnerable to adversarial attacks. In this paper, we show that a well-performing DRL agent that is initially susceptible to action space perturbations (e.g. actuator attacks) can be robustified against similar perturbations through adversarial training.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2007.07176

Country:

Asia > Middle East > Jordan (0.04)
Asia > India > Uttar Pradesh (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Chen, Wuyang, Yu, Zhiding, Wang, Zhangyang, Anandkumar, Anima

Automated Synthetic-to-Real Generalization

arXiv.org Machine LearningJul-14-2020

Models trained on synthetic images often face degraded generalization to real data. As a convention, these models are often initialized with ImageNet pre-trained representation. Yet the role of ImageNet knowledge is seldom discussed despite common practices that leverage this knowledge to maintain the generalization ability. An example is the careful hand-tuning of early stopping and layer-wise learning rates, which is shown to improve synthetic-to-real generalization but is also laborious and heuristic. In this work, we explicitly encourage the synthetically trained model to maintain similar representations with the ImageNet pre-trained model, and propose a \textit{learning-to-optimize (L2O)} strategy to automate the selection of layer-wise learning rates. We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.06965

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)