AITopics

1905.07628

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

arXiv.org Machine LearningMay-18-2019

Combining Experience Replay with Exploration by Random Network Distillation

Sovrano, Francesco

Abstract--Our work is a simple extension of the paper "Exploration by Random Network Distillation"[1]. Among them we cite the "exploration Our work is a simple extension of PPO/RND. We show how to I. INTRODUCTION We are able to do it by the effects of its actions (in the environment) while trying using a new technique named Prioritized Oversampled Experience to maximize a cumulative return/reward. In other words, a RL Replay (POER), that has been built upon the definition of agent learns how to optimally interact with the environment, by what is the important experience useful to replay. In POER we receiving some environmental feedbacks called rewards. The mix oversampling [3] with experience prioritization [4], trying more an action is good, the higher should be the reward. But to achieve the goal of an optimal balance between exploration in many scenarios, rewards are very rare and difficult to get, and exploitation. In order to do this, we: thus making Reinforcement Learning very ...

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1905.07579

Country: Europe > Italy (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceMay-17-2019

A Regularized Opponent Model with Maximum Entropy Objective

Tian, Zheng, Wen, Ying, Gong, Zhichen, Punakkath, Faiz, Zou, Shihao, Wang, Jun

In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for the "optimality". In this paper, we redefine the binary random variable o in multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it as Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method ROMMEO-Q with proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1905.08087

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Martin, John D., Lyskawinski, Michal, Li, Xiaohu, Englot, Brendan

Stochastically Dominant Distributional Reinforcement Learning

We describe a new approach for mitigating risk in the Reinforcement Learning paradigm. Instead of reasoning about expected utility, we use second-order stochastic dominance (SSD) to directly compare the inherent risk of random returns induced by different actions. We frame the RL optimization within the space of probability measures to accommodate the SSD relation, treating Bellman's equation as a potential energy functional. This brings us to Wasserstein gradient flows, for which the optimality and convergence are well understood. We propose a discrete-measure approximation algorithm called the Dominant Particle Agent (DPA), and we demonstrate how safety and performance are better balanced with DPA than with existing baselines.

machine learning, reinforcement learning, return distribution, (14 more...)

1905.07318

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tomar, Manan, Sathuluri, Akhil, Ravindran, Balaraman

MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning

Shaping in humans and animals has been shown to be a powerful tool for learning complex tasks as compared to learning in a randomized fashion. This makes the problem less complex and enables one to solve the easier sub task at hand first. Generating a curriculum for such guided learning involves subjecting the agent to easier goals first, and then gradually increasing their difficulty. This paper takes a similar direction and proposes a dual curriculum scheme for solving robotic manipulation tasks with sparse rewards, called MaMiC. It includes a macro curriculum scheme which divides the task into multiple sub-tasks followed by a micro curriculum scheme which enables the agent to learn between such discovered sub-tasks. We show how combining macro and micro curriculum strategies help in overcoming major exploratory constraints considered in robot manipulation tasks without having to engineer any complex rewards. We also illustrate the meaning of the individual curricula and how they can be used independently based on the task. The performance of such a dual curriculum scheme is analyzed on the Fetch environments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1905.07193

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre:

Research Report (0.40)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

de Witt, Christian Schroeder, Hornigold, Thomas

Stratospheric Aerosol Injection as a Deep Reinforcement Learning Problem

As global greenhouse gas emissions continue to rise, the use of stratospheric aerosol injection (SAI), a form of solar geoengineering, is increasingly considered in order to artificially mitigate climate change effects. However, initial research in simulation suggests that naive SAI can have catastrophic regional consequences, which may induce serious geostrategic conflicts. Current geo-engineering research treats SAI control in low-dimensional approximation only. We suggest treating SAI as a high-dimensional control problem, with policies trained according to a context-sensitive reward function within the Deep Reinforcement Learning (DRL) paradigm. In order to facilitate training in simulation, we suggest to emulate HadCM3, a widely used General Circulation Model, using deep learning techniques. We believe this is the first application of DRL to the climate sciences.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1905.07366

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.50)

Industry:

Education > Focused Education > Special Education (0.42)
Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

Stinis, Panos

We assume that we are given a time series of data from a dynamical system and our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system as well as increase their predictive ability. In particular, we provide ways to enforce constraints during training for all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In general, the dynamic constraints need to include terms which are analogous to memory terms in model reduction formalisms. Such memory terms act as a restoring force which corrects the errors committed by the learned flow map during prediction. For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks, the constraints are introduced by augmenting the input of the discriminator. Finally, for the case of reinforcement learning and in particular actor-critic methods, the constraints are added to the reward function. In addition, for the reinforcement learning case, we present a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training. We use numerical results for the Lorenz system to illustrate the various constructions.

constraint, machine learning, reinforcement learning, (17 more...)

1905.07501

Country: North America > United States (0.67)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

TBQ($\sigma$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

Shi, Longxiang, Li, Shijian, Cao, Longbing, Yang, Long, Pan, Gang

Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between target policy and behavior policy. One common approach is to measure the difference between two policies in a probabilistic way, such as importance sampling and tree-backup. However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing traces under a greedy target policy, which is ineffective for control problems. The traces are cut immediately when a non-greedy action is taken, which may lose the advantage of eligibility traces and slow down the learning process. Alternatively, some non-probabilistic measurement methods such as General Q($\lambda$) and Naive Q($\lambda$) never cut traces, but face convergence problems in practice. To address the above issues, this paper introduces a new method named TBQ($\sigma$), which effectively unifies the tree-backup algorithm and Naive Q($\lambda$). By introducing a new parameter $\sigma$ to illustrate the \emph{degree} of utilizing traces, TBQ($\sigma$) creates an effective integration of TB($\lambda$) and Naive Q($\lambda$) and continuous role shift between them. The contraction property of TB($\sigma$) is theoretically analyzed for both policy evaluation and control settings. We also derive the online version of TBQ($\sigma$) and give the convergence proof. We empirically show that, for $\epsilon\in(0,1]$ in $\epsilon$-greedy policies, there exists some degree of utilizing traces for $\lambda\in[0,1]$, which can improve the efficiency in trace utilization for off-policy reinforcement learning, to both accelerate the learning process and improve the performance.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1905.07237

Country:

Asia > China (0.29)
North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bautista, Darwin, Dionido, Raimarc

Mastering the Game of Sungka from Random Play

arXiv.org Machine LearningMay-16-2019

Recent work in reinforcement learning demonstrated that learning solely through self-play is not only possible, but could also result in novel strategies that humans never would have thought of. However, optimization methods cast as a game between two players require careful tuning to prevent suboptimal results. Hence, we look at random play as an alternative method. In this paper, we train a DQN agent to play Sungka, a two-player turn-based board game wherein the players compete to obtain more stones than the other. We show that even with purely random play, our training algorithm converges very fast and is stable. Moreover, we test our trained agent against several baselines and show its ability to consistently win against these.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1905.07102

Country:

Asia > Philippines > Luzon > National Capital Region > City of Quezon (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

arXiv.org Artificial IntelligenceMay-16-2019

Knowledge-Based Sequential Decision-Making Under Uncertainty

Lyu, Daoming

Deep reinforcement learning (DRL) algorithms have achieved great success on sequential decision-making problems, yet is criticized for the lack of data-efficiency and explainability. Especially, explainability of subtasks is critical in hierarchical decision-making since it enhances the transparency of black-box-style DRL methods and helps the RL practitioners to understand the high-level behavior of the system better. To improve the data-efficiency and explainability of DRL, declarative knowledge is introduced in this work and novel algorithm is proposed by integrating DRL with symbolic planning. An experimental analysis on publicly available benchmarks validates the explainability of the subtasks and shows that our method can outperform the state-of-the-art approach in terms of data-efficiency.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

1905.0703

Country: North America > United States (0.14)

Genre: Research Report (0.84)

Industry:

Leisure & Entertainment > Games (0.47)
Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)