AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

Li, Wenhao, Jin, Bo, Wang, Xiangfeng, Yan, Junchi, Zha, Hongyuan

arXiv.org Artificial Intelligence, Apr-17-2020

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications, due to non-interactivity between agents, the curse of dimensionality and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, in which massive amounts of information must be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps simplifies the calculation, but it introduces serious bias. In this paper, we propose a flexible, fully decentralized actor-critic MARL framework that can incorporate most actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability in large-scale environments and reduce information transmission through a parameter sharing mechanism and a novel method for modeling other agents based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.

agent, algorithm, gradient, (16 more...)

arXiv.org Artificial Intelligence

2004.11145

Country:

Asia > China > Shanghai > Shanghai (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
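
F2A2's decentralization rests on a primal-dual scheme in which each agent updates only its own parameters while dual variables couple neighbouring agents. The sketch below is not F2A2: it is a plain primal-dual gradient descent-ascent (an Arrow-Hurwicz style iteration, a simpler relative of the primal-dual hybrid gradient method the abstract names) on a hypothetical toy consensus problem. Agent i owns a scalar x_i with local cost (x_i - c_i)^2 / 2, and neighbouring agents on a chain must agree. The function name primal_dual_consensus, the chain topology, the step size and the targets are assumptions for illustration.

```python
def primal_dual_consensus(targets, lr=0.1, iterations=2000):
    """Toy consensus via primal-dual gradient descent-ascent (not F2A2).

    Agent i owns x[i] with local cost (x[i] - targets[i])**2 / 2.
    Neighbouring agents on a chain must agree, x[i] == x[i + 1], and each
    constraint has a dual variable lam[i] in the Lagrangian
        L = sum_i (x[i] - c[i])**2 / 2 + sum_i lam[i] * (x[i] - x[i + 1]).
    """
    n = len(targets)
    x = [0.0] * n
    lam = [0.0] * (n - 1)
    for _ in range(iterations):
        # Primal descent: each agent uses only its own cost and the duals
        # on the edges it touches.
        new_x = []
        for i in range(n):
            grad = x[i] - targets[i]
            if i < n - 1:
                grad += lam[i]        # edge (i, i+1) enters L as +lam[i] * x[i]
            if i > 0:
                grad -= lam[i - 1]    # edge (i-1, i) enters L as -lam[i-1] * x[i]
            new_x.append(x[i] - lr * grad)
        # Dual ascent on each constraint residual (uses the pre-update x).
        lam = [lam[i] + lr * (x[i] - x[i + 1]) for i in range(n - 1)]
        x = new_x
    return x, lam


x, lam = primal_dual_consensus([1.0, 2.0, 6.0])
print([round(v, 6) for v in x])  # prints [3.0, 3.0, 3.0]
```

No agent ever reads another agent's cost; the dual variables alone carry the coordination, and the agents converge to the common mean of the targets.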

Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation

Chen, Xiaocong, Huang, Chaoran, Yao, Lina, Wang, Xianzhi, Liu, Wei, Zhang, Wenjie

arXiv.org Machine Learning, Apr-17-2020

Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy. Reinforcement learning is inherently well suited to dynamic environments and has therefore attracted increasing attention in interactive recommendation research. Inspired by knowledge-aware recommendation, we propose Knowledge-Guided deep Reinforcement learning (KGRL), which harnesses the advantages of both reinforcement learning and knowledge graphs for interactive recommendation. The model is built on the actor-critic network framework. It maintains a local knowledge network to guide decision-making and employs an attention mechanism to capture long-term semantics between items. We have conducted comprehensive experiments in a simulated online environment with six public real-world datasets and demonstrated the superiority of our model over several state-of-the-art methods.

dataset, knowledge graph, recommendation, (14 more...)

arXiv.org Machine Learning

2004.08068

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report > Promising Solution (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
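
The abstract says KGRL uses an attention mechanism to capture long-term semantics between items. As a hedged illustration only, the sketch below implements generic scaled dot-product attention in which a query vector (for example, a hypothetical user-state embedding) attends over the embeddings of previously interacted items. It is not KGRL's published architecture; the embedding values, the function name attend and the choice to use item embeddings as both keys and values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, item_embeddings):
    """Scaled dot-product attention with the item embeddings used as both
    keys and values (an assumption made for brevity).

    Returns the attention weights and the weighted context vector.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, item)) / math.sqrt(dim)
        for item in item_embeddings
    ]
    weights = softmax(scores)
    context = [
        sum(w * item[j] for w, item in zip(weights, item_embeddings))
        for j in range(dim)
    ]
    return weights, context


# Hypothetical 2-D embeddings for three previously interacted items.
history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend([1.0, 0.0], history)
print([round(w, 3) for w in weights])  # the item most similar to the query gets the largest weight
```

The context vector is a convex combination of the item embeddings, so items the query finds relevant dominate the summary passed on to the rest of a model.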

Learning to learn | Berkeley Engineering

#artificialintelligence, Apr-16-2020, 21:17:54 GMT

Taking inspiration from the way children instinctively learn and adapt to a wide range of unpredictable environments, Pieter Abbeel and assistant professor Sergey Levine are developing algorithms that enable robots to learn from past experiences, and even from other robots. Built on deep reinforcement learning, their work is bringing robots past a crucial threshold in demonstrating human-like intelligence: the ability to independently solve problems and master new tasks more quickly and efficiently.

learn berkeley engineering, learning, robot

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.78)

An Application of Deep Reinforcement Learning to Algorithmic Trading - Damien Ernst

#artificialintelligence, Apr-16-2020, 07:09:34 GMT

This research paper presents a novel deep reinforcement learning (DRL) solution to the decision-making problem behind algorithmic trading in the stock markets: selecting the appropriate trading action (buy, hold or sell shares) without human intervention. The core objective is to achieve an appreciable profit while efficiently mitigating trading risk. This task is particularly complex due to the sequential nature of the problem as well as the stochastic and adversarial aspects of the environment. Moreover, a huge amount of both quantitative and qualitative information, which is generally not available, influences the dynamics of this environment. Until now, DRL algorithms have mainly focused on well-known environments with specific properties, such as games.

algorithmic trading, damien ernst, deep reinforcement learning, (6 more...)

#artificialintelligence

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
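
To make the decision problem concrete, the sketch below is a hypothetical, minimal trading environment with the three actions the abstract lists (buy, hold, sell). It is not the paper's environment or its DRL agent: it trades a single share per step, charges an assumed proportional fee (fee_rate), and uses the one-step change in portfolio value as the reward signal an agent would learn from. The names simulate and ACTIONS are illustrative; a learned policy would replace the fixed action list.

```python
ACTIONS = ("buy", "hold", "sell")


def simulate(prices, actions, cash=100.0, fee_rate=0.001):
    """Replay a sequence of actions on a price series (hypothetical toy env).

    At step t the agent acts at prices[t]; the reward is the change in
    portfolio value (cash + shares * price) once the price moves to prices[t + 1].
    """
    if len(prices) != len(actions) + 1:
        raise ValueError("need one more price than actions")
    shares = 0
    value = cash  # start with no shares, so portfolio value is just cash
    rewards = []
    for t, action in enumerate(actions):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        price = prices[t]
        if action == "buy" and cash >= price * (1 + fee_rate):
            cash -= price * (1 + fee_rate)   # buy one share and pay the fee
            shares += 1
        elif action == "sell" and shares > 0:
            cash += price * (1 - fee_rate)   # sell one share and pay the fee
            shares -= 1
        # "hold", an unaffordable buy, or a sell with no shares changes nothing
        next_value = cash + shares * prices[t + 1]
        rewards.append(next_value - value)
        value = next_value
    return rewards, cash, shares


rewards, cash, shares = simulate([10.0, 11.0, 12.0, 11.0], ["buy", "hold", "sell"], fee_rate=0.0)
print(rewards, cash, shares)  # prints [1.0, 1.0, 0.0] 102.0 0
```

Because each reward is a change in portfolio value, the rewards telescope: their sum equals the final portfolio value minus the starting cash, fees included.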

Preferred Networks at NeurIPS 2019 | Preferred Networks Research & Development

#artificialintelligence, Apr-16-2020, 03:18:23 GMT

Preferred Networks, as a research-oriented AI startup, participates every year in NeurIPS, the world's biggest machine learning conference. This post highlights our accomplishments and activities at NeurIPS 2019. We are very excited to be part of it and look forward to seeing top ML researchers from all over the world there! This year, four papers from Preferred Networks have been accepted for poster presentation. Three of them are based on work by former interns, and we are very proud of their dedication and high-quality research.

intern, neurips 2019, preferred network, (9 more...)

#artificialintelligence

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

Data-Driven Robust Control Using Reinforcement Learning

Ngo, Phuong D., Godtliebsen, Fred

arXiv.org Artificial Intelligence, Apr-16-2020

This paper proposes a robust control design method using reinforcement learning for controlling partially unknown dynamical systems under uncertain conditions. The method extends the optimal reinforcement learning algorithm with a new learning technique based on robust control theory. By learning from data, the algorithm proposes actions that guarantee the stability of the closed-loop system within the uncertainties estimated from the data. Control policies are calculated by solving a set of linear matrix inequalities. The controller was evaluated using simulations on a blood glucose model for patients with type 1 diabetes. Simulation results show that the proposed method can safely regulate blood glucose within a healthy range under the influence of measurement and process noise. The controller also significantly reduced post-meal fluctuations in blood glucose. A comparison between the proposed algorithm and the existing optimal reinforcement learning algorithm shows the improved robustness of the closed-loop system under our method.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2004.07690

Country:

Europe > Norway > Northern Norway > Troms > Tromsø (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
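
The paper computes control policies by solving linear matrix inequalities (LMIs) so that the closed loop stays stable for every model inside the estimated uncertainty set. Solving LMIs needs a semidefinite-programming solver, so the sketch below does something weaker and easier to read: for a hypothetical two-state, single-input discrete-time system x[t+1] = A x[t] + B u[t] with feedback u = -K x, it checks that the closed-loop matrix A - BK has spectral radius below 1 at every vertex of a box of uncertain entries of A. Vertex stability is necessary but not in general sufficient for stability of the whole box; a common Lyapunov function found via LMIs is what provides that guarantee. The matrices, gains, bounds and function names are assumptions, not the paper's blood glucose model.

```python
import cmath
import itertools


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex when eigenvalues are a conjugate pair
    return (trace + root) / 2, (trace - root) / 2


def spectral_radius(m):
    return max(abs(ev) for ev in eigenvalues_2x2(m))


def closed_loop(A, B, K):
    """A - B K for two states and one input (B is a column, K a row)."""
    return [[A[i][j] - B[i] * K[j] for j in range(2)] for i in range(2)]


def stable_at_vertices(A_nominal, B, K, box):
    """Check discrete-time stability (spectral radius < 1) at every vertex.

    box maps (row, col) of A to (low, high) bounds. Passing this test is only
    a necessary condition for robust stability, not the LMI-based guarantee.
    """
    entries = list(box)
    for corner in itertools.product(*(box[e] for e in entries)):
        A = [row[:] for row in A_nominal]
        for (i, j), value in zip(entries, corner):
            A[i][j] = value
        if spectral_radius(closed_loop(A, B, K)) >= 1.0:
            return False
    return True


A = [[1.0, 0.1], [0.0, 1.0]]                        # hypothetical nominal model
B = [0.0, 0.1]
box = {(0, 1): (0.08, 0.12), (1, 1): (0.95, 1.05)}  # hypothetical uncertainty bounds
print(stable_at_vertices(A, B, [2.0, 3.0], box))    # prints True
print(stable_at_vertices(A, B, [0.0, 0.0], box))    # prints False
```

With zero feedback the open-loop system already has an eigenvalue of at least 1 at some vertex, so the check fails; the gain K = [2, 3] pulls every vertex inside the unit circle.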

MARLeME: A Multi-Agent Reinforcement Learning Model Extraction Library

Kazhdan, Dmitry, Shams, Zohreh, Liò, Pietro

arXiv.org Artificial Intelligence, Apr-16-2020

Multi-Agent Reinforcement Learning (MARL) encompasses a powerful class of methodologies that have been applied in a wide range of fields. An effective way to further empower these methodologies is to develop libraries and tools that expand their interpretability and explainability. In this work, we introduce MARLeME: a MARL model extraction library, designed to improve the explainability of MARL systems by approximating them with symbolic models. Symbolic models offer a high degree of interpretability, well-defined properties, and verifiable behaviour. Consequently, they can be used to inspect and better understand the underlying MARL system and the corresponding MARL agents, as well as to replace some or all of the agents, particularly those that are safety- or security-critical.

agent, argument, marleme, (14 more...)

arXiv.org Artificial Intelligence

2004.07928

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Sports > Soccer (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
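
MARLeME approximates trained MARL agents with symbolic models. A minimal sketch of that idea, assuming the symbolic model is a single if-then rule (a decision stump) fitted to logged state-action pairs of one agent, is shown below. It is not MARLeME's API; the names Rule, fit_stump and predict, the fidelity score (the fraction of logged actions the rule reproduces) and the sample data are illustrative assumptions.

```python
from collections import Counter
from typing import NamedTuple


class Rule(NamedTuple):
    feature: int        # index of the state feature the rule tests
    threshold: float    # test is "state[feature] <= threshold"
    left_action: str    # action taken when the test is true
    right_action: str   # action taken otherwise
    fidelity: float     # fraction of logged actions the rule reproduces


def majority(actions):
    """Most frequent action; ties go to the action seen first."""
    return Counter(actions).most_common(1)[0][0]


def fit_stump(states, actions):
    """Exhaustively search single-threshold rules that mimic an agent's actions."""
    best = None
    for feature in range(len(states[0])):
        for threshold in sorted({s[feature] for s in states}):
            left = [a for s, a in zip(states, actions) if s[feature] <= threshold]
            right = [a for s, a in zip(states, actions) if s[feature] > threshold]
            left_action = majority(left)  # never empty: threshold is a data value
            right_action = majority(right) if right else left_action
            agree = left.count(left_action) + right.count(right_action)
            fidelity = agree / len(actions)
            if best is None or fidelity > best.fidelity:
                best = Rule(feature, threshold, left_action, right_action, fidelity)
    return best


def predict(rule, state):
    """Apply the extracted symbolic rule in place of the original agent."""
    return rule.left_action if state[rule.feature] <= rule.threshold else rule.right_action


# Hypothetical log of one agent's (state, action) pairs.
states = [(0.1, 5.0), (0.2, 3.0), (0.8, 1.0), (0.9, 7.0)]
actions = ["left", "left", "right", "right"]
rule = fit_stump(states, actions)
print(rule)  # prints Rule(feature=0, threshold=0.2, left_action='left', right_action='right', fidelity=1.0)
```

A rule like this can be read and verified directly, which is the interpretability argument the abstract makes; richer symbolic models trade some of that readability for higher fidelity.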

Continual Reinforcement Learning with Multi-Timescale Replay

Kaplanis, Christos, Clopath, Claudia, Shanahan, Murray

arXiv.org Artificial Intelligence, Apr-16-2020

In this paper, we propose a multi-timescale replay (MTR) buffer for improving continual learning in RL agents faced with environments that change continuously over time, at timescales unknown to the agent. The basic MTR buffer comprises a cascade of sub-buffers that accumulate experiences at different timescales, enabling the agent to improve the tradeoff between adaptation to new data and retention of old knowledge. We also combine the MTR framework with invariant risk minimization [Arjovsky et al., 2019], with the aim of encouraging the agent to learn a policy that is robust across the various environments it encounters over time. The MTR methods are evaluated in three different continual learning settings on two continuous control tasks and, in many cases, show improvement over the baselines.

agent, buffer, timescale, (15 more...)

arXiv.org Artificial Intelligence

2004.07530

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
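
The MTR buffer is described as a cascade of sub-buffers that hold experiences at different timescales. The sketch below is one plausible reading of that description, with assumed details: each sub-buffer is a FIFO queue of fixed capacity, an experience pushed out of sub-buffer i moves on to sub-buffer i+1 with an assumed probability beta (otherwise it is discarded), and training batches are drawn uniformly from the union of all sub-buffers. The class name MultiTimescaleReplay and its parameters are hypothetical.

```python
import random
from collections import deque


class MultiTimescaleReplay:
    """Cascade of FIFO sub-buffers (a sketch of the MTR idea, details assumed).

    New experience enters sub-buffer 0. When a sub-buffer overflows, its oldest
    item moves to the next sub-buffer with probability beta and is otherwise
    discarded, so deeper sub-buffers cover longer stretches of the past.
    """

    def __init__(self, num_buffers=3, sub_capacity=100, beta=0.5, seed=0):
        self.buffers = [deque() for _ in range(num_buffers)]
        self.sub_capacity = sub_capacity
        self.beta = beta
        self.rng = random.Random(seed)

    def add(self, experience):
        item = experience
        for buf in self.buffers:
            buf.append(item)
            if len(buf) <= self.sub_capacity:
                return                      # room left: the cascade stops here
            item = buf.popleft()            # oldest item leaves this sub-buffer
            if self.rng.random() >= self.beta:
                return                      # discarded instead of moving on
        # An item pushed out of the last sub-buffer is dropped.

    def sample(self, batch_size):
        """Uniform sample (without replacement) from all sub-buffers combined."""
        pool = [x for buf in self.buffers for x in buf]
        return self.rng.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return sum(len(buf) for buf in self.buffers)


replay = MultiTimescaleReplay()
for step in range(1000):
    replay.add(step)        # integers stand in for (s, a, r, s') transitions
print(len(replay), [len(b) for b in replay.buffers])
```

Because only a fraction beta of overflowing experiences survives each hop, deeper sub-buffers receive new items more slowly and so span a longer window of past experience, which is the adaptation-versus-retention tradeoff the abstract describes.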

A Game Theoretic Framework for Model Based Reinforcement Learning

Rajeswaran, Aravind, Mordatch, Igor, Kumar, Vikash

arXiv.org Machine Learning, Apr-16-2020

Model-based reinforcement learning (MBRL) has recently gained immense interest due to its potential for sample efficiency and its ability to incorporate off-policy data. However, designing stable and efficient MBRL algorithms using rich function approximators has remained challenging. To help expose the practical challenges in MBRL and simplify algorithm design through the lens of abstraction, we develop a new framework that casts MBRL as a game between: (1) a policy player, which attempts to maximize rewards under the learned model; and (2) a model player, which attempts to fit the real-world data collected by the policy player. For algorithm development, we construct a Stackelberg game between the two players and show that it can be solved with approximate bi-level optimization. This gives rise to two natural families of algorithms for MBRL, depending on which player is chosen as the leader in the Stackelberg game. Together, they encapsulate, unify, and generalize many previous MBRL algorithms. Furthermore, our framework is consistent with, and provides a clear basis for, heuristics known from prior work to be important in practice. Finally, through experiments we validate that our proposed algorithms are highly sample efficient, match the asymptotic performance of model-free policy gradient, and scale gracefully to high-dimensional tasks like dexterous hand manipulation.

algorithm, learning, optimization, (15 more...)

arXiv.org Machine Learning

2004.07804

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
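
To illustrate the Stackelberg structure, the sketch below runs a loop in the spirit of a policy-as-leader variant on a hypothetical one-dimensional problem: true dynamics s' = a*s + u + noise with a = 0.8, a linear policy u = -k*s, start states drawn from N(0, 1), and reward -(s')^2. The model player best-responds by refitting a with least squares on data from the current policy, and the policy player, as leader, takes a conservative gradient step on the cost predicted by that model. The toy problem, step size, sample counts and function names are assumptions; the paper's algorithms use neural network models and policies.

```python
import random

TRUE_A = 0.8        # hypothetical true dynamics: s' = TRUE_A * s + u + noise
NOISE_STD = 0.1


def collect(k, n, rng):
    """Roll out the linear policy u = -k * s for n one-step transitions."""
    data = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)            # start states drawn from N(0, 1)
        u = -k * s
        s_next = TRUE_A * s + u + rng.gauss(0.0, NOISE_STD)
        data.append((s, u, s_next))
    return data


def fit_model(data):
    """Model player (follower): least-squares fit of a in s' = a * s + u."""
    num = sum(s * (s_next - u) for s, u, s_next in data)
    den = sum(s * s for s, _, _ in data)
    return num / den


def policy_step(k, a_hat, lr=0.2):
    """Policy player (leader): one conservative gradient step.

    Under the learned model s' = (a_hat - k) * s, so the expected cost
    (negative reward) is E[s'^2] = (a_hat - k)^2 because E[s^2] = 1.
    """
    grad = -2.0 * (a_hat - k)
    return k - lr * grad


def policy_as_leader(iterations=50, batch=200, seed=0):
    rng = random.Random(seed)
    k, a_hat = 0.0, 0.0
    for _ in range(iterations):
        data = collect(k, batch, rng)   # data from the current policy
        a_hat = fit_model(data)         # follower best-responds fully
        k = policy_step(k, a_hat)       # leader moves a small step
    return k, a_hat


k, a_hat = policy_as_leader()
print(round(k, 2), round(a_hat, 2))
```

Swapping the roles, so that the model is updated conservatively while the policy is optimized fully under it, would give the model-as-leader counterpart; the abstract's point is that the choice of leader yields the two algorithm families.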

Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing

Oller, Declan, Glasmachers, Tobias, Cuccu, Giuseppe

arXiv.org Machine Learning, Apr-16-2020

We propose a novel method for analyzing and visualizing the complexity of standard reinforcement learning (RL) benchmarks based on score distributions. A large number of policy networks are generated by randomly guessing their parameters and then evaluated on the benchmark task; the study of their aggregated results provides insights into the benchmark's complexity. Our method guarantees an objective evaluation by sidestepping learning altogether: the policy network parameters are generated using Random Weight Guessing (RWG), making our method agnostic to (i) the classic RL setup, (ii) any learning algorithm, and (iii) hyperparameter tuning. We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty. We test our approach on a variety of classic control benchmarks from the OpenAI Gym, where we show that small untrained networks can provide a robust baseline for a variety of tasks. The generated networks often perform well even without gradual learning, incidentally highlighting the triviality of a few popular benchmarks.

algorithm, architecture, complexity, (16 more...)

arXiv.org Machine Learning

2004.07707

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > Switzerland > Fribourg > Fribourg (0.04)
Europe > Germany (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
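
Random Weight Guessing evaluates many policy networks whose parameters are drawn at random, with no learning at all, and studies the resulting score distribution. The sketch below applies that procedure to a hypothetical toy control task in pure Python rather than to an OpenAI Gym benchmark; the task, the single-neuron sign policy, the standard-normal weight distribution and the function names are assumptions.

```python
import random
import statistics


def run_episode(weights, bias, steps=100):
    """Hypothetical toy task (not an OpenAI Gym environment).

    A point starts at position 1.0; each step a single-neuron policy pushes
    it by -1 or +1. The score counts steps spent inside |position| < 0.5.
    """
    position, velocity = 1.0, 0.0
    score = 0
    for _ in range(steps):
        activation = weights[0] * position + weights[1] * velocity + bias
        push = 1.0 if activation > 0 else -1.0
        velocity = 0.9 * velocity + 0.05 * push   # damped dynamics
        position += velocity
        if abs(position) < 0.5:
            score += 1
    return score


def random_weight_guessing(num_networks=1000, seed=0):
    """Score many networks whose parameters are sampled from N(0, 1); no training."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_networks):
        weights = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        bias = rng.gauss(0.0, 1.0)
        scores.append(run_episode(weights, bias))
    return scores


scores = random_weight_guessing()
print("mean score:", statistics.mean(scores))
print("best score:", max(scores))
print("fraction scoring > 50:", sum(s > 50 for s in scores) / len(scores))
```

Read this way, a benchmark on which a sizable fraction of random networks already score well is easy, which is the kind of triviality the paper reports for a few popular tasks.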
