AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

(PDF) Deep Reinforcement Learning applied to Fluid Mechanics: materials from the 2019 Flow/Interface School on Machine Learning and Data Driven Methods

#artificialintelligenceDec-10-2019, 07:15:26 GMT

We use cookies to make interactions with our website easy and meaningful, to better understand the use of our services, and to tailor advertising. For further information, including about cookie settings, please read our Cookie Policy . By continuing to use this site, you consent to the use of cookies.

deep reinforcement learning, flow interface school, learning and data driven method, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Add feedback

Adventures in Machine Learning - Learn and explore machine learning

#artificialintelligenceDec-10-2019, 03:46:04 GMT

In previous posts (here and here) I introduced Double Q learning and the Dueling Q architecture. These followed on from posts about deep Q learning, and showed how double Q and dueling Q learning is superior to vanilla deep Q learning. However, these posts only included examples of simplistic environments like the OpenAI Cartpole environment. These types of environments are good to learn on, but more complicated environments are both more interesting and fun. They also demonstrate better the complexities of implementing deep reinforcement learning in realistic cases. In this post, I'll use similar code to that shown in my Dueling Q TensorFlow 2 but in this case apply it to the Open AI Atari Space Invaders environment.

network weight, primary network, target network, (15 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

oxwhirl/pymarl

#artificialintelligenceDec-10-2019, 00:14:29 GMT

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms: PyMARL is written in PyTorch and uses SMAC as its environment. This will download SC2 into the 3rdparty folder and copy the maps necessary to run over. The config files act as defaults for an algorithm or environment. They are all located in src/config. All results will be stored in the Results folder.

config file, oxwhirl pymarl, starcraft ii, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

Add feedback

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

Islam, Riashat, Seraj, Raihan, Arnob, Samin Yeasar, Precup, Doina

arXiv.org Machine LearningDec-10-2019

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new policy after every policy gradient update. Despite enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on gradient of the estimated value function. In this work, we present a new way of off-policy policy evaluation in actor-critic, based on the doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where the reward function is stochastic that can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.

algorithm, estimator, evaluation, (12 more...)

arXiv.org Machine Learning

1912.05109

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Montreal (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Imitation Learning via Off-Policy Distribution Matching

Kostrikov, Ilya, Nachum, Ofir, Tompson, Jonathan

arXiv.org Machine LearningDec-10-2019

A BSTRACT When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly data-inefficient or alter the original objective in a manner that can drastically change its optimum. In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. In addition to the data-efficiency that this provides, we are able to show that this objective also renders the use of a separate RL optimization unnecessary. Rather, an imitation policy may be learned directly from this objective without the use of explicit rewards. We call the resulting algorithm V alueDICEand evaluate it on a suite of popular imitation learning benchmarks, finding that it can achieve state-of-the-art sample efficiency and performance. Accordingly, many successful demonstrations of RL often rely on carefully handcrafted rewards with various bonuses and penalties designed to encourage intended behavior (Nachum et al., 2019a; Andrychowicz et al., 2018).

algorithm, imitation, objective, (13 more...)

arXiv.org Machine Learning

1912.05032

Country: North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Xu, Pan, Gu, Quanquan

arXiv.org Machine LearningDec-10-2019

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.

algorithm, function approximation, q-learning, (15 more...)

arXiv.org Machine Learning

1912.04511

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

Measuring the Reliability of Reinforcement Learning Algorithms

Chan, Stephanie C. Y., Fishman, Sam, Canny, John, Korattikara, Anoop, Guadarrama, Sergio

arXiv.org Artificial IntelligenceDec-10-2019

Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library, here: https://github.com/google-research/rl-reliability-metrics . We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.

algorithm, per-environment basis, reliability, (16 more...)

arXiv.org Artificial Intelligence

1912.05663

Country:

North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
Europe > Portugal (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Efficient and Robust Reinforcement Learning with Uncertainty-based Value Expansion

Zhou, Bo, Zeng, Hongsheng, Wang, Fan, Li, Yunxiang, Tian, Hao

arXiv.org Artificial IntelligenceDec-10-2019

By integrating dynamics models into model-free reinforcement learning (RL) methods, model-based value expansion (MVE) algorithms have shown a significant advantage in sample efficiency as well as value estimation. However, these methods suffer from higher function approximation errors than model-free methods in stochastic environments due to a lack of modeling the environmental randomness. As a result, their performance lags behind the best model-free algorithms in some challenging scenarios. In this paper, we propose a novel Hybrid-RL method that builds on MVE, namely the Risk Averse Value Expansion (RAVE). With imaginative rollouts generated by an ensemble of probabilistic dynamics models, we further introduce the aversion of risks by seeking the lower confidence bound of the estimation. Experiments on a range of challenging environments show that by modeling the uncertainty completely, RAVE substantially enhances the robustness of previous model-based methods, and yields state-of-the-art performance. With this technique, our solution gets the first place in NeurIPS 2019: Learn to Move.

agent, ensemble, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1912.05328

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

Islam, Riashat, Seraj, Raihan, Bacon, Pierre-Luc, Precup, Doina

arXiv.org Artificial IntelligenceDec-10-2019

The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states, but may not be uniformly optimal for others, no matter where the agent starts from. Furthermore, to obtain unbiased gradient estimates, the starting point of the policy gradient estimator requires sampling states from a normalized discounted weighting of states. However, the difficulty of estimating the normalized discounted weighting of states, or the stationary state distribution, is quite well-known. Additionally, the large sample complexity of policy gradient methods is often attributed to insufficient exploration, and to remedy this, it is often assumed that the restart distribution provides sufficient exploration in these algorithms. In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution. The key contribution of our work includes providing a practically feasible algorithm to estimate the normalized discounted weighting of states, i.e, the \textit{discounted future state distribution}. We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on the entropy of the induced state distribution. The proposed approach can be considered as a three time-scale algorithm and under some mild technical conditions, we prove its convergence to a locally optimal policy. Experimentally, we demonstrate usefulness of regularization with the discounted future state distribution in terms of increased state space coverage and faster learning on a range of complex tasks.

algorithm, regularization, state distribution, (13 more...)

arXiv.org Artificial Intelligence

1912.05104

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

r/MachineLearning - [R] Reinforcement Learning for Market Making in a Multi-agent Dealer Market (JPMorgan)

#artificialintelligenceDec-9-2019, 04:18:26 GMT

Abstract: Market makers play an important role in providing liquidity to markets by continuously quoting prices at which they are willing to buy and sell, and managing inventory risk. In this paper, we build a multi-agent simulation of a dealer market and demonstrate that it can be used to understand the behavior of a reinforcement learning (RL) based market maker agent. We use the simulator to train an RL-based market maker agent with different competitive scenarios, reward formulations and market price trends (drifts). We show that the reinforcement learning agent is able to learn about its competitor's pricing policy; it also learns to manage inventory by smartly selecting asymmetric prices on the buy and sell sides (skewing), and maintaining a positive (or negative) inventory depending on whether the market price drift is positive (or negative). Finally, we propose and test reward formulations for creating risk averse RL-based market maker agents.

market maker agent, multi-agent dealer market, reinforcement learning, (5 more...)

#artificialintelligence

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback