Goto

Collaborating Authors

 Reinforcement Learning


When to use parametric models in reinforcement learning?

arXiv.org Artificial Intelligence

We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive to or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models.


Fast Task Inference with Variational Intrinsic Successor Features

arXiv.org Artificial Intelligence

It has been established that diverse behaviors spanning the controllable subspace of an Markov decision process can be trained by rewarding a policy for being distinguishable from other policies \citep{gregor2016variational, eysenbach2018diversity, warde2018unsupervised}. However, one limitation of this formulation is generalizing behaviors beyond the finite set being explicitly learned, as is needed for use on subsequent tasks. Successor features \citep{dayan93improving, barreto2017successor} provide an appealing solution to this generalization problem, but require defining the reward function as linear in some grounded feature space. In this paper, we show that these two techniques can be combined, and that each method solves the other's primary limitation. To do so we introduce Variational Intrinsic Successor FeatuRes (VISR), a novel algorithm which learns controllable features that can be leveraged to provide enhanced generalization and fast task inference through the successor feature framework. We empirically validate VISR on the full Atari suite, in a novel setup wherein the rewards are only exposed briefly after a long unsupervised phase. Achieving human-level performance on 14 games and beating all baselines, we believe VISR represents a step towards agents that rapidly learn from limited feedback.


r/MachineLearning - [R] Reinforcement Learning in Non-Stationary Environments

#artificialintelligence

I don't think this is as significant of a problem as you think it is. If I understood the paper correctly, the problem it actually solves is much smaller than the problem stated. It only applies to non-stationary environments where only one copy of the environment is available. If multiple copies of the environment are available, then I would wager that standard reinforcement learning techniques will far outperform this. Furthermore, reinforcement learning with only one copy of the environment (in other words, non-parallel) has proven to be inadequate for real problems because it is impossible to get enough data.


Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning

#artificialintelligence

One of my favorite things about deep reinforcement learning is that, unlike supervised learning, it really, really doesn't want to work. Throwing a neural net at a computer vision problem might get you 80% of the way there. Throwing a neural net at an RL problem will probably blow something up in front of your face -- and it will blow up in a different way each time you try. A lot of the biggest challenges in RL revolve around two questions: how we interact with the environment effectively (e.g. In this post, I want to explore a few recent directions in deep RL research that attempt to address these challenges, and do so with particularly elegant parallels to human cognition. This post will begin with a quick review of two canonical deep RL algorithms -- DQN and A3C -- to provide us some intuitions to refer back to, and then jump into a deep dive on a few recent papers and breakthroughs in the categories described above.


Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.


"Did You Hear That?" Learning to Play Video Games from Audio Cues

arXiv.org Artificial Intelligence

Game-playing AI research has focused for a long time on learning to play video games from visual input or symbolic information. However, humans benefit from a wider array of sensors which we utilise in order to navigate the world around us. In particular, sounds and music are key to how many of us perceive the world and influence the decisions we make. In this paper, we present initial experiments on game-playing agents learning to play video games solely from audio cues. We expand the Video Game Description Language to allow for audio specification, and the General Video Game AI framework to provide new audio games and an API for learning agents to make use of audio observations. We analyse the games and the audio game design process, include initial results with simple Q~Learning agents, and encourage further research in this area.


Causal Discovery with Reinforcement Learning

arXiv.org Machine Learning

Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based casual discovery methods rely on various local heuristics to search for a directly acyclic graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search (GES), may have attractive results with infinite samples and certain model assumptions, they are less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use reinforcement learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute corresponding rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real data, and show that the proposed approach not only has an improved search ability but also allows for a flexible score function under the acyclicity constraint.


Provably Efficient Imitation Learning from Observation Alone

arXiv.org Machine Learning

We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in ILFO setting, which learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also investigate the extension of FAIL in a model-based setting. Finally we demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.


Variance-reduced $Q$-learning is minimax optimal

arXiv.org Machine Learning

Markov decision processes and reinforcement learning algorithms provide a flexible framework for decision-making in dynamic settings, and have been studied for decades (e.g., [23, 27, 8, 9, 29]). Given the explosion in the amount of available data and computing power, recent years have witnessed dramatic success of reinforcement learning (RL) techniques in various application domains (e.g., [30, 19, 26, 22, 27]). In broad terms, algorithms for reinforcement learning are often separated into model-based versus model-free approaches. Model-based approaches based on directly learning a model for the dynamics of the system, and then computing optimal policies from the learned model. In contrast, a model-free approach directly targets learning of the optimal value function or policy. Naturally, a model-free approach is more robust to model mismatch; however, model-based approaches can often be more sample efficient. Providing a firm theoretical foundation to the tradeoffs intrinsic to different classes of methods, as characterized by their access to the underlying Markov decision process, is a major open question in RL.


Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass

arXiv.org Artificial Intelligence

In the last decades we have witnessed the success of applications of Artificial Intelligence to playing games. In this work we address the challenging field of games with hidden information and card games in particular. Jass is a very popular card game in Switzerland and is closely connected with Swiss culture. To the best of our knowledge, performances of Artificial Intelligence agents in the game of Jass do not outperform top players yet. Our contribution to the community is two-fold. First, we provide an overview of the current state-of-the-art of Artificial Intelligence methods for card games in general. Second, we discuss their application to the use-case of the Swiss card game Jass. This paper aims to be an entry point for both seasoned researchers and new practitioners who want to join in the Jass challenge.