MULEX: Disentangling Exploitation from Exploration in Deep RL
Beyer, Lucas, Vincent, Damien, Teboul, Olivier, Gelly, Sylvain, Geist, Matthieu, Pietquin, Olivier
An agent learning through interactions should balance its action selection between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually obtained by perturbing either the agent's actions (e.g., ε-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it receives (e.g., exploration bonus, intrinsic motivation, or hand-shaped rewards). Here, we adopt a disruptive but simple and generic perspective, where we explicitly disentangle exploration and exploitation. Different losses are optimized in parallel, one of them coming from the true objective (maximizing cumulative rewards from the environment) and the others being related to exploration. Each loss is used in turn to learn a policy that generates transitions, all shared in a single replay buffer. Off-policy methods are then applied to these transitions to optimize each loss. We showcase our approach on a hard-exploration environment, show its sample efficiency and robustness, and discuss further implications.
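To make the mechanism concrete, here is a minimal tabular sketch of the disentangling idea, not the paper's implementation: two Q-functions are trained in parallel from one shared replay buffer, one on the environment reward (exploitation) and one on an exploration signal. The count-based bonus, the toy environment, and all constants below are assumptions made purely for illustration.

```python
# Sketch of MULEX-style disentanglement: two learners, one shared buffer.
import random
from collections import defaultdict

GAMMA, ALPHA, N_ACTIONS = 0.99, 0.1, 4

q_exploit = defaultdict(lambda: [0.0] * N_ACTIONS)  # true objective
q_explore = defaultdict(lambda: [0.0] * N_ACTIONS)  # exploration objective
visit_counts = defaultdict(int)
replay_buffer = []  # shared by both learners

def greedy(q, s):
    return max(range(N_ACTIONS), key=lambda a: q[s][a])

def q_update(q, s, a, r, s2):
    # standard off-policy (Q-learning) update on a shared transition
    target = r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])

def step_env(s, a):
    # stand-in for a real environment; returns (next_state, reward)
    return (s + a) % 16, float((s + a) % 16 == 15)

for episode in range(200):
    s = 0
    # the two policies take turns generating behaviour
    behaviour_q = q_exploit if episode % 2 == 0 else q_explore
    for t in range(50):
        a = greedy(behaviour_q, s)
        s2, r_env = step_env(s, a)
        visit_counts[s2] += 1
        r_bonus = 1.0 / visit_counts[s2] ** 0.5  # assumed exploration signal
        replay_buffer.append((s, a, r_env, r_bonus, s2))
        s = s2
    # both losses are optimized off-policy on the same shared transitions
    batch = random.sample(replay_buffer, min(64, len(replay_buffer)))
    for s_, a_, r_env_, r_bonus_, s2_ in batch:
        q_update(q_exploit, s_, a_, r_env_, s2_)
        q_update(q_explore, s_, a_, r_bonus_, s2_)
```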
Deep Conservative Policy Iteration
Vieillard, Nino, Pietquin, Olivier, Geist, Matthieu
Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees and has inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely been implemented, never with neural networks, and only on toy problems. In this paper, we show how CPI can be practically combined with deep RL with discrete actions. We also introduce adaptive mixture rates inspired by the theory. We thoroughly evaluate the resulting algorithm on the simple Cartpole problem and validate the proposed method on a representative subset of Atari games. Overall, this work suggests that revisiting classic ADP may lead to improved and more stable deep RL algorithms.
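For illustration, the classic tabular CPI update that the paper builds on can be sketched as follows, with a fixed mixture rate rather than the adaptive rates introduced in the paper; the random MDP is a stand-in.

```python
# Tabular Conservative Policy Iteration: evaluate, then mix with the greedy policy.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, ALPHA = 5, 3, 0.9, 0.2  # ALPHA = fixed mixture rate

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                 # R[s, a] = expected reward
pi = np.full((S, A), 1.0 / A)                # start from the uniform policy

def q_of(pi):
    # exact policy evaluation: solve (I - gamma * P_pi) v = r_pi, then q = R + gamma * P v
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    v = np.linalg.solve(np.eye(S) - GAMMA * P_pi, r_pi)
    return R + GAMMA * P @ v

for k in range(100):
    q = q_of(pi)
    greedy = np.eye(A)[q.argmax(axis=1)]     # deterministic greedy policy
    pi = (1 - ALPHA) * pi + ALPHA * greedy   # conservative (stochastic mixture) update

print("final value estimate:", np.einsum("sa,sa->s", pi, q_of(pi)))
```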
Foolproof Cooperative Learning
Jacq, Alexis, Perolat, Julien, Geist, Matthieu, Pietquin, Olivier
This paper extends the notion of equilibrium in game theory to learning algorithms in repeated stochastic games. We define a learning equilibrium as an algorithm used by a population of players, such that no player can individually use an alternative algorithm and increase its asymptotic score. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while remaining non-exploitable by selfish players. We prove that, in repeated symmetric games, this algorithm is a learning equilibrium. We illustrate the behavior of FCL on symmetric matrix and grid games, and its robustness to selfish learners.
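The Tit-for-Tat behavior that FCL converges to can be illustrated in the iterated prisoner's dilemma; the sketch below shows the target behavior only, not the learning algorithm itself, and the payoff matrix is the standard textbook one.

```python
# Tit-for-Tat in the iterated prisoner's dilemma: cooperation sustains itself,
# while a defector gains nothing asymptotically.
COOPERATE, DEFECT = 0, 1
PAYOFFS = {  # (my action, their action) -> (my payoff, their payoff)
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

def tit_for_tat(history):
    # cooperate first, then mirror the opponent's previous action
    return COOPERATE if not history else history[-1][1]

def always_defect(history):
    return DEFECT

def play(strategy_a, strategy_b, rounds=100):
    history_a, history_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(history_a), strategy_b(history_b)
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        history_a.append((a, b)); history_b.append((b, a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation: (300, 300)
print(play(tit_for_tat, always_defect))  # defector is held to ~1 per round
```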
Targeted Attacks on Deep Reinforcement Learning Agents through Adversarial Observations
Hussenot, Léonard, Geist, Matthieu, Pietquin, Olivier
While previous approaches perform untargeted attacks on the state of the agent, we propose a method to perform targeted attacks that lure an agent into consistently following a desired policy. We place ourselves in a realistic setting, where attacks are performed on observations of the environment rather than on the internal state of the agent, and develop constant attacks instead of per-observation ones. We illustrate our method by attacking deep RL agents playing Atari games and show that universal additive masks can be applied not only to degrade performance but also to take control of an agent.
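A hedged sketch of the universal-additive-mask idea follows: a single perturbation, shared across all observations, is optimized so that a victim policy assigns high probability to an attacker-chosen target action. The stand-in network, the random observation batch, and the bound EPSILON are assumptions for illustration, not the paper's exact setup.

```python
# Optimizing one universal additive mask against a (stand-in) victim policy.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
OBS_DIM, N_ACTIONS, EPSILON, TARGET_ACTION = 64, 6, 0.05, 2

policy = torch.nn.Sequential(  # stand-in for a trained victim policy
    torch.nn.Linear(OBS_DIM, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, N_ACTIONS),
)
observations = torch.rand(256, OBS_DIM)  # stand-in for collected observations

delta = torch.zeros(OBS_DIM, requires_grad=True)  # the universal mask
optimizer = torch.optim.Adam([delta], lr=1e-2)

for step in range(200):
    logits = policy(observations + delta)
    # push every perturbed observation toward the attacker's target action
    loss = F.cross_entropy(logits, torch.full((256,), TARGET_ACTION))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():  # keep the mask small (imperceptibility constraint)
        delta.clamp_(-EPSILON, EPSILON)

hijacked = (policy(observations + delta).argmax(1) == TARGET_ACTION).float().mean()
print(f"fraction of observations steered to the target action: {hijacked:.2f}")
```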
A Theory of Regularized Markov Decision Processes
Geist, Matthieu, Scherrer, Bruno, Pietquin, Olivier
Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or on Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.
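As a concrete instance of the theory, taking the regularizer Ω to be the negative entropy makes its Legendre-Fenchel transform Ω* the scaled log-sum-exp, and the regularized Bellman operator then reduces to the "soft" value iteration behind Soft Q-learning. A minimal sketch on a random MDP (chosen only for illustration):

```python
# Regularized Bellman operator with the negative-entropy regularizer:
# Omega*(q) = tau * log sum_a exp(q_a / tau), a smooth maximum over actions.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, TAU = 5, 3, 0.9, 0.1   # TAU = regularization temperature

P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))

def regularized_bellman(v):
    q = R + GAMMA * P @ v
    return TAU * np.log(np.exp(q / TAU).sum(axis=1))  # Legendre-Fenchel transform

v = np.zeros(S)
for _ in range(500):
    v = regularized_bellman(v)  # regularized value iteration

# the maximizing policy is the softmax of q / tau (the gradient of Omega*)
q = R + GAMMA * P @ v
pi = np.exp(q / TAU) / np.exp(q / TAU).sum(axis=1, keepdims=True)
print("regularized optimal value:", v)
```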
Anderson Acceleration for Reinforcement Learning
Geist, Matthieu, Scherrer, Bruno
Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, supported by preliminary experiments showing a significant speed-up of convergence, which we critically discuss. We also discuss how this idea could be applied more generally to (deep) reinforcement learning. Keywords: Reinforcement learning; accelerated fixed point.
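A minimal sketch of Anderson acceleration applied to value iteration, assuming a random tabular MDP and a memory of M previous iterates: the next iterate is a linear combination of the last Bellman images, with coefficients chosen to minimize the combined residual under the constraint that they sum to one.

```python
# Anderson-accelerated value iteration on a random MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, M = 50, 4, 0.99, 5   # M = memory size of the acceleration

P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))

def bellman(v):
    return (R + GAMMA * P @ v).max(axis=1)

def anderson_vi(n_iters=200):
    v = np.zeros(S)
    iterates, residuals = [], []
    for _ in range(n_iters):
        tv = bellman(v)
        iterates.append(tv)
        residuals.append(tv - v)
        iterates, residuals = iterates[-M:], residuals[-M:]  # keep last M
        F = np.stack(residuals, axis=1)  # residual matrix, S x m
        # minimize ||F a||^2 subject to sum(a) = 1 (regularized for stability;
        # the max operator being non-smooth, the scheme can misbehave, which
        # is part of what the paper discusses critically)
        G = F.T @ F + 1e-10 * np.eye(F.shape[1])
        z = np.linalg.solve(G, np.ones(F.shape[1]))
        a = z / z.sum()
        v = np.stack(iterates, axis=1) @ a  # accelerated update
    return v

v_star = anderson_vi()
print("max residual after acceleration:", np.abs(bellman(v_star) - v_star).max())
```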
Human Activity Recognition using Recurrent Neural Networks
Singh, Deepika, Merdivan, Erinc, Psychoula, Ismini, Kropf, Johannes, Hanke, Sten, Geist, Matthieu, Holzinger, Andreas
Human activity recognition using smart home sensors is one of the bases of ubiquitous computing in smart environments and a topic undergoing intense research in the field of ambient assisted living. The increasingly large amounts of collected data call for machine learning methods. In this paper, we introduce a deep learning model that learns to classify human activities without using any prior knowledge. For this purpose, a Long Short-Term Memory (LSTM) Recurrent Neural Network was applied to three real-world smart home datasets. The results of these experiments show that the proposed approach outperforms existing ones in terms of accuracy and performance.
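As an illustration of the kind of model described (an assumed architecture, not necessarily the authors' exact one), here is a minimal PyTorch sketch of an LSTM that classifies a sequence of sensor events into an activity label.

```python
# Minimal LSTM classifier for sequences of discrete sensor events.
import torch
import torch.nn as nn

N_SENSORS, EMBED_DIM, HIDDEN_DIM, N_ACTIVITIES = 40, 32, 64, 10  # placeholders

class ActivityLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_SENSORS, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.classify = nn.Linear(HIDDEN_DIM, N_ACTIVITIES)

    def forward(self, sensor_events):        # (batch, seq_len) of sensor ids
        embedded = self.embed(sensor_events)
        _, (h_final, _) = self.lstm(embedded)
        return self.classify(h_final[-1])    # logits over activity labels

model = ActivityLSTM()
batch = torch.randint(0, N_SENSORS, (8, 20))  # 8 sequences of 20 events
print(model(batch).shape)                     # torch.Size([8, 10])
```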
Reconstruct & Crush Network
Merdivan, Erinc, Loghmani, Mohammad Reza, Geist, Matthieu
This article introduces an energy-based model that is adversarial regarding data: it minimizes the energy for a given data distribution (the positive samples) while maximizing the energy for another given data distribution (the negative or unlabeled samples). In particular, the model is instantiated with autoencoders, where the energy, represented by the reconstruction error, provides a general distance measure for unknown data. The resulting neural network thus learns to reconstruct data from the first distribution while crushing data from the second distribution. This solution can handle different problems such as Positive and Unlabeled (PU) learning or covariate shift, especially with imbalanced data. Using autoencoders allows handling a large variety of data, such as images, text or even dialogues. Our experiments show the flexibility of the proposed approach in dealing with different types of data in different settings: images with CIFAR-10 and CIFAR-100 (not-in-training setting), text with Amazon reviews (PU learning) and dialogues with Facebook bAbI (next response classification and dialogue completion).
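A hedged sketch of the reconstruct-and-crush objective with a small autoencoder: the reconstruction error (the energy) is minimized on positive samples and pushed above a margin on negative samples. Dimensions, the synthetic data, the hinge form, and the margin value are placeholders, not the paper's exact choices.

```python
# Autoencoder energy: reconstruct positives, crush negatives above a margin.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, LATENT, MARGIN = 100, 16, 1.0

autoencoder = nn.Sequential(
    nn.Linear(DIM, LATENT), nn.ReLU(), nn.Linear(LATENT, DIM),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

def energy(x):
    # per-sample reconstruction error plays the role of the energy
    return F.mse_loss(autoencoder(x), x, reduction="none").mean(dim=1)

for step in range(100):
    positives = torch.randn(32, DIM)          # stand-in positive samples
    negatives = torch.randn(32, DIM) + 3.0    # stand-in negative samples
    # reconstruct positives, crush negatives (the hinge keeps the loss bounded)
    loss = energy(positives).mean() + F.relu(MARGIN - energy(negatives)).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# at test time, low energy means "looks like the positive distribution"
print(energy(torch.randn(4, DIM)).detach())
```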
Is the Bellman residual a bad proxy?
Geist, Matthieu, Piot, Bilal, Pietquin, Olivier
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good a proxy this is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed to study the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe this question is worth considering.
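The two criteria under comparison can be made concrete on a tabular MDP; in the sketch below, the random MDP and the distribution $\nu$ are stand-ins, and we compute the mean value $\nu^\top v_\pi$ and the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ of a stochastic policy.

```python
# Mean value vs. Bellman residual of a policy on a random tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA = 6, 3, 0.9

P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))
nu = np.full(S, 1.0 / S)                     # stand-in state distribution nu
pi = rng.dirichlet(np.ones(A), size=S)       # a random stochastic policy

# v_pi solves (I - gamma * P_pi) v = r_pi
P_pi = np.einsum("sa,san->sn", pi, P)
r_pi = np.einsum("sa,sa->s", pi, R)
v_pi = np.linalg.solve(np.eye(S) - GAMMA * P_pi, r_pi)

# T_* is the optimal Bellman operator: max over actions
Tstar_v = (R + GAMMA * P @ v_pi).max(axis=1)

mean_value = nu @ v_pi                           # criterion i)
bellman_residual = nu @ np.abs(Tstar_v - v_pi)   # criterion ii), ||.||_{1,nu}
print(f"mean value: {mean_value:.3f}, residual: {bellman_residual:.3f}")
```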