Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Perrin, Sarah, Perolat, Julien, Laurière, Mathieu, Geist, Matthieu, Elie, Romuald, Pietquin, Olivier
In this paper, we deepen the analysis of the continuous time Fictitious Play learning algorithm and extend it to various finite state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. This analysis emphasizes the use of exploitability as a relevant metric for evaluating convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments in both model-based and model-free settings. We thereby provide, for the first time, converging learning dynamics for Mean Field Games in the presence of common noise.
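For reference, exploitability measures how much a representative player can gain by deviating from the candidate policy while the population behavior is held fixed. A sketch of the standard definition (notation is ours, not necessarily the paper's):

\[
\phi(\pi) \;=\; \max_{\pi'} J\big(\pi', \mu^{\pi}\big) \;-\; J\big(\pi, \mu^{\pi}\big),
\]

where $\mu^{\pi}$ is the population distribution induced when every player follows $\pi$ and $J$ is the expected return of a single representative player. A policy is a Nash equilibrium exactly when its exploitability vanishes, so the $O(\frac{1}{t})$ rate above quantifies the speed at which continuous time Fictitious Play approaches equilibrium.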
Show me the Way: Intrinsic Motivation from Demonstrations
Hussenot, Léonard, Dadashi, Robert, Geist, Matthieu, Pietquin, Olivier
The study of exploration in Reinforcement Learning (RL) has a long history, but it remains an unsolved problem. Recent approaches applied to Deep RL are based on the concept of intrinsic motivation and are implemented as an exploration bonus, added to the environment reward, that encourages exhaustively visiting the whole state-action space as fast as possible. This approach is supported by the vast theory of RL, in which convergence to optimality assumes exhaustive exploration. Yet, human beings and other mammals do not exhaustively explore the world, and their motivation is not only based on novelty but also on diverse other factors (e.g., curiosity, fun, style, pleasure, safety, competition). They optimize for life-long learning and train to learn transferable skills in playgrounds without obvious goals. They also apply innate or learned priors to save time and stay safe. For these reasons, we propose a method for learning an exploration bonus from demonstrations that could transfer these motivations to an artificial agent without explicitly modeling them. Using an inverse RL approach, we show that different exploration behaviors can be learned and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.
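To make the mechanism concrete, here is a minimal sketch of how a learned bonus is typically combined with the extrinsic reward during training; the `bonus_model` object, its `score` method and the coefficient `beta` are illustrative assumptions, not the paper's interface:

```python
def shaped_reward(env_reward, state, action, bonus_model, beta=0.1):
    """Combine the environment reward with a learned exploration bonus.

    `bonus_model` is assumed to have been trained beforehand from expert
    demonstrations (e.g., with an inverse-RL-style objective); `beta`
    trades off the intrinsic term against the extrinsic one.
    """
    intrinsic = bonus_model.score(state, action)  # hypothetical bonus API
    return env_reward + beta * intrinsic
```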
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Andrychowicz, Marcin, Raichuk, Anton, Stańczyk, Piotr, Orsini, Manu, Girgin, Sertan, Marinier, Raphael, Hussenot, Léonard, Geist, Matthieu, Pietquin, Olivier, Michalski, Marcin, Gelly, Sylvain, Bachem, Olivier
In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. These choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement more than 50 such ``choices'' in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250,000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for the on-policy training of RL agents.
Primal Wasserstein Imitation Learning
Dadashi, Robert, Hussenot, Léonard, Geist, Matthieu, Pietquin, Olivier
Reinforcement Learning (RL) has solved a number of difficult tasks, whether in games [Tesauro, 1995, Mnih et al., 2015, Silver et al., 2016] or in robotics [Abbeel and Ng, 2004, Andrychowicz et al., 2020]. However, RL relies on the existence of a reward function, which can be either hard to specify or too sparse to be used in practice. Imitation Learning (IL) is a paradigm that applies to environments with hard-to-specify rewards: we seek to solve a task by learning a policy from a fixed number of demonstrations generated by an expert. IL methods can typically be folded into two paradigms: Behavioral Cloning [Pomerleau, 1991, Bagnell et al., 2007, Ross and Bagnell, 2010] and Inverse Reinforcement Learning [Russell, 1998, Ng et al., 2000]. In Behavioral Cloning, we seek to recover the expert's behavior by directly learning a policy that matches it in some sense. In Inverse Reinforcement Learning (IRL), we assume that the demonstrations come from an agent acting optimally with respect to an unknown reward function, which we seek to recover in order to subsequently train an agent on it. IRL methods thus introduce an intermediary problem to solve (i.e., recovering the reward function).
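As a point of comparison between the two paradigms, here is a deliberately simple Behavioral Cloning sketch (a linear least-squares policy regressed on the demonstrations; the data shapes and the linear model are illustrative assumptions):

```python
import numpy as np

def behavioral_cloning(expert_states, expert_actions):
    """Fit a linear policy by least squares on expert (state, action) pairs.

    A minimal stand-in for the Behavioral Cloning paradigm: the policy is
    regressed directly on the demonstrations, with no environment
    interaction and no reward signal.
    """
    X = np.hstack([expert_states, np.ones((len(expert_states), 1))])  # add bias feature
    W, *_ = np.linalg.lstsq(X, expert_actions, rcond=None)
    return lambda s: np.append(s, 1.0) @ W  # policy: state -> continuous action

# Toy usage with random placeholder demonstrations (illustrative only).
states, actions = np.random.randn(100, 4), np.random.randn(100, 2)
policy = behavioral_cloning(states, actions)
print(policy(np.zeros(4)).shape)  # (2,)
```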
Stable and Efficient Policy Evaluation
Lyu, Daoming, Liu, Bo, Geist, Matthieu, Dong, Wen, Biaz, Saad, Wang, Qi
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, two long-standing issues in this prediction problem need to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet it is not off-policy stable. On the other hand, the gradient TD and emphatic TD algorithms are off-policy stable, but not on-policy efficient. This paper introduces novel algorithms that are both off-policy stable and on-policy efficient by using the oblique projection method. Empirical results on various domains validate the effectiveness of the proposed approach.
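For context, the conventional linear TD(0) update discussed above takes the form (standard notation, not specific to this paper):

\[
\theta_{t+1} \;=\; \theta_t + \alpha_t\, \delta_t\, \phi(s_t),
\qquad
\delta_t \;=\; r_{t+1} + \gamma\, \theta_t^\top \phi(s_{t+1}) - \theta_t^\top \phi(s_t),
\]

where $\phi(s)$ is the feature vector of state $s$ and $\alpha_t$ a step size. Under off-policy sampling this semi-gradient update can diverge, which is precisely the stability issue that gradient TD, emphatic TD, and the oblique projection approach of this paper address.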
On Connections between Constrained Optimization and Reinforcement Learning
Vieillard, Nino, Pietquin, Olivier, Geist, Matthieu
Dynamic Programming (DP) provides standard algorithms to solve Markov Decision Processes. However, these algorithms generally do not optimize a scalar objective function. In this paper, we draw connections between DP and (constrained) convex optimization. Specifically, we show clear links in algorithmic structure between three DP schemes and optimization algorithms. We link Conservative Policy Iteration to Frank-Wolfe, Mirror-Descent Modified Policy Iteration to Mirror Descent, and Politex (Policy Iteration Using Expert Prediction) to Dual Averaging. These abstract DP schemes are representative of a number of (deep) Reinforcement Learning (RL) algorithms. By highlighting these connections (most of which have been noticed earlier, but in a scattered way), we would like to encourage further studies linking RL and convex optimization, which could lead to the design of new, more efficient, and better understood RL algorithms.
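To make one of these links concrete: Conservative Policy Iteration updates the policy as a convex combination with a greedy policy, which mirrors the Frank-Wolfe step (a schematic correspondence in standard notation, not the paper's exact formulation):

\[
\text{CPI:}\quad \pi_{k+1} = (1-\alpha_k)\,\pi_k + \alpha_k\, \mathcal{G}(\pi_k),
\qquad\quad
\text{Frank-Wolfe:}\quad x_{k+1} = (1-\alpha_k)\, x_k + \alpha_k\, s_k,
\]

where $\mathcal{G}(\pi_k)$ is a policy greedy with respect to $q^{\pi_k}$ and $s_k$ minimizes the linearized objective over the feasible set.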
Momentum in Reinforcement Learning
Vieillard, Nino, Scherrer, Bruno, Pietquin, Olivier, Geist, Matthieu
We adapt the concept of momentum from optimization to reinforcement learning. Viewing state-action value functions as analogous to gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement of DQN based on MoVI and evaluate it on Atari games.
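A minimal tabular sketch of the momentum idea, where the greedy step acts on a running average of successive $q$-functions; the exact averaging and evaluation scheme of MoVI may differ, this only illustrates the principle:

```python
import numpy as np

def momentum_value_iteration(P, R, gamma=0.9, iters=200):
    """Value iteration acting greedily on an averaged q-function (sketch).

    P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    """
    S, A = R.shape
    q = np.zeros((S, A))
    h = np.zeros((S, A))  # running average of q-functions, the "momentum" term
    for k in range(1, iters + 1):
        pi = h.argmax(axis=1)              # greedy policy w.r.t. the averaged q
        v = q[np.arange(S), pi]            # current q evaluated at the greedy actions
        q = R + gamma * P @ v              # Bellman backup under that policy
        h = ((k - 1) * h + q) / k          # uniform average of successive q-functions
    return h.argmax(axis=1)

# Toy usage on a random MDP (illustrative only).
rng = np.random.default_rng(0)
P = rng.random((4, 2, 4)); P /= P.sum(axis=-1, keepdims=True)
R = rng.random((4, 2))
print(momentum_value_iteration(P, R))
```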
Credit Assignment as a Proxy for Transfer in Reinforcement Learning
Ferret, Johan, Marinier, Raphaël, Geist, Matthieu, Pietquin, Olivier
The ability to transfer representations to novel environments and tasks is a sensible requirement for general learning agents. Despite its apparent promise, transfer in Reinforcement Learning is still an open and under-exploited research area. In this paper, we suggest that credit assignment, viewed as a supervised learning task, could be used to accomplish transfer. Our contribution is twofold: we introduce a new credit assignment mechanism based on self-attention, and we show that the learned credit can be transferred to in-domain and out-of-domain scenarios.
Approximate Fictitious Play for Mean Field Games
Elie, Romuald, Pérolat, Julien, Laurière, Mathieu, Geist, Matthieu, Pietquin, Olivier
The theory of Mean Field Games (MFG) characterizes the Nash equilibria of games with an infinite number of identical players, and provides a convenient and relevant mathematical framework for the study of games with a large number of interacting agents. Until very recently, the literature only considered Nash equilibria between fully informed players. In this paper, we focus on the realistic setting where agents with no prior information on the game learn their best response policy through repeated experience. We study the convergence to a (possibly approximate) Nash equilibrium of a fictitious play iterative learning scheme in which the best response is computed approximately, typically by a reinforcement learning (RL) algorithm. Notably, we show for the first time the convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space setting, where the best response of the iterative fictitious play scheme is computed with a deep RL algorithm.
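A schematic fictitious play loop of the kind analyzed here, with the best response computed by an arbitrary (e.g., RL-based) oracle; policies are represented as action-probability arrays, and both callables as well as the uniform averaging are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

def fictitious_play(pi0, best_response, induced_distribution, iterations=100):
    """Fictitious play for a finite-state Mean Field Game (sketch).

    pi0: initial policy, array of action probabilities of shape (S, A).
    best_response(mu): (approximate) best response against distribution mu,
        e.g. computed by a reinforcement learning algorithm.
    induced_distribution(pi): population distribution obtained when the
        whole population plays pi.
    """
    avg_pi = np.array(pi0, dtype=float)
    for k in range(1, iterations + 1):
        mu = induced_distribution(avg_pi)     # population behavior under the averaged policy
        br = best_response(mu)                # approximate best response against mu
        avg_pi = (k * avg_pi + br) / (k + 1)  # average in the new best response
    return avg_pi
```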
Modified Actor-Critics
Merdivan, Erinc, Hanke, Sten, Geist, Matthieu
Robot learning, from a control point of view, often involves continuous actions. In Reinforcement Learning, such actions are usually handled with actor-critic algorithms. These may build on Conservative Policy Iteration (e.g., Trust Region Policy Optimization, TRPO), on policy gradient (e.g., REINFORCE), or on entropy regularization (e.g., Soft Actor Critic, SAC), among others (e.g., Proximal Policy Optimization, PPO), but in all cases they can be seen as a form of soft policy iteration: they iterate policy evaluation followed by a soft policy improvement step. As such, they are often naturally on-policy. In this paper, we propose to combine (any kind of) soft greediness with Modified Policy Iteration (MPI). The proposed abstract framework repeatedly applies: (i) a partial policy evaluation step that allows off-policy learning, and (ii) any soft greedy step. As a proof of concept, we instantiate this framework with the PPO soft greediness. Comparison to the original PPO shows that our algorithm is much more sample efficient. We also show that it is competitive with the state-of-the-art off-policy algorithm SAC.
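A schematic rendering of the abstract framework, with the two steps kept as callables; the argument names, the replay buffer, and the signatures are illustrative assumptions, not the paper's code:

```python
def modified_actor_critic(policy, q, collect, partial_evaluation, soft_greedy_step,
                          iterations=100, m=5):
    """Alternate m partial policy evaluation steps with a soft greedy step (sketch).

    collect(policy): gathers transitions with the current policy.
    partial_evaluation(q, policy, replay): one Bellman backup of q towards the
        current policy, usable with off-policy replay data.
    soft_greedy_step(policy, q): any soft policy improvement (e.g. PPO-like).
    """
    replay = []
    for _ in range(iterations):
        replay.extend(collect(policy))                 # gather fresh transitions
        for _ in range(m):                             # (i) m partial evaluation steps
            q = partial_evaluation(q, policy, replay)
        policy = soft_greedy_step(policy, q)           # (ii) soft greedy improvement
    return policy
```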