Collaborating Authors

 Grill, Jean-Bastien


Stepping on the Edge: Curvature Aware Learning Rate Tuners

arXiv.org Artificial Intelligence

Curvature information, in particular the largest eigenvalue of the loss Hessian (known as the sharpness), often forms the basis for learning rate tuners. However, recent work has shown that curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform constant learning rates in the long term in the full-batch regime. These tuners break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware Tuning (CDAT), which prioritizes long-term curvature stabilization over instantaneous progress on the objective. In the full-batch regime, CDAT shows behavior akin to predefined warm-up schedules on deep learning objectives, outperforming tuned constant learning rates. In the mini-batch regime, we observe that stochasticity introduces confounding effects that explain the previous success of some learning rate tuners at appropriate batch sizes. Our findings highlight the critical role of understanding the joint dynamics of the learning rate and curvature, beyond greedy minimization, in diagnosing failures and designing effective adaptive learning rate tuners.
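As a concrete point of reference for how sharpness can drive a step-size rule, here is a minimal sketch (not the paper's CDAT method; the quadratic objective, the power-iteration budget, and the 1.9 factor are all illustrative assumptions) of a tuner that estimates the sharpness via Hessian-vector products and keeps gradient descent just below the classical $2/\text{sharpness}$ stability threshold:

```python
import numpy as np

def estimate_sharpness(hvp, dim, iters=50, seed=0):
    """Estimate the largest Hessian eigenvalue by power iteration,
    using only Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    return float(v @ hvp(v))  # Rayleigh quotient at the converged direction

# Toy quadratic loss f(x) = 0.5 * x^T A x, whose Hessian is A.
A = np.diag([10.0, 1.0, 0.1])
grad = lambda x: A @ x
hvp = lambda v: A @ v

x = np.ones(3)
for _ in range(100):
    sharpness = estimate_sharpness(hvp, dim=3)
    lr = 1.9 / sharpness  # stay just below the 2/sharpness stability edge
    x = x - lr * grad(x)
print(x)  # approaches the minimizer at the origin
```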


Emergent Communication: Generalization and Overfitting in Lewis Games

arXiv.org Artificial Intelligence

Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack of generalization, lack of compositionality, etc.). In this paper, we aim to provide a better understanding of this phenomenon by analytically studying the learning problem in Lewis games. As a core contribution, we demonstrate that the standard objective in Lewis games can be decomposed into two components: a co-adaptation loss and an information loss. This decomposition enables us to surface two potential sources of overfitting, which we show may undermine the emergence of a structured communication protocol. In particular, when we control for overfitting on the co-adaptation loss, we recover the desired properties in the emergent languages: they are more compositional and generalize better.
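For readers unfamiliar with the setup, the following toy sketch instantiates a Lewis signaling game with tabular agents trained by REINFORCE (the sizes, learning rate, and greedy evaluation are illustrative assumptions; the paper's loss decomposition is analytical and not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_messages = 5, 5

# Tabular logits: sender P(message | object), receiver P(object | message).
sender_logits = np.zeros((n_objects, n_messages))
receiver_logits = np.zeros((n_messages, n_objects))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for step in range(2000):
    obj = rng.integers(n_objects)             # nature picks an object
    p_msg = softmax(sender_logits[obj])
    msg = rng.choice(n_messages, p=p_msg)     # sender speaks
    p_guess = softmax(receiver_logits[msg])
    guess = rng.choice(n_objects, p=p_guess)  # receiver acts
    reward = float(guess == obj)              # shared cooperative reward
    # REINFORCE for both agents: grad of log pi w.r.t. logits = onehot - probs.
    sender_logits[obj] += lr * reward * (np.eye(n_messages)[msg] - p_msg)
    receiver_logits[msg] += lr * reward * (np.eye(n_objects)[guess] - p_guess)

# Success rate of the learned protocol under greedy decoding.
success = np.mean([np.argmax(receiver_logits[np.argmax(sender_logits[o])]) == o
                   for o in range(n_objects)])
print(f"greedy communication success: {success:.2f}")
```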


BYOL works even without batch statistics

arXiv.org Machine Learning

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN mixes gradients across batch elements and could leak information about negative views within the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
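The batch-independent scheme mentioned above combines group normalization with weight standardization; a minimal numpy sketch of both operations (shapes and the eps constant are illustrative) makes the point that neither computes statistics across the batch:

```python
import numpy as np

def weight_standardize(w, eps=1e-5):
    """Standardize conv weights per output channel.

    w: (out_channels, in_channels, kh, kw); statistics are taken over each
    filter's own entries, so no batch of data is involved at all.
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    var = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / np.sqrt(var + eps)

def group_norm(x, num_groups, eps=1e-5):
    """Normalize activations within channel groups of a single example.

    x: (channels, h, w); note there is no batch dimension anywhere.
    """
    c, h, w = x.shape
    xg = x.reshape(num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(1, 2, 3), keepdims=True)
    var = xg.var(axis=(1, 2, 3), keepdims=True)
    return ((xg - mean) / np.sqrt(var + eps)).reshape(c, h, w)

rng = np.random.default_rng(0)
print(weight_standardize(rng.normal(size=(8, 4, 3, 3))).shape)  # (8, 4, 3, 3)
print(group_norm(rng.normal(size=(8, 5, 5)), num_groups=4).shape)  # (8, 5, 5)
```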


Bootstrap your own latent: A new approach to self-supervised Learning

arXiv.org Machine Learning

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as the online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3\%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6\%$ with a larger ResNet. We show that BYOL performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are available on GitHub.
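Two mechanisms in the abstract are easy to state precisely: the slow-moving (exponential moving average) target update, and a prediction loss that, once both vectors are l2-normalized, reduces to a squared distance. A minimal numpy sketch (array shapes and the tau value are illustrative; in BYOL, gradients flow only through the online branch, never through the target):

```python
import numpy as np

def byol_loss(online_pred, target_proj):
    """Normalized MSE between online prediction and target projection.

    After l2-normalization this equals 2 - 2 * cosine_similarity.
    In BYOL the target_proj branch is treated as a constant (stop-gradient).
    """
    p = online_pred / np.linalg.norm(online_pred, axis=-1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=-1, keepdims=True)
    return np.sum((p - z) ** 2, axis=-1).mean()

def ema_update(target_params, online_params, tau=0.996):
    """Slow-moving average update of the target network's parameters."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

rng = np.random.default_rng(0)
p = rng.normal(size=(4, 8))  # batch of online predictions
z = rng.normal(size=(4, 8))  # batch of target projections
print(byol_loss(p, z))
```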


Monte-Carlo Tree Search as Regularized Policy Optimization

arXiv.org Machine Learning

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
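The paper's key object is the regularized policy optimization problem whose exact solution replaces AlphaZero's visit-count-based action selection. A sketch of the standard way to compute such a solution, maximizing $\langle \pi, q \rangle - \lambda\,\mathrm{KL}(\pi_{\text{prior}} \,\|\, \pi)$ over the simplex by bisection on the normalizing constant (the inputs and $\lambda$ below are illustrative assumptions):

```python
import numpy as np

def solve_regularized_policy(q, prior, lam, iters=60):
    """Maximize <pi, q> - lam * KL(prior || pi) over the simplex.

    First-order conditions give pi(a) = lam * prior(a) / (alpha - q(a))
    for the unique alpha > max(q) making pi sum to 1 (found by bisection).
    """
    lo = q.max() + lam * prior[np.argmax(q)]  # here the sum is >= 1
    hi = q.max() + lam                        # here the sum is <= 1
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = np.sum(lam * prior / (alpha - q))
        lo, hi = (alpha, hi) if total > 1.0 else (lo, alpha)
    return lam * prior / (alpha - q)

q = np.array([0.2, 0.5, 0.1])       # action-value estimates
prior = np.array([0.4, 0.3, 0.3])   # prior policy from the network
print(solve_regularized_policy(q, prior, lam=0.5))  # sums to ~1
```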


Optimistic optimization of a Brownian

arXiv.org Machine Learning

We address the problem of optimizing a Brownian motion. We consider a (random) realization $W$ of a Brownian motion on the input space $[0,1]$. Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm. We provide an algorithm with sample complexity of order $\log^2(1/\epsilon)$. This improves over previous results of Al-Mharmah and Calvin (1996) and Calvin et al. (2017), which provided only polynomial rates. Our algorithm is adaptive (each query depends on previous values) and is an instance of the optimism-in-the-face-of-uncertainty principle.
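To make "optimism in the face of uncertainty" concrete here: conditioned on its values at the endpoints of an interval, a Brownian motion is a Brownian bridge, whose maximum has an explicit tail bound that yields a high-probability upper confidence bound. The sketch below (not the paper's algorithm or its analysis; the grid discretization, delta, and the iteration budget are illustrative assumptions) repeatedly splits the interval with the largest such bound:

```python
import heapq
import numpy as np

def ucb(w_left, w_right, length, delta=1e-3):
    """Upper confidence bound on the maximum of a Brownian bridge.

    For a bridge from x to y over length t: P(max >= m) = exp(-2(m-x)(m-y)/t)
    for m >= max(x, y); invert this tail at level delta (larger root).
    """
    x, y = w_left, w_right
    a, b, c = 2.0, -2.0 * (x + y), 2.0 * x * y + length * np.log(delta)
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

# Toy driver: W pre-sampled on a fine grid stands in for true Brownian queries.
n = 2 ** 12
rng = np.random.default_rng(0)
grid_w = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(1 / n), n))])
W = lambda t: grid_w[int(round(t * n))]

# Optimistic search: always split the interval with the largest bridge UCB.
heap = [(-ucb(W(0.0), W(1.0), 1.0), 0.0, 1.0)]
best = max(W(0.0), W(1.0))
for _ in range(200):
    neg_u, a, b = heapq.heappop(heap)
    if -neg_u <= best:  # the optimistic bound cannot beat the current best
        break
    m = 0.5 * (a + b)
    best = max(best, W(m))
    heapq.heappush(heap, (-ucb(W(a), W(m), m - a), a, m))
    heapq.heappush(heap, (-ucb(W(m), W(b), b - m), m, b))
print(best)
```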


Optimistic optimization of a Brownian

Neural Information Processing Systems

We address the problem of optimizing a Brownian motion. We consider a (random) realization $W$ of a Brownian motion on the input space $[0,1]$. Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm. We provide an algorithm with sample complexity of order $\log^2(1/\epsilon)$. This improves over previous results of Al-Mharmah and Calvin (1996) and Calvin et al. (2017), which provided only polynomial rates. Our algorithm is adaptive (each query depends on previous values) and is an instance of the optimism-in-the-face-of-uncertainty principle.


Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Neural Information Processing Systems

We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e., the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with either a finite or an infinite number of transitions from a state-action pair to next states. TrailBlazer is an adaptive algorithm that exploits possible structure in the MDP by exploring only a subset of the states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states. The algorithm's behavior can be seen as an extension of Monte-Carlo sampling (for estimating an expectation) to problems that alternate maximization (over actions) with expectation (over next states). Finally, another appealing feature of TrailBlazer is that it is simple to implement and computationally efficient.
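The alternation the abstract describes (maximization over actions, expectation over sampled next states) is easiest to see in the naive sparse-sampling recursion that adaptive planners like TrailBlazer improve upon. A sketch, where the generative-model interface is an assumption made for illustration:

```python
import numpy as np

def mc_value(state, depth, model, n_actions, m_samples, gamma=0.9, rng=None):
    """Naive Monte-Carlo planning: alternate a max over actions with an
    empirical expectation over next states drawn from the generative model.

    This is the plain sparse-sampling scheme, not TrailBlazer itself;
    model(state, action, rng) -> (next_state, reward) is an assumed interface.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    if depth == 0:
        return 0.0
    best = -np.inf
    for a in range(n_actions):          # maximization over actions
        total = 0.0
        for _ in range(m_samples):      # expectation over sampled next states
            next_state, reward = model(state, a, rng)
            total += reward + gamma * mc_value(next_state, depth - 1, model,
                                               n_actions, m_samples, gamma, rng)
        best = max(best, total / m_samples)
    return best

# Tiny demo: a one-state MDP where action 1 yields reward 1 and stays put.
demo_model = lambda s, a, rng: (s, float(a == 1))
print(mc_value(0, depth=3, model=demo_model, n_actions=2, m_samples=2))  # 2.71
```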


Black-box optimization of noisy functions with unknown smoothness

Neural Information Processing Systems

We study the problem of black-box optimization of a function $f$ in any dimension, given function evaluations perturbed by noise. The function is assumed to be locally smooth around one of its global optima, but this smoothness is unknown. Our contribution is an adaptive optimization algorithm, POO (parallel optimistic optimization), that is able to deal with this setting. POO performs almost as well as the best known algorithms that require knowledge of the smoothness. Furthermore, POO works for a larger class of functions than previously considered, in particular for functions that are difficult to optimize, in a precisely defined sense. We provide a finite-time analysis of POO's performance, which shows that its error after $n$ evaluations is at most a factor of $\sqrt{\ln n}$ away from the error of the best known optimization algorithms that use knowledge of the smoothness.
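POO's high-level recipe is to hedge over the unknown smoothness by running several optimistic searches with different assumed smoothness parameters and cross-validating their recommendations. A skeleton of that outer loop (the dyadic-grid subroutine below is a crude stand-in for POO's actual tree-search subroutine, HOO; all constants and names are illustrative assumptions):

```python
import numpy as np

def run_instance(f, rho, budget):
    """Stand-in for one smoothness-specific search: evaluate a dyadic grid
    whose depth is matched to the assumed smoothness rho, average repeated
    noisy evaluations per cell, and recommend the best cell center."""
    depth = max(1, min(int(np.log2(budget)),
                       int(np.log(budget) / np.log(1.0 / rho))))
    centers = (np.arange(2 ** depth) + 0.5) / 2 ** depth
    reps = max(1, budget // len(centers))
    means = [np.mean([f(c) for _ in range(reps)]) for c in centers]
    return centers[int(np.argmax(means))]

def poo_skeleton(f, budget=4000, n_instances=8, rho_max=0.9):
    """POO-style outer loop: run instances on a geometric grid of smoothness
    parameters, then spend the remaining budget re-evaluating each
    instance's recommendation and return the best one."""
    rhos = [rho_max ** (2 ** i) for i in range(n_instances)]
    half = budget // (2 * n_instances)
    recs = [run_instance(f, rho, half) for rho in rhos]
    scores = [np.mean([f(x) for _ in range(half)]) for x in recs]
    return recs[int(np.argmax(scores))]

# Noisy toy objective with its optimum at x = 0.7.
rng = np.random.default_rng(1)
noisy_f = lambda x: -abs(x - 0.7) + rng.normal(0.0, 0.1)
print(poo_skeleton(noisy_f))  # should land near 0.7
```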