AITopics | Strub, Florian

Plotting

Strub, Florian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BYOL works even without batch statistics

Richemond, Pierre H., Grill, Jean-Bastien, Altché, Florent, Tallec, Corentin, Strub, Florian, Brock, Andrew, Smith, Samuel, De, Soham, Pascanu, Razvan, Piot, Bilal, Valko, Michal

arXiv.org Machine LearningOct-20-2020

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.

deep learning, neural network, representation, (14 more...)

arXiv.org Machine Learning

2010.10241

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Bootstrap your own latent: A new approach to self-supervised Learning

Grill, Jean-Bastien, Strub, Florian, Altché, Florent, Tallec, Corentin, Richemond, Pierre H., Buchatskaya, Elena, Doersch, Carl, Pires, Bernardo Avila, Guo, Zhaohan Daniel, Azar, Mohammad Gheshlaghi, Piot, Bilal, Kavukcuoglu, Koray, Munos, Rémi, Valko, Michal

arXiv.org Machine LearningSep-10-2020

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3\%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6\%$ with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.

deep learning, neural network, representation, (20 more...)

arXiv.org Machine Learning

2006.07733

Country:

Oceania > Australia (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

Martin, Alice, Ollion, Charles, Strub, Florian, Corff, Sylvain Le, Pietquin, Olivier

arXiv.org Machine LearningJul-15-2020

While neural networks excel at predicting a single-point estimate of a given target for complex machine learning problems, an open research question is the design of neural generative models able to output a predictive distribution, that can capture the inherent variability of the observations or the model's level of confidence in its predictions. The main motivation behind uncertainty quantification is the design of AI systems for critical applications that are safe, and are mitigating risks while automatizing decision-making. On one hand, Bayesian statistics offer a mathematically grounded framework to reason about uncertainty; however, such models generally require prohibitive computational costs, which make them not widely used in practice. On the other hand, frequentist methods and metrics have been developed for confidence estimation in neural networks, in particular in the classification setting [Brosse et al., 2020], [Corbière et al., 2019].

deep learning, neural network, transformer, (19 more...)

arXiv.org Machine Learning

2007.0862

Country:

Europe > France (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Education (0.34)
Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)

Add feedback

Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following

Cideron, Geoffrey, Seurin, Mathieu, Strub, Florian, Pietquin, Olivier

arXiv.org Machine LearningOct-21-2019

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. These properties make it a natural fit to guide the training of interactive agents as it could ease recurrent challenges in Reinforcement Learning such as sample complexity, generalization, or multi-tasking. Yet, it remains an open-problem to relate language and RL in even simple instruction following scenarios. Current methods rely on expert demonstrations, auxiliary losses, or inductive biases in neural architectures. In this paper, we propose an orthogonal approach called Textual Hindsight Experience Replay (THER) that extends the Hindsight Experience Replay approach to the language setting. Whenever the agent does not fulfill its instruction, THER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward. To do so, THER learns to map a state into an instruction by using past successful trajectories, which removes the need to have external expert interventions to relabel episodes as in vanilla HER. We observe that this simple idea also initiates a learning synergy between language acquisition and policy learning on instruction following tasks in the BabyAI environment.

deep learning, instruction, neural network, (17 more...)

arXiv.org Machine Learning

1910.09451

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Add feedback

Deep Reinforcement Learning and the Deadly Triad

van Hasselt, Hado, Doron, Yotam, Strub, Florian, Hessel, Matteo, Sonnerat, Nicolas, Modayil, Joseph

arXiv.org Artificial IntelligenceDec-6-2018

We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties, which indicates that there is at least a partial gap in our understanding. In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models - deep Q-networks trained with experience replay - analysing how the components of this system play a role in the emergence of the deadly triad, and in the agent's performance

artificial intelligence, divergence, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1812.02648

Country:

North America (0.68)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Visual Reasoning with Multi-hop Feature Modulation

Strub, Florian, Seurin, Mathieu, Perez, Ethan, de Vries, Harm, Mary, Jérémie, Preux, Philippe, Courville, Aaron, Pietquin, Olivier

arXiv.org Machine LearningAug-3-2018

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art for the short input sequence task ReferIt --- on-par with single-hop FiLM generation --- while also significantly outperforming prior state-of-the-art and single-hop FiLM generation on the GuessWhat?! visual dialogue task.

deep learning, neural network, reasoning, (20 more...)

arXiv.org Machine Learning

1808.04446

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

FiLM: Visual Reasoning with a General Conditioning Layer

Perez, Ethan (MILA, Universite de Montreal, Rice University <span style="font-size: 9.5pt) | Strub, Florian (font-family: Arial, sans) | Vries, Harm de (Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL France) | Dumoulin, Vincent (MILA, Universite de Montreal) | Courville, Aaron (MILA, Universite de Montreal)

AAAI ConferencesFeb-8-2018

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.

deep learning, neural network, reasoning, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America (0.14)
Europe > France (0.14)

Genre: Research Report (0.68)

Industry:

Media > Film (0.69)
Leisure & Entertainment (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Modulating early visual processing by language

Vries, Harm de, Strub, Florian, Mary, Jeremie, Larochelle, Hugo, Pietquin, Olivier, Courville, Aaron C.

Neural Information Processing SystemsDec-31-2017

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by a linguistic input. Specifically, we introduce Conditional Batch Normalization (CBN) as an efficient mechanism to modulate convolutional feature maps by a linguistic embedding. We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (\MRN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.

deep learning, neural network, representation, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FiLM: Visual Reasoning with a General Conditioning Layer

Perez, Ethan, Strub, Florian, de Vries, Harm, Dumoulin, Vincent, Courville, Aaron

arXiv.org Machine LearningDec-18-2017

deep learning, neural network, reasoning, (19 more...)

arXiv.org Machine Learning

1709.07871

Country:

Europe > France (0.14)
North America (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.94)
Media > Film (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Visual Reasoning Without Strong Priors

Perez, Ethan, de Vries, Harm, Strub, Florian, Dumoulin, Vincent, Courville, Aaron

arXiv.org Artificial IntelligenceDec-18-2017

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

deep learning, neural network, reasoning, (19 more...)

arXiv.org Artificial Intelligence

1707.03017

Country:

North America > Canada (0.14)
Europe > France (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback