Joulin, Armand
Target Conditioning for One-to-Many Generation
Lachaux, Marie-Anne, Joulin, Armand, Lample, Guillaume
Neural Machine Translation (NMT) models often lack diversity in their generated translations, even when paired with a search algorithm such as beam search. A challenge is that the diversity in translations is caused by variability in the target language and cannot be inferred from the source sentence alone. In this paper, we propose to explicitly model this one-to-many mapping by conditioning the decoder of an NMT model on a latent variable that represents the domain of target sentences. The domain is a discrete variable generated by a target encoder that is jointly trained with the NMT model. The predicted domain of target sentences is given as input to the decoder during training. At inference, we can generate diverse translations by decoding with different domains. Unlike our strongest baseline (Shen et al., 2019), our method can scale to any number of domains without affecting the performance or the training time. We assess the quality and diversity of translations generated by our model with several metrics, on three different datasets.
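A minimal sketch of the idea described above, assuming a PyTorch encoder-decoder setup; module and parameter names (TargetEncoder, DomainConditionedDecoder, num_domains, the pre-embedded tgt_embeds) are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: a target-side encoder predicts a discrete domain id,
# and the decoder is conditioned on an embedding of that domain.
import torch
import torch.nn as nn

class TargetEncoder(nn.Module):
    """Maps a target sentence to a discrete domain id (hypothetical design)."""
    def __init__(self, vocab_size, dim, num_domains):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, num_domains)

    def forward(self, tgt_tokens):                     # (batch, len) token ids
        h = self.embed(tgt_tokens).mean(dim=1)         # crude sentence representation
        return self.proj(h).argmax(dim=-1)             # one discrete domain per sentence

class DomainConditionedDecoder(nn.Module):
    """Wraps any decoder and shifts its input embeddings by a domain embedding."""
    def __init__(self, decoder, dim, num_domains):
        super().__init__()
        self.decoder = decoder
        self.domain_embed = nn.Embedding(num_domains, dim)

    def forward(self, tgt_embeds, encoder_out, domain_id):
        d = self.domain_embed(domain_id).unsqueeze(1)          # (batch, 1, dim)
        return self.decoder(tgt_embeds + d, encoder_out)       # decode conditioned on the domain

# At inference, decoding the same source with domain_id = 0, 1, ..., num_domains - 1
# produces a set of candidate translations, one per domain.
```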
Learning to Visually Navigate in Photorealistic Environments Without any Supervision
Mezghani, Lina, Sukhbaatar, Sainbayar, Szlam, Arthur, Joulin, Armand, Bojanowski, Piotr
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training. In this paper, we introduce a novel approach for learning to navigate from image inputs without external supervision or reward. Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by setting its own goals. The model is trained with intrinsic rewards only so that it can be applied to any environment with image observations. We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.
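As a loose illustration of the intrinsic-reward ingredient mentioned above (not the paper's system), the snippet below computes a novelty-style bonus against an episodic memory of observation embeddings; the encoder and environment are dummy stand-ins.

```python
# Sketch only: intrinsic reward from distance to an episodic memory of embeddings.
import numpy as np

def novelty_reward(z, memory, k=5):
    """Intrinsic reward: mean distance of the current embedding to its k nearest memories."""
    if not memory:
        return 1.0
    d = np.sort(np.linalg.norm(np.stack(memory) - z, axis=1))
    return float(d[: min(k, len(d))].mean())

# Stand-ins: `encode` would be the learned first-person-view encoder, and the
# exploration policy would be trained to maximize this intrinsic return.
encode = lambda obs: obs.mean(axis=(0, 1))           # dummy encoder over an RGB frame
memory, obs = [], np.random.rand(64, 64, 3)
for _ in range(10):
    z = encode(obs)
    r_int = novelty_reward(z, memory)                # high in states unlike anything in memory
    memory.append(z)
    obs = np.random.rand(64, 64, 3)                  # stand-in for the next environment frame
```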
Efficient Optimization for Discriminative Latent Class Models
Joulin, Armand, Ponce, Jean, Bach, Francis R.
Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression; we show that this model provides a probabilistic interpretation of discriminative clustering methods, with added benefits in terms of the number of hyperparameters and of optimization. While the expectation-maximization (EM) algorithm is commonly used to learn such models, it relies on a non-convex cost function with many local minima and therefore usually converges to a poor local minimum. To avoid this problem, we introduce a local approximation of this cost function, which leads to a quadratic non-convex optimization problem over a product of simplices. To minimize such functions, we propose an efficient algorithm based on a convex relaxation and a low-rank representation of the data, which allows us to deal with large instances.
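One plausible reading of a "multinomial extension of reduced rank regression", sketched here for illustration only (the paper's exact parameterization may differ): a softmax classifier whose coefficient matrix is constrained to low rank.

```latex
% Sketch only: softmax regression with a rank-r coefficient matrix, in the spirit
% of reduced rank regression; U, V and r are illustrative symbols.
\[
  p(y = k \mid x) \;=\;
  \frac{\exp\big(u_k^\top V^\top x\big)}{\sum_{k'=1}^{K} \exp\big(u_{k'}^\top V^\top x\big)},
  \qquad
  U = [u_1, \dots, u_K] \in \mathbb{R}^{r \times K},\;
  V \in \mathbb{R}^{d \times r},\; r \ll \min(d, K),
\]
% where the low-rank product VU plays the role of the coefficient matrix
% of reduced rank regression.
```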
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
Karpathy, Andrej, Joulin, Armand, Fei-Fei, Li F.
We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.
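A minimal sketch (with assumed tensor shapes, not the authors' implementation) of a fragment-level max-margin ranking loss of the kind described above: matching image-sentence pairs should score higher than mismatched ones by a margin, with the global score pooled from fragment-fragment similarities.

```python
# Illustrative sketch only: bidirectional hinge loss over pooled fragment similarities.
import torch

def fragment_margin_loss(img_frags, sent_frags, margin=1.0):
    """img_frags: (B, Ni, D) object embeddings; sent_frags: (B, Ns, D) relation embeddings."""
    # Fragment-fragment similarities between every image and every sentence in the batch
    sim = torch.einsum("bid,cjd->bcij", img_frags, sent_frags)   # (B, B, Ni, Ns)
    scores = sim.amax(dim=(2, 3))                                # global score: max-pooled fragments
    pos = scores.diag().unsqueeze(1)                             # scores of the matching pairs
    # Rank true pairs above mismatched sentences (rows) and mismatched images (columns)
    cost_s = (margin + scores - pos).clamp(min=0)
    cost_i = (margin + scores - pos.t()).clamp(min=0)
    mask = 1.0 - torch.eye(scores.size(0))
    return ((cost_s + cost_i) * mask).sum()
```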
Reducing Transformer Depth on Demand with Structured Dropout
Fan, Angela, Grave, Edouard, Joulin, Armand
Overparameterized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality compared to training from scratch or using distillation.
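A minimal sketch of the structured-dropout idea (layer-level dropping during training, pruning at inference); this is illustrative rather than the fairseq LayerDrop implementation, and the every-k-th-layer pruning rule is just one possible choice.

```python
# Illustrative sketch only: drop whole layers at random during training, keep a
# fixed subset of layers at inference without finetuning.
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    def __init__(self, layers, drop_prob=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.drop_prob = drop_prob

    def forward(self, x, keep_every=None):
        for i, layer in enumerate(self.layers):
            if self.training:
                if torch.rand(1).item() < self.drop_prob:
                    continue                     # skip the whole layer (residual path only)
            elif keep_every is not None and i % keep_every != 0:
                continue                         # prune sub-network of chosen depth at inference
            x = layer(x)
        return x
```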
Why Build an Assistant in Minecraft?
Szlam, Arthur, Gray, Jonathan, Srinet, Kavya, Jernite, Yacine, Joulin, Armand, Synnaeve, Gabriel, Kiela, Douwe, Yu, Haonan, Chen, Zhuoyuan, Goyal, Siddharth, Guo, Demi, Rothermel, Danielle, Zitnick, C. Lawrence, Weston, Jason
In the last decade, we have seen a qualitative jump in the performance of machine learning (ML) methods directed at narrow, well-defined tasks. For example, there has been marked progress in object recognition [57], game-playing [73], and generative models of images [40] and text [39]. Some of these methods have achieved superhuman performance within their domain [73, 64]. In each of these cases, a powerful ML model was trained using large amounts of data on a highly complex task to surpass what was commonly believed possible. Here we consider the transpose of this situation.
Augmenting Self-attention with Persistent Memory
Sukhbaatar, Sainbayar, Grave, Edouard, Lample, Guillaume, Jegou, Herve, Joulin, Armand
Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that consists solely of attention layers. More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role to the feed-forward layer. Thanks to these vectors, we can remove the feed-forward layer without degrading the performance of a transformer. Our evaluation shows the benefits brought by our model on standard character and word level language modeling benchmarks.
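A minimal single-head sketch of the mechanism described above; causal masking, multiple heads and other details are omitted, and all names are illustrative assumptions rather than the paper's exact model.

```python
# Illustrative sketch only: self-attention whose keys and values are augmented
# with learned "persistent" vectors, playing a role similar to the feed-forward sublayer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentSelfAttention(nn.Module):
    def __init__(self, dim, num_persistent=16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learned key/value slots shared across positions and examples
        self.pk = nn.Parameter(torch.randn(num_persistent, dim) / dim ** 0.5)
        self.pv = nn.Parameter(torch.randn(num_persistent, dim) / dim ** 0.5)

    def forward(self, x):                                        # x: (batch, seq, dim)
        B = x.size(0)
        q = self.q(x)
        k = torch.cat([self.k(x), self.pk.expand(B, -1, -1)], dim=1)
        v = torch.cat([self.v(x), self.pv.expand(B, -1, -1)], dim=1)
        attn = F.softmax(q @ k.transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        return attn @ v                                          # tokens also attend to persistent slots
```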
Adaptive Attention Span in Transformers
Sukhbaatar, Sainbayar, Grave, Edouard, Bojanowski, Piotr, Joulin, Armand
The Sequential Transformer (Vaswani et al., 2017) owes part of its success to its ability to capture long term dependencies. This is achieved by taking long sequences as inputs and explicitly computing the relations between every token via a mechanism called the "self-attention" layer (Al-Rfou et al., 2019). A Transformer is made of a sequence of layers that are composed of a block of parallel self-attention layers followed by a feedforward network; we refer to Vaswani et al. (2017) for details.
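For reference, a minimal sketch of the generic layer structure described above (a self-attention block followed by a feedforward network); the paper's adaptive attention span mechanism itself is not shown here.

```python
# Illustrative sketch only: one standard Transformer layer.
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, dim, heads, ff_dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        # Self-attention relates every token to every other token in the input
        a, _ = self.attn(x, x, x, need_weights=False)
        x = self.n1(x + a)
        return self.n2(x + self.ff(x))            # feedforward sublayer with residual connection
```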
Cooperative Learning of Disjoint Syntax and Semantics
Havrylov, Serhii, Kruszewski, Germán, Joulin, Armand
There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes. Our model does not require any linguistic structure for supervision, and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference and Sentiment Analysis.
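A compact, hedged sketch (not the paper's model) of how such cooperative training can be wired: a discrete module samples which adjacent pair to merge and is trained with REINFORCE, while the continuous composition function receives ordinary gradients.

```python
# Illustrative sketch only: a "syntax" scorer makes discrete merge decisions,
# a "semantics" network composes the merged pair. Names and shapes are assumptions.
import torch
import torch.nn as nn

class SyntaxSemantics(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)                                 # syntax module
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())   # semantics module

    def forward(self, leaves):                         # leaves: list of (dim,) word embeddings
        nodes, log_probs = list(leaves), []
        while len(nodes) > 1:
            pairs = torch.stack([torch.cat([a, b]) for a, b in zip(nodes[:-1], nodes[1:])])
            dist = torch.distributions.Categorical(logits=self.score(pairs).squeeze(-1))
            idx = dist.sample()                        # discrete choice: which pair to merge
            log_probs.append(dist.log_prob(idx))
            i = idx.item()
            nodes = nodes[:i] + [self.compose(pairs[i])] + nodes[i + 2:]
        return nodes[0], torch.stack(log_probs).sum()

# A downstream loss.backward() trains `compose`; a REINFORCE term such as
# (loss.detach() - baseline) * (-sum_log_probs) trains `score`.
```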
Unsupervised Alignment of Embeddings with Wasserstein Procrustes
Grave, Edouard, Joulin, Armand, Berthet, Quentin
We consider the task of aligning two sets of points in high dimension, which has many applications in natural language processing and computer vision. As an example, it was recently shown that it is possible to infer a bilingual lexicon, without supervised data, by aligning word embeddings trained on monolingual data. These recent advances are based on adversarial training to learn the mapping between the two embeddings. In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. While this problem is not convex, we propose to initialize our optimization algorithm by using a convex relaxation, traditionally considered for the graph isomorphism problem. We propose a stochastic algorithm to minimize our cost function on large scale problems. Finally, we evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data. On this task, our method obtains state of the art results, while requiring less computational resources than competing approaches.
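A minimal NumPy/SciPy sketch (not the authors' code) of the alternating, stochastic scheme described above: on each minibatch, solve for a matching given the current orthogonal map, then refine the map by orthogonal Procrustes.

```python
# Illustrative sketch only: alternate between a Hungarian matching and an SVD-based
# Procrustes update on random minibatches of the two point sets.
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_procrustes(X, Y, iters=100, batch=512, seed=0):
    """X, Y: (n, d) word embeddings in the source and target languages."""
    rng = np.random.default_rng(seed)
    Q = np.eye(X.shape[1])
    for _ in range(iters):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        Xb, Yb = X[idx], Y[idx]
        # 1-to-1 matching that maximizes alignment under the current rotation
        rows, cols = linear_sum_assignment(-Xb @ Q @ Yb.T)
        # Orthogonal Procrustes: best rotation for the matched pairs
        U, _, Vt = np.linalg.svd(Xb[rows].T @ Yb[cols])
        Q = U @ Vt
    return Q
```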