AITopics | Jégou, Hervé

Collaborating Authors

Jégou, Hervé

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inference-time sparse attention with asymmetric indexing

Mazaré, Pierre-Emmanuel, Szilvasy, Gergely, Lomeli, Maria, Massa, Francisco, Murray, Naila, Jégou, Hervé, Douze, Matthijs

arXiv.org Artificial IntelligenceFeb-12-2025

Self-attention in transformer models is an incremental associative memory that maps key vectors to value vectors. One way to speed up self-attention is to employ GPU-compliant vector search algorithms, yet the standard partitioning methods yield poor results in this context, because (1) keys and queries follow different distributions and (2) the effect of RoPE positional encoding. In this paper, we introduce SAAP (Self-Attention with Asymmetric Partitions), which overcomes these problems. It is an asymmetrical indexing technique that employs distinct partitions for keys and queries, thereby approximating self-attention with a data-adaptive sparsity pattern. It works on pretrained language models without finetuning, as it only requires to train (offline) a small query classifier. On a long context Llama 3.1-8b model, with sequences ranging from 100k to 500k tokens, our method typically reduces by a factor 20 the fraction of memory that needs to be looked-up, which translates to a time saving of 60\% when compared to FlashAttention-v2.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.08246

Country: Asia > Thailand (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Moshi: a speech-text foundation model for real-time dialogue

Défossez, Alexandre, Mazaré, Laurent, Orsini, Manu, Royer, Amélie, Pérez, Patrick, Jégou, Hervé, Grave, Edouard, Zeghidour, Neil

arXiv.org Artificial IntelligenceOct-2-2024

We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text-to-speech. Such frameworks cannot emulate the experience of real conversations. First, their complexity induces a latency of several seconds between interactions. Second, text being the intermediate modality for dialogue, non-linguistic information that modifies meaning -- such as emotion or non-speech sounds -- is lost in the interaction. Finally, they rely on a segmentation into speaker turns, which does not take into account overlapping speech, interruptions and interjections. Moshi solves these independent issues altogether by casting spoken dialogue as speech-to-speech generation. Starting from a text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec, while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of explicit speaker turns, and the modeling of arbitrary conversational dynamics. We moreover extend the hierarchical semantic-to-acoustic token generation of previous work to first predict time-aligned text tokens as a prefix to audio tokens. Not only this "Inner Monologue" method significantly improves the linguistic quality of generated speech, but we also illustrate how it can provide streaming speech recognition and text-to-speech. Our resulting model is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice, and is available at https://github.com/kyutai-labs/moshi.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.00037

Country:

Asia (0.67)
North America > United States > Hawaii (0.14)
North America > United States > California (0.14)

Genre:

Research Report (0.81)
Workflow (0.67)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Vo, Huy V., Khalidov, Vasil, Darcet, Timothée, Moutakanni, Théo, Smetanin, Nikita, Szafraniec, Marc, Touvron, Hugo, Couprie, Camille, Oquab, Maxime, Joulin, Armand, Jégou, Hervé, Labatut, Patrick, Bojanowski, Piotr

arXiv.org Artificial IntelligenceJun-28-2024

Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the dataset size. In this work, we consider the problem of automatic curation of high-quality datasets for self-supervised pre-training. We posit that such datasets should be large, diverse and balanced, and propose a clustering-based approach for building ones satisfying all these criteria. Our method involves successive and hierarchical applications of $k$-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Extensive experiments on three different data domains including web-based images, satellite images and text show that features trained on our automatically curated datasets outperform those trained on uncurated data while being on par or better than ones trained on manually curated data. Code is available at https://github.com/facebookresearch/ssl-data-curation.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.15613

Country:

North America > United States (1.00)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

The Faiss library

Douze, Matthijs, Guzhva, Alexandr, Deng, Chengqi, Johnson, Jeff, Szilvasy, Gergely, Mazaré, Pierre-Emmanuel, Lomeli, Maria, Hosseini, Lucas, Jégou, Hervé

arXiv.org Artificial IntelligenceJan-16-2024

Vector databases manage large collections of embedding vectors. As AI applications are growing rapidly, so are the number of embeddings that need to be stored and indexed. The Faiss library is dedicated to vector similarity search, a core functionality of vector databases. Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper first describes the tradeoff space of vector search, then the design principles of Faiss in terms of structure, approach to optimization and interfacing. We benchmark key features of the library and discuss a few selected applications to highlight its broad applicability.

data mining, machine learning, programming language, (22 more...)

arXiv.org Artificial Intelligence

2401.08281

Country: Europe > France (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(5 more...)

Add feedback

Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models

Muckley, Matthew J., El-Nouby, Alaaeldin, Ullrich, Karen, Jégou, Hervé, Verbeek, Jakob

arXiv.org Artificial IntelligenceAug-10-2023

Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original. Theoretical results indicate that optimizing distortion metrics such as PSNR or MS-SSIM necessarily leads to a discrepancy in the statistics of original images from those of reconstructions, in particular at low bitrates, often manifested by the blurring of the compressed images. Previous work has leveraged adversarial discriminators to improve statistical fidelity. Yet these binary discriminators adopted from generative modeling tasks may not be ideal for image compression. In this paper, we introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders. Our evaluations on the CLIC2020, DIV2K and Kodak datasets show that our discriminator is more effective for jointly optimizing distortion (e.g., PSNR) and statistical fidelity (e.g., FID) than the PatchGAN of the state-of-the-art HiFiC model. On CLIC2020, we obtain the same FID as HiFiC with 30-40\% fewer bits.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2301.11189

Country: Europe (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Stable Signature: Rooting Watermarks in Latent Diffusion Models

Fernandez, Pierre, Couairon, Guillaume, Jégou, Hervé, Douze, Matthijs, Furon, Teddy

arXiv.org Artificial IntelligenceJul-26-2023

Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt, then cropped to keep $10\%$ of the content, with $90$+$\%$ accuracy at a false positive rate below 10$^{-6}$.

artificial intelligence, machine learning, watermark, (16 more...)

arXiv.org Artificial Intelligence

2303.15435

Country: Europe > Spain (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Gradient-based Adversarial Attacks against Text Transformers

Guo, Chuan, Sablayrolles, Alexandre, Jégou, Hervé, Kiela, Douwe

arXiv.org Artificial IntelligenceApr-15-2021

We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.

adversarial example, constraint-based reasoning, neural network, (21 more...)

arXiv.org Artificial Intelligence

2104.13733

Country:

Europe (0.68)
Asia (0.47)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.83)
Government (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Add feedback

White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

Sablayrolles, Alexandre, Douze, Matthijs, Ollivier, Yann, Schmid, Cordelia, Jégou, Hervé

arXiv.org Machine LearningAug-29-2019

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box attacks. As the optimal strategy is not tractable, we provide approximations of it leading to several inference methods, and show that existing membership inference methods are coarser approximations of this optimal strategy. Our membership attacks outperform the state of the art in various settings, ranging from a simple logistic regression to more complex architectures and datasets, such as ResNet-101 and Imagenet.

air transportation, membership inference, neural network, (19 more...)

arXiv.org Machine Learning

1908.11229

Country:

Europe (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Transportation > Air (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A neural network catalyzer for multi-dimensional similarity search

Sablayrolles, Alexandre, Douze, Matthijs, Schmid, Cordelia, Jégou, Hervé

arXiv.org Machine LearningJun-8-2018

This paper aims at learning a function mapping input vectors to an output space in a way that improves high-dimensional similarity search. As a proxy objective, we design and train a neural network that favors uniformity in the spherical output space, while preserving the neighborhood structure after the mapping. For this purpose, we propose a new regularizer derived from the Kozachenko-Leonenko differential entropy estimator and combine it with a locality-aware triplet loss. Our method operates as a catalyzer for traditional indexing methods such as locality sensitive hashing or iterative quantization, boosting the overall recall. Additionally, the network output distribution makes it possible to leverage structured quantizers with efficient algebraic encoding, in particular spherical lattice quantizers such as the Gosset lattice E8. Our experiments show that this approach is competitive with state-of-the-art methods such as optimized product quantization.

artificial intelligence, neural network, vector, (17 more...)

arXiv.org Machine Learning

1806.03198

Country: Europe > France (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Low-shot learning with large-scale diffusion

Douze, Matthijs, Szlam, Arthur, Hariharan, Bharath, Jégou, Hervé

arXiv.org Machine LearningOct-17-2017

This paper considers the problem of inferring image labels for which only a few labelled examples are available at training time. This setup is often referred to as low-shot learning in the literature, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes. We consider a semi-supervised setting in which we exploit a large collection of images to support label propagation. This is made possible by leveraging the recent advances on large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling up label propagation to up hundred millions of images leads to state of the art accuracy in the low-shot learning regime.

deep learning, diffusion, neural network, (21 more...)

arXiv.org Machine Learning

1706.02332

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback