Collaborating Authors

 Sagot, Benoit


BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?

arXiv.org Artificial Intelligence

We introduce BigO(Bench), a novel coding benchmark designed to evaluate the capabilities of generative language models in understanding and generating code with specified time and space complexities. This benchmark addresses the gap in current evaluations that often overlook the ability of models to comprehend and produce code constrained by computational complexity. BigO(Bench) includes tooling to infer the algorithmic complexity of any Python function from profiling measurements, including human- or LLM-generated solutions. BigO(Bench) also includes a set of 3,105 coding problems and 1,190,250 solutions from Code Contests annotated with inferred (synthetic) time and space complexity labels from the complexity framework, as well as corresponding runtime and memory footprint values for a large set of input sizes. We present results from evaluating multiple state-of-the-art language models on this benchmark, highlighting their strengths and weaknesses in handling complexity requirements. In particular, token-space reasoning models are unrivaled in code generation but not in complexity understanding, hinting that they may not generalize well to tasks for which no reward was given at training time.
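To make the profiling-based idea concrete, here is a minimal Python sketch of inferring a polynomial time-complexity degree from runtime measurements at growing input sizes. It is not BigO(Bench)'s actual complexity framework; the function names, the best-of-N timing and the log-log slope fit are illustrative assumptions only.

# Sketch: estimate a polynomial complexity degree from empirical runtimes.
# Not BigO(Bench)'s framework; names and the fitting strategy are illustrative.
import math
import time

def measure_runtime(func, make_input, sizes, repeats=5):
    """Return (n, best runtime in seconds) for each input size n."""
    points = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            data = make_input(n)
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        points.append((n, best))
    return points

def fit_polynomial_degree(points):
    """Least-squares slope of log(time) vs log(n): ~1 linear, ~2 quadratic, etc."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    # Profiling a quadratic pairwise sum should give a slope close to 2.
    quadratic = lambda xs: sum(a * b for a in xs for b in xs)
    pts = measure_runtime(quadratic, lambda n: list(range(n)), [200, 400, 800, 1600])
    print(f"estimated degree: {fit_polynomial_degree(pts):.2f}")

In practice a framework like this also needs to vary which input dimension grows and to distinguish logarithmic and exponential regimes, which a single log-log slope cannot do.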


SpiRit-LM: Interleaved Spoken and Written Language Model

arXiv.org Artificial Intelligence

We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated speech-text parallel corpus. SPIRIT-LM comes in two versions: a BASE version that uses speech semantic units and an EXPRESSIVE version that models expressivity using pitch and style units in addition to the semantic units. For both versions, the text is encoded with subword BPE tokens. The resulting model displays both the semantic abilities of text models and the expressive abilities of speech models. Additionally, we demonstrate that SPIRIT-LM is able to learn new tasks in a few-shot fashion across modalities (i.e. ASR, TTS, Speech Classification).
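As a rough illustration of the word-level interleaving idea, the sketch below builds a single token stream from a word-aligned speech-text pair, switching modality at word boundaries. The special markers, the per-word data layout and the random switching rule are assumptions for illustration, not the released model's exact scheme.

# Illustrative word-level text/speech interleaving (assumed markers and layout).
import random

def interleave(words, p_switch=0.3, seed=0):
    """words: list of dicts per aligned word:
       {"text_tokens": [...BPE ids...], "speech_tokens": [...unit ids...]}
    Returns one token stream that alternates modality at word boundaries."""
    rng = random.Random(seed)
    stream, modality = ["[TEXT]"], "text"
    for w in words:
        if rng.random() < p_switch:  # maybe switch modality at this word boundary
            modality = "speech" if modality == "text" else "text"
            stream.append("[SPEECH]" if modality == "speech" else "[TEXT]")
        stream.extend(w["speech_tokens" if modality == "speech" else "text_tokens"])
    return stream

# Toy usage with made-up token ids.
example = [
    {"text_tokens": ["_the"], "speech_tokens": [17, 42]},
    {"text_tokens": ["_cat"], "speech_tokens": [5, 5, 91]},
]
print(interleave(example))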


Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

arXiv.org Artificial Intelligence

We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective which gets its positive samples from data-augmented k-Nearest Neighbors search. Building on similar ideas in vision and speech, we select our positive examples through a mix of time-stretching data augmentation [26] and k-Nearest Neighbors search [27, 28]. Figure 1 gives an overview of our method. We show that when built on top of recent self-supervised audio representations [1, 2, 3], this method can be applied iteratively and yield competitive SSE as evaluated on two tasks: query-by-example of random sequences of speech, and spoken term discovery. To evaluate our method, we test our model on 5 types of acoustic features: MFCCs, CPC [4, 3], HuBERT [1] and Wav2Vec 2.0 (Base and Large) [2]. On both tasks our method pushes the state-of-the-art by a significant margin across 5 different languages. We pick the best method from our LibriSpeech benchmark and show that, when applied without any change to the task of spoken term discovery as defined in the zero resource challenges [29], we beat the state of the art on the NED/COV metric by a large margin in 5 new datasets. Finally, we establish a query-by-example benchmark.
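The following minimal sketch shows the k-Nearest Neighbors positive-mining step that feeds such a contrastive objective. Real training also applies time-stretching augmentation to the audio before embedding; the array shapes and the cosine-similarity choice here are assumptions, not the paper's exact recipe.

# Sketch: mine positive pairs by k-NN over sequence embeddings (assumed shapes).
import numpy as np

def knn_positives(embeddings, k=5):
    """embeddings: (N, D) array; returns (N, k) indices of nearest neighbors."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T                 # cosine similarity between all pairs
    np.fill_diagonal(sim, -np.inf)          # never pick an item as its own positive
    return np.argsort(-sim, axis=1)[:, :k]  # top-k most similar items per anchor

# Each anchor i is then paired with one of knn_positives(E)[i] (plus its
# time-stretched copies) in an InfoNCE-style contrastive loss.
E = np.random.default_rng(0).standard_normal((100, 256)).astype(np.float32)
print(knn_positives(E, k=3)[:2])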


Generative Spoken Language Model based on continuous word-sized audio tokens

arXiv.org Artificial Intelligence

In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard input of spoken LMs are 20ms or 40ms-long discrete units (shorter than a phoneme). Taking inspiration from word-based LM, we introduce a Generative Spoken Language Model (GSLM) based on word-size continuous-valued audio embeddings that can generate diverse and expressive language output. This is obtained by replacing the lookup table for lexical types with a Lexical Embedding function, the cross-entropy loss with a contrastive loss, and multinomial sampling with k-NN sampling. The resulting model is the first generative language model based on word-size continuous embeddings. Its performance is on par with discrete unit GSLMs regarding generation quality as measured by automatic metrics and subjective human judgements. Moreover, it is five times more memory efficient thanks to its large 200ms units. In addition, the embeddings before and after the Lexical Embedder are phonetically and semantically interpretable.
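To illustrate the k-NN sampling step that replaces multinomial sampling over a discrete vocabulary, here is a small Python sketch: the model's predicted continuous embedding is matched against a bank of stored lexical embeddings and a unit is drawn among the k nearest. The bank, the temperature and the distance-to-probability mapping are illustrative assumptions, not the paper's exact procedure.

# Sketch: k-NN sampling over a bank of continuous word-sized embeddings.
import numpy as np

def knn_sample(predicted, bank, k=10, temperature=1.0, rng=None):
    """predicted: (D,) continuous embedding output by the LM;
    bank: (V, D) stored lexical embeddings; returns the sampled unit index."""
    rng = rng or np.random.default_rng()
    dists = np.linalg.norm(bank - predicted, axis=1)  # distance to every stored unit
    nearest = np.argsort(dists)[:k]                   # restrict to the k nearest units
    logits = -dists[nearest] / temperature            # closer units get higher scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(nearest, p=probs))

bank = np.random.default_rng(0).standard_normal((5000, 200))
next_unit = knn_sample(bank[42] + 0.01, bank, k=5)
print(next_unit)  # almost always 42, since the query sits next to that entry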


XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words

arXiv.org Artificial Intelligence

Due to the absence of explicit word boundaries in the speech stream, the task of segmenting spoken sentences into word units without text supervision is particularly challenging. In this work, we leverage the most recent self-supervised speech models that have proved to quickly adapt to new tasks through fine-tuning, even in low resource conditions. Taking inspiration from semi-supervised learning, we fine-tune an XLS-R model to predict word boundaries that are themselves produced by top-tier speech segmentation systems: DPDP, VG-HuBERT, GradSeg and DP-Parse. Once XLS-R is fine-tuned, it is used to infer new word boundary labels that are used in turn for another fine-tuning step. Our method consistently improves the performance of each system and sets a new state-of-the-art that is, on average, 130% higher than the previous one, as measured by the F1 score on correctly discovered word tokens on five corpora featuring different languages. Finally, our system can segment speech from languages unseen during fine-tuning in a zero-shot fashion.
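One concrete data-preparation step implied by this procedure is turning the (noisy) boundary timestamps produced by a teacher system into frame-level targets for fine-tuning. The sketch below assumes a 20 ms output frame duration and a one-frame tolerance window around each boundary; both are assumptions for illustration, not the paper's exact setup.

# Sketch: convert teacher word-boundary times into frame-level 0/1 targets.
import numpy as np

def boundaries_to_frame_targets(boundaries_sec, audio_len_sec,
                                frame_dur=0.02, tolerance_frames=1):
    """boundaries_sec: boundary times in seconds -> (T,) array of per-frame targets."""
    num_frames = int(round(audio_len_sec / frame_dur))
    targets = np.zeros(num_frames, dtype=np.float32)
    for t in boundaries_sec:
        center = int(round(t / frame_dur))
        lo = max(0, center - tolerance_frames)
        hi = min(num_frames, center + tolerance_frames + 1)
        targets[lo:hi] = 1.0  # mark a small window around each teacher boundary
    return targets

# Teacher boundaries (e.g. from DP-Parse) for a 1.2 s utterance:
print(boundaries_to_frame_targets([0.31, 0.74, 1.05], audio_len_sec=1.2))

After fine-tuning on such targets, the model's own predicted boundaries replace the teacher's, and the same step is repeated for the next iteration.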


Generative Spoken Dialogue Language Modeling

arXiv.org Artificial Intelligence

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking compared to a text-based cascaded model.
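As a rough sketch of the dual-tower idea, the PyTorch layer below lets each channel's tower self-attend to its own stream and cross-attend to the other channel's stream. Layer sizes, normalization placement and the absence of feed-forward blocks are simplifications assumed for illustration, not the paper's exact architecture.

# Sketch (PyTorch): one dual-tower layer with cross-attention between channels.
import torch
import torch.nn as nn

class DualTowerLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # one self-attention and one cross-attention module per channel
        self.self_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(d_model)
        self.norm_b = nn.LayerNorm(d_model)

    def forward(self, a, b):
        # each tower first attends to its own channel...
        a = a + self.self_a(a, a, a, need_weights=False)[0]
        b = b + self.self_b(b, b, b, need_weights=False)[0]
        # ...then to the other channel's hidden states (cross-attention)
        a = self.norm_a(a + self.cross_a(a, b, b, need_weights=False)[0])
        b = self.norm_b(b + self.cross_b(b, a, a, need_weights=False)[0])
        return a, b

# Two channels of 100 unit embeddings each (batch of 1).
layer = DualTowerLayer()
chan_a, chan_b = torch.randn(1, 100, 512), torch.randn(1, 100, 512)
out_a, out_b = layer(chan_a, chan_b)
print(out_a.shape, out_b.shape)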