AITopics | Ropers, Christophe

Ropers, Christophe

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

The Omnilingual MT Team, Andrews, Pierre, Artetxe, Mikel, Meglioli, Mariano Coria, Costa-jussà, Marta R., Chuang, Joe, Dale, David, Gao, Cynthia, Maillard, Jean, Mourachko, Alex, Ropers, Christophe, Saleem, Safiyyah, Sánchez, Eduardo, Tsiamas, Ioannis, Turkatenko, Arina, Ventayol-Boada, Albert, Yates, Shireen

arXiv.org Artificial Intelligence, Feb-6-2025

This paper presents BOUQuET, a multicentric and multi-register/domain dataset and benchmark, and its broader collaborative extension initiative. The dataset is handcrafted in non-English languages first. Each of these source languages is among the 23 languages commonly used by half of the world's population and therefore has the potential to serve as a pivot language that enables more accurate translations. The dataset is specifically designed to avoid contamination and to be multicentric, so as to enforce representation of multilingual language features. In addition, the dataset goes beyond the sentence level, as it is organized in paragraphs of various lengths. Compared with related machine translation (MT) datasets, we show that BOUQuET has a broader representation of domains while simplifying the translation task for non-experts. Therefore, BOUQuET is particularly suitable for the open initiative and call for translation participation that we are launching to extend it into a multi-way parallel corpus covering any written language.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.04314

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

Costa-jussà, Marta R., Yu, Bokai, Andrews, Pierre, Alastruey, Belen, Camgoz, Necati Cihan, Chuang, Joe, Maillard, Jean, Ropers, Christophe, Turkatenko, Arina, Wood, Carleigh

arXiv.org Artificial Intelligence, Dec-23-2024

We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate the 2M-BELEBELE dataset in both 5-shot and zero-shot settings; across languages, speech comprehension accuracy is on average about 2-3% lower than reading comprehension accuracy.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.08274

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)

Large Concept Models: Language Modeling in a Sentence Representation Space

LCM team, Barrault, Loïc, Duquenne, Paul-Ambroise, Elbayad, Maha, Kozhevnikov, Artyom, Alastruey, Belen, Andrews, Pierre, Coria, Mariano, Couairon, Guillaume, Costa-jussà, Marta R., Dale, David, Elsahar, Hady, Heffernan, Kevin, Janeiro, João Maria, Tran, Tuan, Ropers, Christophe, Sánchez, Eduardo, Roman, Robin San, Mourachko, Alexandre, Saleem, Safiyyah, Schwenk, Holger

arXiv.org Artificial Intelligence, Dec-15-2024

LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher level idea or action in a flow. Hence, we build a "Large Concept Model". In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space. We explore multiple approaches, namely MSE regression, variants of diffusion-based generation, and models operating in a quantized SONAR space. These explorations are performed using 1.6B parameter models and training data in the order of 1.3T tokens. We then scale one architecture to a model size of 7B parameters and training data of about 2.7T tokens. We perform an experimental evaluation on several generative tasks, namely summarization and a new task of summary expansion. Finally, we show that our model exhibits impressive zero-shot generalization performance to many languages, outperforming existing LLMs of the same size. The training code of our models is freely available.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.08821

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LCFO: Long Context and Long Form Output Dataset and Benchmarking

Costa-jussà, Marta R., Andrews, Pierre, Meglioli, Mariano Coria, Chen, Joy, Chuang, Joe, Dale, David, Ropers, Christophe, Mourachko, Alexandre, Sánchez, Eduardo, Schwenk, Holger, Tran, Tuan, Turkatenko, Arina, Wood, Carleigh

arXiv.org Artificial Intelligence, Dec-12-2024

This paper presents the Long Context and Form Output (LCFO) benchmark, a novel evaluation framework for assessing gradual summarization and summary expansion capabilities across diverse domains. LCFO consists of long input documents (5k words average length), each of which comes with three summaries of different lengths (20%, 10%, and 5% of the input text), as well as approximately 15 questions and answers (QA) related to the input content. Notably, LCFO also provides alignments between specific QA pairs and corresponding summaries in 7 domains. The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i.e., summary expansion. To establish an evaluation metric framework for summarization and summary expansion, we provide human evaluation scores for human-generated outputs, as well as results from various state-of-the-art large language models (LLMs). GPT-4o-mini achieves the best human scores among automatic systems in both summarization and summary expansion tasks (~+10% and ~+20%, respectively). It even surpasses human output quality in the case of short summaries (~+7%). Overall, automatic metrics achieve low correlations with human evaluation scores (~0.4) but moderate correlation on specific evaluation aspects such as fluency and attribution (~0.6). The LCFO benchmark offers a standardized platform for evaluating summarization and summary expansion performance, as well as corresponding automatic metrics, thereby providing an important evaluation framework to advance generative AI.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.08268

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.93)

Genre:

Overview (0.93)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Y-NQ: English-Yorùbá Evaluation dataset for Open-Book Reading Comprehension and Text Generation

Costa-jussà, Marta R., Chen, Joy, Adebara, Ifeoluwanimi, Chuang, Joe, Ropers, Christophe, Sánchez, Eduardo

arXiv.org Artificial Intelligence, Dec-11-2024

This study explores the intersection of reading comprehension and text generation, examining how models perform on tasks requiring both in-context understanding (i.e., open-book model, where the model has access to the context document during inference to answer a particular question) and generative text production (i.e., the answer is free text that has to be compared to a gold-standard reference). We aim to investigate the performance of this task in two languages: a high-resource language (English) and a low-resource language (Yorùbá). For this, we introduce Y-NQ (Yorùbá Natural Questions), a comprehensive open-book question-answer dataset (Section 2). Y-NQ is sourced from NQ (Kwiatkowski et al., 2019) and provides a complete article context for informed answers and text generation tasks, and parallel documents on the same topic for both high- and low-resource languages. The dataset also includes the comparability of the responses across languages. As a result, we are increasing Natural Language Processing (NLP) resources in Yorùbá (Ahia et al., 2024). Our dataset is benchmarked against state-of-the-art Large Language Models (LLMs). The results and analysis (Section 3) show that responses in Yorùbá are less accurate than those in English. As a by-product of human annotations, we identify inaccuracies in the English-language version of some Wikipedia articles (26 incorrect answers out of 1,566 human-analyzed questions in the English-language subset of articles), which confirms the existence of accuracy discrepancies across languages for the same Wikipedia topics, thus supporting, for example, the need to better interlink Wikipedia articles across languages (Klang and Nugues, 2016).

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.08279

Country:

Asia (0.94)
North America > United States (0.29)

Genre: Research Report (0.51)

Industry: Education > Assessment & Standards > Student Performance (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

On the Role of Speech Data in Reducing Toxicity Detection Bias

Bell, Samuel J., Meglioli, Mariano Coria, Richards, Megan, Sánchez, Eduardo, Ropers, Christophe, Wang, Skyler, Williams, Adina, Sagun, Levent, Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Nov-12-2024

Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTox dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.

annotator, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.08135

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Towards Red Teaming in Multimodal and Multilingual Translation

Ropers, Christophe, Dale, David, Hansanti, Prangthip, Gonzalez, Gabriel Mejia, Evtimov, Ivan, Wong, Corinne, Touret, Christophe, Pereyra, Kristina, Kim, Seohyun Sonia, Ferrer, Cristian Canton, Andrews, Pierre, Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Jan-29-2024

Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is gaining increasing interest as a means to assess the performance and reliability of models. One such method is the red teaming approach, which aims to generate edge cases where a model will produce critical errors. While this methodology is becoming standard practice for generative AI, its application to the realm of conditional AI remains largely unexplored. This paper presents the first study on human-based red teaming for Machine Translation (MT), marking a significant step towards understanding and improving the performance of translation models. We delve into both human-based red teaming and a study on automation, reporting lessons learned and providing recommendations for both translation models and red teaming drills. This pioneering work opens up new avenues for research and development in the field of MT.

machine learning, natural language, translation, (19 more...)

arXiv.org Artificial Intelligence

2401.16247

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector

Costa-jussà, Marta R., Meglioli, Mariano Coria, Andrews, Pierre, Dale, David, Hansanti, Prangthip, Kalbassi, Elahe, Mourachko, Alex, Ropers, Christophe, Wood, Carleigh

arXiv.org Artificial Intelligence, Jan-10-2024

Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English. To address these limitations and lay the groundwork for truly multilingual audio-based toxicity detection, we introduce MuTox, the first highly multilingual audio-based dataset with toxicity labels. The dataset comprises 20,000 audio utterances for English and Spanish, and 4,000 for the other 19 languages. To demonstrate the quality of this dataset, we trained the MuTox audio-based toxicity classifier, which enables zero-shot toxicity detection across a wide range of languages. This classifier outperforms existing text-based trainable classifiers by more than 1% AUC, while expanding the language coverage more than tenfold. When compared to a wordlist-based classifier that covers a similar number of languages, MuTox improves precision and recall by approximately 2.5 times. This significant improvement underscores the potential of MuTox in advancing the field of audio-based toxicity detection.

classifier, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2401.05060

Country: Europe > France (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Seamless: Multilingual Expressive and Streaming Speech Translation

Seamless Communication, Barrault, Loïc, Chung, Yu-An, Meglioli, Mariano Coria, Dale, David, Dong, Ning, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Ellis, Brian, Elsahar, Hady, Haaheim, Justin, Hoffman, John, Hwang, Min-Jae, Inaguma, Hirofumi, Klaiber, Christopher, Kulikov, Ilia, Li, Pengwei, Licht, Daniel, Maillard, Jean, Mavlyutov, Ruslan, Rakotoarison, Alice, Sadagopan, Kaushik Ram, Ramakrishnan, Abinesh, Tran, Tuan, Wenzek, Guillaume, Yang, Yilin, Ye, Ethan, Evtimov, Ivan, Fernandez, Pierre, Gao, Cynthia, Hansanti, Prangthip, Kalbassi, Elahe, Kallet, Amanda, Kozhevnikov, Artyom, Gonzalez, Gabriel Mejia, Roman, Robin San, Touret, Christophe, Wong, Corinne, Wood, Carleigh, Yu, Bokai, Andrews, Pierre, Balioglu, Can, Chen, Peng-Jen, Costa-jussà, Marta R., Elbayad, Maha, Gong, Hongyu, Guzmán, Francisco, Heffernan, Kevin, Jain, Somya, Kao, Justine, Lee, Ann, Ma, Xutai, Mourachko, Alex, Peloquin, Benjamin, Pino, Juan, Popuri, Sravya, Ropers, Christophe, Saleem, Safiyyah, Schwenk, Holger, Sun, Anna, Tomasello, Paden, Wang, Changhan, Wang, Jeff, Wang, Skyler, Williamson, Mary

arXiv.org Artificial Intelligence, Dec-8-2023

Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

artificial intelligence, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2312.05187

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.13)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation

Dale, David, Voita, Elena, Lam, Janice, Hansanti, Prangthip, Ropers, Christophe, Kalbassi, Elahe, Gao, Cynthia, Barrault, Loïc, Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Dec-5-2023

Hallucinations in machine translation are translations that contain information completely unrelated to the input. Omissions are translations that do not include some of the input information. While both cases tend to be catastrophic errors undermining user trust, annotated data with these types of pathologies is extremely scarce and is limited to a few high-resource languages. In this work, we release an annotated dataset for the hallucination and omission phenomena covering 18 translation directions with varying resource levels and scripts. Our annotation covers different levels of partial and full hallucinations as well as omissions both at the sentence and at the word level. Additionally, we revisit previous methods for hallucination and omission detection, show that conclusions made based on a single language pair largely do not hold for a large-scale evaluation, and establish new solid baselines.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2305.11746

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
