AITopics | Ravaut, Mathieu

Collaborating Authors

Ravaut, Mathieu

StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs

Chen, Hailin, Jiao, Fangkai, Ravaut, Mathieu, Farruque, Nawshad, Nguyen, Xuan Phi, Qin, Chengwei, Dey, Manan, Ding, Bosheng, Xiong, Caiming, Joty, Shafiq, Zhou, Yingbo

arXiv.org Artificial Intelligence, Dec-23-2024

The rapid development of large language models (LLMs) necessitates robust, unbiased, and scalable methods for evaluating their capabilities. However, human annotations are expensive to scale, model-based evaluations are prone to biases in answer style, while target-answer-based benchmarks are vulnerable to data contamination and cheating. To address these limitations, we propose StructTest, a novel benchmark that evaluates LLMs on their ability to produce compositionally specified structured outputs as an unbiased, cheap-to-run and difficult-to-cheat measure. The evaluation is done deterministically by a rule-based evaluator, which can be easily extended to new tasks. By testing structured outputs across diverse task domains -- including Summarization, Code, HTML and Math -- we demonstrate that StructTest serves as a good proxy for general reasoning abilities, as producing structured outputs often requires internal logical reasoning. We believe that StructTest offers a critical, complementary approach to objective and robust model evaluation.
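As a hedged illustration of what a deterministic, rule-based check on a structured output could look like, the Python sketch below verifies that a summary consists of a requested number of bullet points, each within a word limit. The function name `check_bullet_summary`, the `- ` bullet marker and the limits are assumptions made for this example; they are not taken from the StructTest paper or its code.

```python
import re


def check_bullet_summary(output: str, num_bullets: int, max_words: int) -> bool:
    """Return True only if `output` has exactly `num_bullets` non-empty lines,
    each starting with "- " and containing at most `max_words` words."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if len(lines) != num_bullets:
        return False
    for line in lines:
        match = re.fullmatch(r"- (.+)", line)
        if match is None:
            return False  # line does not use the required bullet marker
        if len(match.group(1).split()) > max_words:
            return False  # bullet is longer than allowed
    return True


good = "- LLMs need robust evaluation.\n- Rule-based checks are cheap.\n- Structure is hard to fake."
bad = "LLMs need robust evaluation. Rule-based checks are cheap."

print(check_bullet_summary(good, num_bullets=3, max_words=8))
print(check_bullet_summary(bad, num_bullets=3, max_words=8))
```

Because the check involves no model and no reference answer, the same output always receives the same verdict, and new format rules can be added as further conditions.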

benchmark, large language model, machine learning, (18 more...)

2412.18011

Country:

North America > United States (1.00)
Asia (0.68)
North America > Canada > Alberta (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

Ravaut, Mathieu, Ding, Bosheng, Jiao, Fangkai, Chen, Hailin, Li, Xingxuan, Zhao, Ruochen, Qin, Chengwei, Xiong, Caiming, Joty, Shafiq

arXiv.org Artificial Intelligence, Mar-31-2024

With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into tens of millions of dollars, placing high pressure on model integrity. At the same time, it is becoming harder and harder, if not impossible, to keep track of the data that LLMs have seen, since closed-source models like GPT-4 and Claude-3 do not divulge any information about their training sets. As a result, contamination becomes a critical issue: LLMs' performance may no longer be reliable, as high scores may be at least partly due to previous exposure to the data. This limitation jeopardizes the entire progress in the field of NLP, yet there is still a lack of methods to efficiently address contamination, and no clear consensus on prevention, mitigation and classification of contamination.
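One common family of contamination checks compares benchmark text against a candidate training document through n-gram overlap. The sketch below is a minimal version of that idea under stated assumptions: the function names, the whitespace tokenization and the choice of `n = 3` are illustrative, and none of this is the LLMSanitize API.

```python
def ngrams(text: str, n: int) -> set:
    """Set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_sample: str, corpus_doc: str, n: int = 3) -> float:
    """Fraction of the sample's n-grams that also occur in the corpus document."""
    sample_ngrams = ngrams(benchmark_sample, n)
    if not sample_ngrams:
        return 0.0  # sample shorter than n words: nothing to compare
    return len(sample_ngrams & ngrams(corpus_doc, n)) / len(sample_ngrams)


corpus_doc = "the capital of france is paris and it lies on the seine"
seen = "The capital of France is Paris"
unseen = "Which river flows through the city of Lyon"

print(overlap_ratio(seen, corpus_doc))
print(overlap_ratio(unseen, corpus_doc))
```

A high ratio only suggests that the sample may have been seen. Such a check also needs access to the training data, which is exactly what is missing for closed-source models.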

large language model, machine learning, natural language, (18 more...)

2404.00699

Country:

North America > United States > Michigan (0.14)
North America > Canada (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Parameter-Efficient Conversational Recommender System as a Language Processing Task

Ravaut, Mathieu, Zhang, Hao, Xu, Lu, Sun, Aixin, Liu, Yong

arXiv.org Artificial Intelligence, Feb-3-2024

Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation. Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items. This combination of multiple components suffers from a cumbersome training process, and leads to semantic misalignment issues between dialogue generation and item recommendation. In this paper, we represent items in natural language and formulate CRS as a natural language processing task. Accordingly, we leverage the power of pre-trained language models to encode items, understand user intent via conversation, perform item recommendation through semantic matching, and generate dialogues. As a unified model, our PECRS (Parameter-Efficient CRS) can be optimized in a single stage, without relying on non-textual metadata such as a knowledge graph. Experiments on two benchmark CRS datasets, ReDial and INSPIRED, demonstrate the effectiveness of PECRS on recommendation and conversation. Our code is available at: https://github.com/Ravoxsg/efficient_unified_crs.

large language model, machine learning, natural language, (20 more...)

2401.14194

Country:

Europe > Spain (0.14)
North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Industry:

Media > Film (0.94)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

LOCOST: State-Space Models for Long Document Abstractive Summarization

Bronnec, Florian Le, Duong, Song, Ravaut, Mathieu, Allauzen, Alexandre, Chen, Nancy F., Guigue, Vincent, Lumbreras, Alberto, Soulier, Laure, Gallinari, Patrick

arXiv.org Artificial Intelligence, Jan-31-2024

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
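To put the $O(L \log L)$ figure in perspective, the rough arithmetic below compares that growth term with the $L^2$ term of dense (full) self-attention at a few sequence lengths. Constants are ignored and the base-2 logarithm is an assumption, so the numbers only illustrate growth rates; they are not measured costs of LOCOST or of the sparse-attention baselines mentioned in the abstract.

```python
import math


def dense_attention_cost(length: int) -> int:
    """Growth term for full self-attention: L^2 (constants ignored)."""
    return length * length


def l_log_l_cost(length: int) -> float:
    """Growth term quoted in the abstract: L * log2(L) (constants ignored)."""
    return length * math.log2(length)


for length in (1_000, 16_000, 600_000):
    ratio = dense_attention_cost(length) / l_log_l_cost(length)
    print(f"L={length:>7}: L^2 / (L log2 L) = {ratio:,.0f}")
```

The ratio simplifies to L / log2(L), so the gap widens almost linearly with the input length.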

large language model, machine learning, natural language, (21 more...)

2401.17919

Country:

Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

Chen, Hailin, Jiao, Fangkai, Li, Xingxuan, Qin, Chengwei, Ravaut, Mathieu, Zhao, Ruochen, Xiong, Caiming, Joty, Shafiq

arXiv.org Artificial Intelligence, Jan-15-2024

Upon its release in late 2022, ChatGPT brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interest in LLMs has intensified, with new LLMs flourishing at frequent intervals across academia and industry, including many start-ups focused on LLMs. While closed-source LLMs (e.g., OpenAI's GPT, Anthropic's Claude) generally outperform their open-source counterparts, progress on the latter has been rapid, with claims of achieving parity or even better performance on certain tasks. This has crucial implications not only for research but also for business. In this work, on the first anniversary of ChatGPT, we provide an exhaustive overview of this success, surveying all tasks where an open-source LLM has claimed to be on par with or better than ChatGPT.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

2311.16989

Country:

Asia (0.46)
North America > United States (0.14)

Genre: Research Report (0.81)

Industry:

Education (0.93)
Information Technology (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

On Context Utilization in Summarization with Large Language Models

Ravaut, Mathieu, Joty, Shafiq, Sun, Aixin, Chen, Nancy F.

arXiv.org Artificial Intelligence, Nov-30-2023

Large language models (LLMs) excel in zero-shot abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, surpassing token limits of 100k. However, in the realm of multi-document question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped performance pattern concerning where the answer is located within the input. This bias raises concerns, particularly in summarization tasks where crucial content may be dispersed throughout the source document(s). This paper presents a comprehensive investigation encompassing 10 datasets, 5 LLMs, and 5 evaluation metrics to analyze how these models leverage their input for abstractive summarization. Our findings reveal a pronounced bias towards the introductory content (and to a lesser extent, the final content), posing challenges for LLM performance across a range of diverse summarization benchmarks.
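One simple way to probe which part of the source a summary draws on is to locate each summary bigram in the source and record its relative position. The sketch below does this with whitespace tokens; the function name `bigram_positions` and the normalization are assumptions for illustration and are not claimed to match the paper's exact measurement.

```python
def bigram_positions(source: str, summary: str) -> list:
    """For each summary bigram found in the source, return the relative
    position (0.0 = start, 1.0 = end) of its first occurrence there."""
    src = source.lower().split()
    summ = summary.lower().split()
    first_seen = {}
    for i in range(len(src) - 1):
        first_seen.setdefault((src[i], src[i + 1]), i)
    last_index = max(len(src) - 2, 1)  # index of the last source bigram
    positions = []
    for j in range(len(summ) - 1):
        bigram = (summ[j], summ[j + 1])
        if bigram in first_seen:
            positions.append(first_seen[bigram] / last_index)
    return positions


source = ("the storm hit the coast on monday many roads were closed "
          "officials said repairs will take weeks")
lead_summary = "the storm hit the coast"
tail_summary = "repairs will take weeks"

for name, summary in (("lead", lead_summary), ("tail", tail_summary)):
    pos = bigram_positions(source, summary)
    print(name, round(sum(pos) / len(pos), 2))
```

Averaging these positions over many source and summary pairs would show whether the values cluster near 0.0, which is the kind of bias toward introductory content the abstract describes.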

large language model, machine learning, natural language, (18 more...)

2310.10570

Country:

Asia (1.00)
Europe (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Ravaut, Mathieu, Chen, Hailin, Zhao, Ruochen, Qin, Chengwei, Joty, Shafiq, Chen, Nancy

arXiv.org Artificial Intelligence, Aug-6-2023

Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, summarization guided through instructions (discrete prompts) can achieve a desirable double objective of high quality and controllability in summary generation. Towards a goal of strong summarization performance under the triple conditions of parameter-efficiency, data-efficiency, and controllability, we introduce PromptSum, a method combining PT with a multi-task objective and discrete entity prompts for abstractive summarization. Our model achieves competitive ROUGE results on popular abstractive summarization benchmarks coupled with a strong level of controllability through entities, all while tuning several orders of magnitude fewer parameters.

artificial intelligence, machine learning, natural language, (17 more...)

2308.03117

Country: North America > United States > Michigan (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Motorsports > Formula One (1.00)
Government > Regional Government > North America Government > United States Government (0.92)
Automobiles & Trucks > Manufacturer (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

Bojic, Iva, Halim, Josef, Suharman, Verena, Tar, Sreeja, Ong, Qi Chwen, Phung, Duy, Ravaut, Mathieu, Joty, Shafiq, Car, Josip

arXiv.org Artificial Intelligence, May-26-2023

Low-quality data can cause downstream problems in high-stakes applications. The data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed for training general-purpose Large Language Models (LLMs), as well as for domain-specific models, whose datasets are usually small because it is costly to engage a large number of domain experts to create them. Thus, it is vital to ensure high-quality domain-specific training data. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed relative improvements of up to 33%/40% for fine-tuning of retrieval/reader models on the BioASQ dataset when using back translation to enhance the original dataset quality.

domain-specific machine, large language model, natural language, (3 more...)

doi: 10.18653/v1/2023.insights-1.3

2304.00483

Genre: Research Report (0.40)

Industry: Education > Assessment & Standards > Student Performance (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization

Ravaut, Mathieu, Joty, Shafiq, Chen, Nancy F.

arXiv.org Artificial Intelligence, May-26-2023

Sequence-to-sequence neural networks have recently achieved great success in abstractive summarization, especially through fine-tuning large pre-trained language models on the downstream dataset. These models are typically decoded with beam search to generate a unique summary. However, the search space is very large, and with the exposure bias, such decoding is not optimal. In this paper, we show that it is possible to directly train a second-stage model performing re-ranking on a set of summary candidates. Our mixture-of-experts SummaReranker learns to select a better candidate and consistently improves the performance of the base model. With a base PEGASUS, we push ROUGE scores by 5.44% on CNN-DailyMail (47.16 ROUGE-1), 1.31% on XSum (48.12 ROUGE-1) and 9.34% on Reddit TIFU (29.83 ROUGE-1), reaching a new state-of-the-art. Our code and checkpoints will be available at https://github.com/ntunlp/SummaReranker.
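The second-stage idea can be sketched with a toy example: given several summary candidates, a scoring function picks one. Below, a simplified unigram-overlap F1 (a stand-in for ROUGE-1, without stemming) is computed against the reference to identify the best candidate, which is one way a training label for a re-ranker could be derived. The function names and the scoring simplification are assumptions for illustration; the trained mixture-of-experts model itself is not reproduced here.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap, no stemming."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_candidate(candidates: list, reference: str) -> int:
    """Index of the candidate with the highest score against the reference."""
    scores = [unigram_f1(c, reference) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


reference = "the council approved the new budget on tuesday"
candidates = [
    "a meeting was held in the city",  # hypothetical top beam-search output
    "the council approved the new budget",
    "budget talks continue",
]
print(best_candidate(candidates, reference))
```

At inference time no reference is available, which is why a learned model has to predict which candidate would score best.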

machine learning, natural language, re-ranking summareranker score, (18 more...)

2203.06569

Country:

Europe (1.00)
Asia > Middle East > Republic of Türkiye (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > Asia Government > Middle East Government > Republic of Türkiye Government (0.93)
Leisure & Entertainment > Sports > Soccer (0.70)

Technology:

Information Technology > Communications > Social Media (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.58)

Unsupervised Summarization Re-ranking

Ravaut, Mathieu, Joty, Shafiq, Chen, Nancy

arXiv.org Artificial Intelligence, May-26-2023

With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. Similarly to the supervised setup, we notice a very high variance in quality among summary candidates from these models while only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the unsupervised PEGASUS by up to 7.27% and ChatGPT by up to 6.86% relative mean ROUGE across four widely-adopted summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73% from XSum to WikiHow) averaged over 30 zero-shot transfer setups (finetuning on a dataset, evaluating on another).
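Without reference summaries, candidates have to be scored from the source document alone. The sketch below re-ranks candidates by the fraction of their bigrams that also appear in the source, a crude faithfulness proxy. This single feature and the function names are assumptions chosen for illustration; the abstract does not state which scoring features the paper actually uses.

```python
def bigram_set(text: str) -> set:
    """Set of lowercase word bigrams in `text`."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def source_overlap(candidate: str, source: str) -> float:
    """Fraction of candidate bigrams that also occur in the source."""
    cand = bigram_set(candidate)
    if not cand:
        return 0.0
    return len(cand & bigram_set(source)) / len(cand)


def rerank(candidates: list, source: str) -> list:
    """Candidates sorted from highest to lowest source overlap."""
    return sorted(candidates, key=lambda c: source_overlap(c, source), reverse=True)


source = "the river flooded the old town after three days of heavy rain"
candidates = [
    "the town was destroyed by fire",
    "the river flooded the old town",
    "heavy rain fell for weeks",
]
for candidate in rerank(candidates, source):
    print(round(source_overlap(candidate, source), 2), candidate)
```

A pure overlap score favors extractive candidates, so a practical re-ranker would likely combine several signals rather than rely on one.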

machine learning, natural language, score mean rouge, (17 more...)

2212.09593

Country:

Europe (1.00)
Asia > Middle East > Republic of Türkiye (0.92)
Asia > Myanmar > Yangon Region > Yangon (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (1.00)
Personal (0.93)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government > Voting & Elections (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
