
 Peyrard, Maxime


REFINER: Reasoning Feedback on Intermediate Representations

arXiv.org Artificial Intelligence

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT-3.5 as the reasoner, the trained critic significantly improves reasoning without finetuning the reasoner. Finally, our critic model is trained without expensive human-in-the-loop data but can be substituted with humans at inference time.
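
To make the generate-critique-refine loop concrete, below is a minimal sketch of the interaction described above. The generator and critic are stand-in callables rather than the paper's trained models, and the function names and stopping rule are illustrative assumptions.

    # Minimal sketch of a REFINER-style generate -> critique -> refine loop.
    # The generator and critic are toy stand-ins, not the paper's models.
    from typing import Callable, Optional

    def refine_loop(
        question: str,
        generator: Callable[[str, Optional[str]], str],  # (question, feedback) -> reasoning steps
        critic: Callable[[str, str], Optional[str]],     # (question, reasoning) -> feedback, or None if satisfied
        max_rounds: int = 3,
    ) -> str:
        """Iteratively refine intermediate reasoning using the critic's structured feedback."""
        feedback: Optional[str] = None
        reasoning = generator(question, feedback)
        for _ in range(max_rounds):
            feedback = critic(question, reasoning)
            if feedback is None:          # critic has no objection: stop early
                break
            reasoning = generator(question, feedback)
        return reasoning

    # Toy stand-ins so the sketch runs end to end.
    def toy_generator(question: str, feedback: Optional[str]) -> str:
        base = "step 1; step 2"
        return base if feedback is None else f"{base} (revised after feedback: {feedback})"

    def toy_critic(question: str, reasoning: str) -> Optional[str]:
        return None if "revised" in reasoning else "step 2 is an invalid deduction"

    print(refine_loop("Is 17 prime?", toy_generator, toy_critic))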


Language Model Decoding as Likelihood-Utility Alignment

arXiv.org Artificial Intelligence

A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific notion of utility is the key factor in understanding the effectiveness of decoding algorithms. To structure the discussion, we introduce a taxonomy of misalignment mitigation strategies (MMSs), providing a unifying view of decoding as a tool for alignment. The MMS taxonomy groups decoding algorithms based on their implicit assumptions about likelihood-utility misalignment, yielding general statements about their applicability across tasks. Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm. Crucially, our analysis is the first to relate likelihood-based decoding algorithms with algorithms that rely on external information, such as value-guided methods and prompting, and covers the most diverse set of tasks to date. Code, data, and models are available at https://github.com/epfl-dlab/understanding-decoding.
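
As a small illustration of the correlation analysis described above, the sketch below computes a rank correlation between model log-likelihoods and task utilities for a set of candidate predictions. The numbers are fabricated and the choice of Spearman correlation is an illustrative assumption, not the paper's exact protocol.

    # Toy likelihood-utility correlation check (fabricated numbers, one input only).
    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical candidate outputs for a single input: model log-likelihoods
    # and task-specific utilities (e.g., BLEU or accuracy), both made up here.
    log_likelihoods = np.array([-3.2, -4.1, -2.8, -5.0, -3.9])
    utilities = np.array([0.62, 0.71, 0.35, 0.80, 0.55])

    rho, pvalue = spearmanr(log_likelihoods, utilities)
    print(f"Spearman rank correlation: {rho:.3f} (p = {pvalue:.3f})")

    # A weak or negative correlation signals likelihood-utility misalignment,
    # suggesting that decoding strategies beyond pure likelihood maximization
    # (e.g., value-guided methods) may be preferable for that task.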


Descartes: Generating Short Descriptions of Wikipedia Articles

arXiv.org Artificial Intelligence

Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce the novel task of automatically generating short descriptions for Wikipedia articles and propose Descartes, a multilingual model for tackling it. Descartes integrates three sources of information to generate an article description in a target language: the text of the article in all its language versions, the already-existing descriptions (if any) of the article in other languages, and semantic type information obtained from a knowledge graph. We evaluate a Descartes model trained for handling 25 languages simultaneously, showing that it beats baselines (including a strong translation-based baseline) and performs on par with monolingual models tailored for specific languages. A human evaluation on three languages further shows that the quality of Descartes's descriptions is largely indistinguishable from that of human-written descriptions; e.g., 91.3% of our English descriptions (vs. 92.1% of human-written descriptions) pass the bar for inclusion in Wikipedia, suggesting that Descartes is ready for production, with the potential to support human editors in filling a major gap in today's Wikipedia across languages.
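
As a rough sketch of how the three input sources could be serialized for a multilingual encoder-decoder, the snippet below concatenates them with special tags. The tag format, field names, and example values are assumptions made for illustration; they are not Descartes's actual input encoding.

    # Illustrative serialization of the three input sources into one encoder input.
    def build_description_input(
        target_lang: str,
        article_texts: dict,          # language code -> opening text of the article
        existing_descriptions: dict,  # language code -> existing short description, if any
        semantic_type: str,           # type obtained from a knowledge graph
    ) -> str:
        parts = [f"<target:{target_lang}>", f"<type> {semantic_type}"]
        for lang, desc in existing_descriptions.items():
            parts.append(f"<desc:{lang}> {desc}")
        for lang, text in article_texts.items():
            parts.append(f"<text:{lang}> {text}")
        return " ".join(parts)

    example = build_description_input(
        target_lang="nl",
        article_texts={"en": "Beer is an alcoholic drink produced by brewing and fermenting cereal grains..."},
        existing_descriptions={"en": "Alcoholic drink made from fermented cereal grains"},
        semantic_type="beverage",
    )
    print(example)  # would be fed to a multilingual encoder-decoder model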


GenIE: Generative Information Extraction

arXiv.org Machine Learning

Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code and models available at https://github.com/epfl-dlab/GenIE.
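
The schema constraint can be pictured with a prefix trie over valid entity and relation names: at each decoding step, only tokens that can still be completed to a name in the knowledge base schema are allowed. The sketch below uses a toy tokenization and a simplified trie; it illustrates the general idea of constrained decoding rather than GenIE's exact bi-level strategy.

    # Toy prefix trie for constrained generation of schema-consistent names.
    class Trie:
        def __init__(self, sequences):
            self.root = {}
            for seq in sequences:
                node = self.root
                for tok in seq:
                    node = node.setdefault(tok, {})
                node["<end>"] = {}

        def allowed_next(self, prefix):
            """Tokens that keep the prefix completable to a valid name."""
            node = self.root
            for tok in prefix:
                if tok not in node:
                    return set()
                node = node[tok]
            return set(node.keys())

    # Toy schema: entity names split into sub-word-like pieces.
    entity_trie = Trie([["Bar", "ack", "Obama"], ["Bar", "celona"], ["Hawaii"]])
    print(entity_trie.allowed_next([]))              # {'Bar', 'Hawaii'}
    print(entity_trie.allowed_next(["Bar"]))         # {'ack', 'celona'}
    print(entity_trie.allowed_next(["Bar", "ack"]))  # {'Obama'}
    # During beam search, tokens outside this set are masked out, so the decoder
    # can only emit entities and relations that exist in the schema.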


Better than Average: Paired Evaluation of NLP Systems

arXiv.org Artificial Intelligence

Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances. In this work, we question the use of averages for aggregating evaluation scores into a final number used to decide which system is best, since the average, as well as alternatives such as the median, ignores the pairing arising from the fact that systems are evaluated on the same test instances. We illustrate the importance of taking the instance-level pairing of evaluation scores into account and demonstrate, both theoretically and empirically, the advantages of aggregation methods based on pairwise comparisons, such as the Bradley-Terry (BT) model, a mechanism based on the estimated probability that a given system scores better than another on the test set. By re-evaluating 296 real NLP evaluation setups across four tasks and 18 evaluation metrics, we show that the choice of aggregation mechanism matters and yields different conclusions as to which systems are state of the art in about 30% of the setups. To facilitate the adoption of pairwise evaluation, we release a practical tool for performing the full analysis of evaluation scores with the mean, median, BT, and two variants of BT (Elo and TrueSkill), alongside functionality for appropriate statistical testing.
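
For readers unfamiliar with Bradley-Terry aggregation, the sketch below turns instance-level scores into pairwise win counts on the shared test instances and fits BT strengths with the standard MM updates. The scores are random placeholders and ties are simply ignored; the released tool performs the full analysis, including the Elo and TrueSkill variants and statistical testing.

    # Minimal Bradley-Terry aggregation over instance-paired scores (toy data).
    import numpy as np

    def bradley_terry(scores: np.ndarray, iters: int = 200) -> np.ndarray:
        """scores: (n_systems, n_instances) matrix of per-instance metric scores.
        Returns normalized BT strengths: higher means the system beats the
        others more often on the same test instances."""
        n = scores.shape[0]
        wins = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if i != j:
                    wins[i, j] = np.sum(scores[i] > scores[j])  # instances where i beats j
        comparisons = wins + wins.T  # decided comparisons between each pair (ties dropped)
        p = np.ones(n)
        for _ in range(iters):       # standard MM updates for the BT model
            for i in range(n):
                denom = sum(comparisons[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
                if denom > 0:
                    p[i] = wins[i].sum() / denom
            p /= p.sum()
        return p

    rng = np.random.default_rng(0)
    toy_scores = rng.random((3, 50))  # 3 systems scored on 50 shared test instances (fake data)
    print(bradley_terry(toy_scores))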


Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?

arXiv.org Artificial Intelligence

The automatic detection of humor poses a grand challenge for natural language processing. Transformer-based systems have recently achieved remarkable results on this task, but they usually (1) were evaluated in setups where serious vs humorous texts came from entirely different sources, and (2) focused on benchmarking performance without providing insights into how the models work. We make progress in both respects by training and analyzing transformer-based humor recognition models on a recently introduced dataset consisting of minimal pairs of aligned sentences, one serious, the other humorous. We find that, although our aligned dataset is much harder than previous datasets, transformer-based models recognize the humorous sentence in an aligned pair with high accuracy (78%). In a careful error analysis, we characterize easy vs hard instances. Finally, by analyzing attention weights, we obtain important insights into the mechanisms by which transformers recognize humor. Most remarkably, we find clear evidence that one single attention head learns to recognize the words that make a test sentence humorous, even without access to this information at training time.
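
The attention analysis can be reproduced in spirit with off-the-shelf tooling: the sketch below loads a generic pretrained BERT via Hugging Face transformers, extracts attention weights for one made-up humorous sentence, and lists the tokens a chosen head attends to most from the [CLS] position. The model, the layer and head indices, and the sentence are placeholders; the paper's analysis is performed on its fine-tuned humor-recognition models and the specific head it identifies.

    # Inspect which tokens a single attention head focuses on (illustrative setup).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    sentence = "I used to be a banker, but I lost interest."  # illustrative pun, not from the dataset
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    layer, head = 10, 3                              # arbitrary choice for illustration
    attn = outputs.attentions[layer][0, head]        # (seq_len, seq_len) attention matrix
    cls_attention = attn[0]                          # attention from [CLS] to every token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    for tok, weight in sorted(zip(tokens, cls_attention.tolist()), key=lambda x: -x[1])[:5]:
        print(f"{tok:>12s}  {weight:.3f}")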


A Ladder of Causal Distances

arXiv.org Artificial Intelligence

Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences. Evaluating the performance of causal discovery algorithms should ideally involve comparing the inferred models to ground-truth models available for benchmark datasets, which in turn requires a notion of distance between causal models. While such distances have been proposed previously, they are limited by focusing on graphical properties of the causal models being compared. Here, we overcome this limitation by defining distances derived from the causal distributions induced by the models, rather than exclusively from their graphical structure. Pearl and Mackenzie (2018) have arranged the properties of causal models in a hierarchy called the "ladder of causation" spanning three rungs: observational, interventional, and counterfactual. Following this organization, we introduce a hierarchy of three distances, one for each rung of the ladder. Our definitions are intuitively appealing as well as efficient to compute approximately. We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available. Finally, we highlight the usefulness of our causal distances by briefly discussing further applications beyond the evaluation of causal discovery techniques.
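
To make the rung-by-rung idea concrete, the sketch below compares two toy linear structural causal models at the observational and interventional levels by sampling from each and measuring a one-dimensional Wasserstein distance between the induced distributions of Y. The models, the intervention value, and the choice of discrepancy measure are illustrative assumptions, not the paper's exact distance definitions (which also cover the counterfactual rung).

    # Sample-based distances between two toy SCMs at the first two rungs.
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)

    def scm_a(n, do_x=None):
        x = rng.normal(0, 1, n) if do_x is None else np.full(n, do_x)
        y = 2.0 * x + rng.normal(0, 1, n)   # Y := 2X + noise
        return x, y

    def scm_b(n, do_x=None):
        x = rng.normal(0, 1, n) if do_x is None else np.full(n, do_x)
        y = 0.5 * x + rng.normal(0, 1, n)   # Y := 0.5X + noise
        return x, y

    n = 10_000
    # Rung 1 (observational): compare the distributions of Y without intervention.
    _, y_a = scm_a(n)
    _, y_b = scm_b(n)
    d_obs = wasserstein_distance(y_a, y_b)

    # Rung 2 (interventional): compare the distributions of Y under do(X = 1).
    _, y_a_do = scm_a(n, do_x=1.0)
    _, y_b_do = scm_b(n, do_x=1.0)
    d_int = wasserstein_distance(y_a_do, y_b_do)

    print(f"observational distance ~ {d_obs:.3f}, interventional distance ~ {d_int:.3f}")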