AITopics | Poibeau, Thierry

Poibeau, Thierry

Annotating References to Mythological Entities in French Literature

Poibeau, Thierry

arXiv.org Artificial Intelligence, Dec-24-2024

In this paper, we explore the relevance of large language models (LLMs) for annotating references to Roman and Greek mythological entities in modern and contemporary French literature. We present an annotation scheme and demonstrate that recent LLMs can be directly applied to follow this scheme effectively, although not without occasionally making significant analytical errors. Additionally, we show that LLMs (and, more specifically, ChatGPT) are capable of offering interpretative insights into the use of mythological references by literary authors. However, we also find that LLMs struggle to accurately identify relevant passages in novels (when used as an information retrieval engine), often hallucinating and generating fabricated examples, an issue that raises significant ethical concerns. Nonetheless, when used carefully, LLMs remain valuable tools for performing annotations with high accuracy, especially for tasks that would be difficult to annotate comprehensively on a large scale through manual methods alone.

large language model, machine learning, natural language, (19 more...)

2412.1827

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

An Incremental Clustering Baseline for Event Detection on Twitter

Ray, Marjolaine, Wang, Qi, Mélanie-Becquet, Frédérique, Poibeau, Thierry, Mazoyer, Béatrice

arXiv.org Artificial Intelligence, Dec-16-2024

Event detection in text streams is a crucial task for the analysis of online media and social networks. One of the current challenges in this field is establishing a performance standard while maintaining an acceptable level of computational complexity. In our study, we use an incremental clustering algorithm combined with recent advancements in sentence embeddings. Our objective is to compare our findings with previous studies, specifically those by Cao et al. (2024) and Mazoyer et al. (2020). Our results demonstrate significant improvements and could serve as a relevant baseline for future research in this area.
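The abstract above pairs an incremental clustering algorithm with sentence embeddings. As a rough illustration only, and not the authors' implementation, here is a minimal sketch of one common single-pass scheme: each incoming vector joins the most similar existing cluster if its cosine similarity to that cluster's centroid reaches a threshold, and otherwise starts a new cluster. The function names, the `threshold` value, and the toy 2-dimensional vectors (standing in for real sentence embeddings) are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def incremental_cluster(vectors, threshold=0.8):
    """Assign each vector, in arrival order, to a cluster label.

    Hypothetical single-pass scheme: `threshold` is an assumed
    similarity cut-off, not a value taken from the paper.
    """
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    labels = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the matched cluster.
            n = sizes[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No cluster is close enough: open a new one.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Toy vectors standing in for sentence embeddings of four short messages.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = incremental_cluster(toy, threshold=0.8)
print(labels)  # [0, 0, 1, 1]
```

Because each item is compared only with the current centroids, the cost grows with the number of clusters rather than the number of items seen so far, which is one reason single-pass schemes are attractive for text streams.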

artificial intelligence, detection, machine learning, (16 more...)

doi: 10.18653/v1/2024.futured-1.2

2412.15257

Country: Europe > France (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

How to Evaluate Coreference in Literary Texts?

Duron-Tejedor, Ana-Isabel, Amsili, Pascal, Poibeau, Thierry

arXiv.org Artificial Intelligence, Dec-30-2023

In this short paper, we examine the main metrics used to evaluate textual coreference and we detail some of their limitations. We show that a unique score cannot represent the full complexity of the problem at stake, and is thus uninformative, or even misleading. We propose a new way of evaluating coreference, taking into account the context (in our case, the analysis of fictions, esp. novels). More specifically, we propose to distinguish long coreference chains (corresponding to main characters), from short ones (corresponding to secondary characters), and singletons (isolated elements). This way, we hope to get more interpretable and thus more informative results through evaluation.
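The abstract above proposes reporting results separately for long chains, short chains, and singletons. A minimal sketch of that partitioning step is given below. The `long_min` cut-off and the function name are assumptions made for illustration, since the abstract does not state where the boundary between long and short chains lies.

```python
def bucket_chains(chains, long_min=10):
    """Split coreference chains into long, short, and singleton groups.

    Each chain is a list of mention identifiers. `long_min` is a
    hypothetical cut-off: chains with at least this many mentions
    count as long.
    """
    buckets = {"long": [], "short": [], "singleton": []}
    for chain in chains:
        if not chain:
            continue  # ignore empty chains
        if len(chain) == 1:
            buckets["singleton"].append(chain)
        elif len(chain) >= long_min:
            buckets["long"].append(chain)
        else:
            buckets["short"].append(chain)
    return buckets


# Toy example with a lowered cut-off so that a 4-mention chain counts as long.
example = [["m1", "m2", "m3", "m4"], ["m5", "m6"], ["m7"]]
buckets = bucket_chains(example, long_min=3)
print({name: len(group) for name, group in buckets.items()})
```

Any existing coreference metric could then be computed per bucket instead of once over all chains, which is the kind of breakdown the abstract argues for.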

artificial intelligence, coreference, natural language, (15 more...)

2401.00238

Country:

Europe (0.94)
North America > United States > Maryland (0.29)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Probing for the Usage of Grammatical Number

Lasri, Karim, Pimentel, Tiago, Lenci, Alessandro, Poibeau, Thierry, Cotterell, Ryan

arXiv.org Artificial Intelligence, Jul-31-2023

A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious, i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model's representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
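To make the idea of removing a linearly encoded property concrete, the sketch below projects a representation onto the subspace orthogonal to a single direction. This is a generic linear-algebra illustration under the assumption that the property lies along one known direction `w`. It is not the intervention procedure used in the paper, and the function name and toy vectors are hypothetical.

```python
def remove_direction(h, w):
    """Return h with its component along direction w projected out.

    h: a representation vector (list of floats).
    w: an assumed direction encoding the property, e.g. the weight
       vector of a linear probe. Both names are illustrative.
    """
    norm_sq = sum(x * x for x in w)
    if norm_sq == 0:
        return list(h)  # no direction to remove
    scale = sum(a * b for a, b in zip(h, w)) / norm_sq
    return [a - scale * b for a, b in zip(h, w)]


h = [3.0, 4.0]
w = [1.0, 0.0]
cleaned = remove_direction(h, w)
print(cleaned)  # [0.0, 4.0]
```

After the projection, a linear readout along `w` returns zero for every input, so if the model's behavior on the chosen task degrades, that is evidence the model was relying on information along that direction.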

information, machine learning, natural language, (17 more...)

doi: 10.18653/v1/2022.acl-long.603

2204.08831

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

On the Correspondence between Compositionality and Imitation in Emergent Neural Communication

Cheng, Emily, Rita, Mathieu, Poibeau, Thierry

arXiv.org Artificial Intelligence, May-22-2023

Compositionality is a hallmark of human language that not only enables linguistic generalization, but also potentially facilitates acquisition. When simulating language emergence with neural networks, compositionality has been shown to improve communication performance; however, its impact on imitation learning has yet to be investigated. Our work explores the link between compositionality and imitation in a Lewis game played by deep neural agents. Our contributions are twofold: first, we show that the learning algorithm used to imitate is crucial: supervised learning tends to produce more average languages, while reinforcement learning introduces a selection pressure toward more compositional languages. Second, our study reveals that compositional languages are easier to imitate, which may induce the pressure toward compositional languages in RL imitation settings.

artificial intelligence, imitation, machine learning, (17 more...)

2305.12941

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Modern French Poetry Generation with RoBERTa and GPT-2

Hämäläinen, Mika, Alnajjar, Khalid, Poibeau, Thierry

arXiv.org Artificial Intelligence, Dec-6-2022

We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is RoBERTa-based, while the decoder is based on GPT-2. This way, the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, the lowest score of 3.57 was given by human judges to typicality and emotionality of the output poetry, while the best score of 3.79 was given to understandability.

artificial intelligence, natural language, poem, (15 more...)

2212.02911

Country: Europe > France (0.47)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

Hämäläinen, Mika, Alnajjar, Khalid, Poibeau, Thierry

arXiv.org Artificial Intelligence, Dec-5-2022

We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas. The game developers have preannotated every line of dialog in the game with one of 8 sentiments: anger, disgust, fear, happy, neutral, pained, sad and surprised. The game has been translated into English, Spanish, German, French and Italian. We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLMRoBERTa and language-specific BERT models. In our experiments, multilingual BERT outperformed XLMRoBERTa for most of the languages, and language-specific models were slightly better than multilingual BERT for most of the languages. The best overall accuracy was 54%, achieved by using multilingual BERT on Spanish data. The extracted data set presents a challenging task for sentiment analysis. We have released the data, including the testing and training splits, openly on Zenodo. The data set has been shuffled for copyright reasons.

artificial intelligence, natural language, sentiment analysis, (14 more...)

2212.02168

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Automatic Generation of Factual News Headlines in Finnish

Koppatz, Maximilian, Alnajjar, Khalid, Hämäläinen, Mika, Poibeau, Thierry

arXiv.org Artificial Intelligence, Dec-5-2022

We present a novel approach to generating news headlines in Finnish for a given news story. We model this as a summarization task where a model is given a news article, and its task is to produce a concise headline describing the main topic of the article. Because there are no openly available GPT-2 models for Finnish, we will first build such a model using several corpora. The model is then fine-tuned for the headline generation task using a massive news corpus. The system is evaluated by 3 expert journalists working in a Finnish media house. The results showcase the usability of the presented approach as a headline suggestion tool to facilitate the news production process.

artificial intelligence, machine learning, natural language, (19 more...)

2212.0217

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Acquisition d'informations lexicales à partir de corpus

Messiant, Cédric, Poibeau, Thierry

arXiv.org Artificial Intelligence, Nov-30-2009

This paper is about automatic acquisition of lexical information from corpora, especially subcategorization acquisition.

acquisition, artificial intelligence, natural language, (13 more...)

0911.5568

Country:

North America > United States > Ohio (0.15)
Europe > Spain > Canary Islands > Gran Canaria (0.15)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.31)

A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

Hamon, Thierry, Nazarenko, Adeline, Poibeau, Thierry, Aubin, Sophie, Derivière, Julien

arXiv.org Artificial Intelligence, Jun-29-2007

Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and the development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled all together.

platform, survey article, text processing, (21 more...)

0706.4375

Country:

Europe (1.00)
North America > United States > Maryland (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
