Goto

Collaborating Authors

 Tanguy, Ludovic


Typologie des comportements utilisateurs : {\'e}tude exploratoire des sessions de recherche complexe sur le Web

arXiv.org Artificial Intelligence

In this study, we propose an exploratory approach aiming at a typology of user behaviour during a Web search session. We describe a typology based on generic IR variables (e.g. number of queries), but also on the study of topic (propositions with distinct semantic content defined from the search statement). To this end, we gathered experimental data enabling us to study variations across users (N=70) for the same task. We performed a multidimensional analysis and propose a 5 classes typology based on the individual behaviours during the processing of a complex search task.


Talking to oneself in CMC: a study of self replies in Wikipedia talk pages

arXiv.org Artificial Intelligence

This study proposes a qualitative analysis of self replies in Wikipedia talk pages, more precisely when the first two messages of a discussion are written by the same user. This specific pattern occurs in more than 10% of threads with two messages or more and can be explained by a number of reasons. After a first examination of the lexical specificities of second messages, we propose a seven categories typology and use it to annotate two reference samples (English and French) of 100 threads each. Finally, we analyse and compare the performance of human annotators (who reach a reasonable global efficiency) and instruction-tuned LLMs (which encounter important difficulties with several categories).


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

arXiv.org Artificial Intelligence

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.