Un discours et un public "Gilets Jaunes" au coeur du Grand Débat National? Combinaison des approches IA et textométriques pour l'analyse de discours des plateformes "Grand Débat National" et "Vrai débat"

Philippe, Suignard

arXiv.org Artificial Intelligence

In this contribution, we analyze statements from two "civic tech" platforms: the governmental platform "Grand Débat National" and "Vrai Débat", the political and algorithmic response to it proposed by a Yellow Vest collective. We do so by comparing two families of text-analysis algorithms. On the one hand, we apply proven approaches from textual data analysis (the Reinert/Iramuteq method), which have recently shown their value for analyzing very large corpora; on the other hand, we apply newer methods arising at the intersection of computer science, artificial intelligence, and natural language processing. We examine methodological solutions for characterizing the social properties of speakers about whom we have little direct information. Finally, we present some research questions, opened up by this comparison, at the crossroads of the political sociology of public opinion and data science.


Approches quantitatives de l'analyse des prédictions en traduction automatique neuronale (TAN)

Zimina-Poirot, Maria, Ballier, Nicolas, Yunès, Jean-Baptiste

arXiv.org Artificial Intelligence

As part of a larger project on optimal learning conditions in neural machine translation, we investigate characteristic training phases of translation engines. All our experiments are carried out using OpenNMT-Py: the pre-processing step is implemented using the Europarl training corpus and the INTERSECT corpus is used for validation. Longitudinal analyses of training phases suggest that the progression of translations is not always linear. Following the results of textometric explorations, we identify the importance of the phenomena related to chronological progression, in order to map different processes at work in neural machine translation (NMT).


Building and displaying name relations using automatic unsupervised analysis of newspaper articles

Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia, Oellinger, Tamara

arXiv.org Artificial Intelligence

We present a tool that uses automatically recognised names to infer inter-person relations, so that associated people can be presented on maps. Based on an in-house Named Entity Recognition tool, applied to clusters of an average of 15,000 news articles per day in 15 different languages, we build a knowledge base from which statistical co-occurrences of persons can be extracted and visualised on a per-person page or in various graphs.
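
The co-occurrence step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names `count_cooccurrences` and `related_people`, the input shape (one list of already-recognised person names per news cluster), and the sample names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(clusters):
    """Count how often two person names appear in the same news cluster.

    `clusters` is assumed to be an iterable of name lists, one list per
    cluster, as produced by some upstream named entity recognition step.
    """
    pair_counts = Counter()
    for names in clusters:
        # Deduplicate and sort so (a, b) and (b, a) share a single key.
        for pair in combinations(sorted(set(names)), 2):
            pair_counts[pair] += 1
    return pair_counts


def related_people(pair_counts, person, top_n=5):
    """Return the people most often seen together with `person`."""
    related = Counter()
    for (a, b), n in pair_counts.items():
        if a == person:
            related[b] += n
        elif b == person:
            related[a] += n
    return related.most_common(top_n)


# Hypothetical sample data: three clusters with invented names.
clusters = [
    ["Alice Martin", "Bob Keller"],
    ["Alice Martin", "Bob Keller", "Carla Rossi"],
    ["Carla Rossi", "Alice Martin"],
]
counts = count_cooccurrences(clusters)
print(related_people(counts, "Bob Keller"))
```

Raw counts like these would typically be weighted or thresholded before being shown on a per-person page; the abstract does not say which statistic is used, so none is assumed here.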


Classification dynamique d'un flux documentaire : une évaluation statique préalable de l'algorithme GERMEN

Lelu, Alain, Cuxac, Pascal, Johansson, Joel

arXiv.org Artificial Intelligence

Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research effort aims at scaling up efficiently to huge data repositories. Our approach focuses on qualitative improvement, mainly for detecting "weak signals" and precisely tracking topical evolutions in the framework of information watch, though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t by identifying the local perturbations induced by the current document vector, such as changing cluster borders or new and vanishing clusters. Optimality follows from the uniqueness 1) of the density landscape for any value of our zoom parameter, and 2) of the cluster allocation operated by our border propagation rule.
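
The notion of a density peak can be illustrated with a small static Python sketch. This is not the GERMEN algorithm, which works incrementally on document vectors and uses its own border propagation rule. It assumes one-dimensional points, a hypothetical `radius` parameter standing in for the zoom parameter, and a simple neighbour count as the density.

```python
def density_peaks(points, radius):
    """Return (density, peaks) for 1-D points.

    Density of a point = number of other points within `radius`.
    A point is a peak if none of its neighbours has a higher density;
    ties are broken in favour of the lower index, so the result is unique.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if j != i and abs(points[i] - points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbors]
    peaks = [
        i
        for i in range(n)
        # Compare (density, -index) so an equal-density neighbour with a
        # lower index outranks point i.
        if all((density[j], -j) < (density[i], -i) for j in neighbors[i])
    ]
    return density, peaks


# Hypothetical data: two well-separated groups of points.
points = [0, 1, 2, 50, 51, 52, 53]
density, peaks = density_peaks(points, radius=1)
print(density, peaks)
```

With a larger `radius` the two groups would each still yield one peak until the radius bridges the gap between them, which gives a rough intuition for how a zoom-like parameter changes the density landscape.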