es textuelle
Un discours et un public "Gilets Jaunes" au cœur du Grand Débat National? Combinaison des approches IA et textométriques pour l'analyse de discours des plateformes "Grand Débat National" et "Vrai débat"
In this contribution, we analyze statements from two "civic tech" platforms: the governmental platform "Grand Débat National" and its political and algorithmic response, "Vrai Débat", proposed by a Yellow Vest collective. We do so by comparing two families of text-analysis algorithms. On the one hand, we apply proven approaches to textual data analysis (the Reinert/Iramuteq method), which have recently proved useful for analyzing very large corpora; on the other hand, we apply newer methods from the intersection of computer science, artificial intelligence and natural language processing. We examine methodological solutions for qualifying the social properties of speakers about whom we have little direct information. Finally, we attempt to present some research questions, at the crossroads of the political sociology of public opinion and data science, that such a comparison opens up.
Approches quantitatives de l'analyse des prédictions en traduction automatique neuronale (TAN)
Zimina-Poirot, Maria, Ballier, Nicolas, Yunès, Jean-Baptiste
As part of a larger project on optimal learning conditions in neural machine translation (NMT), we investigate characteristic training phases of translation engines. All our experiments are carried out using OpenNMT-py: the pre-processing step uses the Europarl training corpus, and the INTERSECT corpus is used for validation. Longitudinal analyses of training phases suggest that the progression of translations is not always linear. Following the results of textometric explorations, we highlight the importance of phenomena related to chronological progression, in order to map the different processes at work in NMT.
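The observation that progression is not always linear can be illustrated with a minimal sketch. It assumes, as one possible reading, that "not linear" shows up as a quality score that sometimes drops from one checkpoint to the next. The function name `regressions`, the score dictionary and all numeric values are hypothetical and are not taken from the paper.

```python
def regressions(scores):
    """Return (step, previous_score, score) for every checkpoint whose score dropped."""
    # Sort by training step so checkpoints are compared in chronological order.
    ordered = sorted(scores.items())
    return [
        (step, prev_score, score)
        for (_, prev_score), (step, score) in zip(ordered, ordered[1:])
        if score < prev_score
    ]


# Hypothetical scores per training step, for illustration only (not results from the paper).
scores_by_step = {1000: 10.0, 2000: 14.5, 3000: 13.0, 4000: 16.0, 5000: 15.5}
drops = regressions(scores_by_step)
```

Any per-checkpoint metric could be passed in; the sketch flags only the steps where the metric falls, which is where a closer textometric look at the translations might start.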
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > Belgium (0.04)
Building and displaying name relations using automatic unsupervised analysis of newspaper articles
Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia, Oellinger, Tamara
We present a tool that uses automatically recognised names to infer inter-person relations, so that associated people can be presented on maps. Based on an in-house Named Entity Recognition tool, applied to clusters of an average of 15,000 news articles per day in 15 different languages, we build a knowledge base that allows extracting statistical co-occurrences of persons and visualising them on a per-person page or in various graphs.
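The co-occurrence step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the names recognised in each article are already available as a list of strings, and it simply counts how many articles mention both members of each pair. The function name `cooccurrence_counts` and the placeholder names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count how many articles mention each unordered pair of names."""
    counts = Counter()
    for names in articles:
        # Deduplicate within an article and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input: each inner list stands for the names recognised in one article.
articles = [
    ["Person A", "Person B", "Person C"],
    ["Person A", "Person B"],
    ["Person B", "Person C", "Person B"],
]
counts = cooccurrence_counts(articles)
```

The resulting pair counts could serve as edge weights in a person graph; a real system would presumably also normalise for how often each person appears overall, which this sketch does not do.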
- North America > United States > Connecticut > New Haven County > Cheshire (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Media > News (0.83)
- Leisure & Entertainment > Sports (0.68)
- Government > Regional Government > North America Government > United States Government (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Classification dynamique d'un flux documentaire : une évaluation statique préalable de l'algorithme GERMEN
Lelu, Alain, Cuxac, Pascal, Johansson, Joel
Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research aims at scaling efficiently to huge data repositories. Our approach focuses instead on qualitative improvement, mainly for detecting "weak signals" and precisely tracking topical evolution in the context of information watch, though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t by identifying the local perturbations induced by the current document vector, such as changing cluster borders or new or vanishing clusters. Optimality follows from the uniqueness (1) of the density landscape for any value of our zoom parameter and (2) of the cluster allocation produced by our border propagation rule.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)