busca
Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
Gomes, Juliana Resplande Sant'anna, Filho, Arlindo Rodrigues Galvão
The accelerated dissemination of disinformation often outpaces the capacity for manual fact-checking, highlighting the urgent need for Semi-Automated Fact-Checking (SAFC) systems. Within the Portuguese language context, there is a noted scarcity of publicly available datasets (corpora) that integrate external evidence, an essential component for developing robust AFC systems, as many existing resources focus solely on classification based on intrinsic text features. This dissertation addresses this gap by developing, applying, and analyzing a methodology to enrich Portuguese news corpora (Fake.Br, COVID19.BR, MuMiN-PT) with external evidence. The approach simulates a user's verification process, employing Large Language Models (LLMs, specifically Gemini 1.5 Flash) to extract the main claim from texts and search engine APIs (Google Search API, Google FactCheck Claims Search API) to retrieve relevant external documents (evidence). Additionally, a data validation and pre-processing framework, including near-duplicate detection, is introduced to enhance the quality of the base corpora. The main results demonstrate the methodology's viability, providing enriched corpora and analyses that confirm the utility of claim extraction, the influence of original data characteristics on the process, and the positive impact of enrichment on the performance of classification models (Bertimbau and Gemini 1.5 Flash), especially with fine-tuning. This work contributes valuable resources and insights for advancing SAFC in Portuguese.
- South America > Brazil (1.00)
- Asia > Middle East > UAE (0.45)
- North America > United States > Minnesota (0.27)
- Europe > Spain > Galicia (0.27)
- Research Report (0.70)
- Overview (0.67)
- Information Technology > Services (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
- Media > News (0.70)
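The abstract above mentions near-duplicate detection as part of corpus pre-processing but does not say which method is used. As a minimal sketch, assuming word-shingle Jaccard similarity as a stand-in technique, pairs of near-identical news texts could be flagged as follows. The function names, the shingle size `k`, and the `threshold` value are all hypothetical choices for illustration, not taken from the dissertation.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(texts, threshold=0.8, k=3):
    """Return index pairs (i, j), i < j, whose shingle sets are at least `threshold` similar."""
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise comparison is quadratic in the number of texts, so for a large corpus a hashing scheme such as MinHash would likely be needed; this sketch only illustrates the similarity criterion.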
Busca de melhor caminho entre múltiplas origens e múltiplos destinos em redes complexas que representam cidades
This paper investigates the use of a search strategy for the problem of finding the best path among multiple origins and multiple destinations. In this kind of problem, one must decide, among many combinations, which origin and which destination are best, as well as the best path between these two regions. A notable difficulty in this kind of problem is performing the search in a short time. This monograph extends previous research in which the problem described here was studied only on a bus network in the city of Fortaleza. The extension explores the search strategy on graphs that represent public roads in cities such as Fortaleza, Mumbai and Tokyo.
- South America > Brazil > Ceará > Fortaleza (0.55)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.30)
- Asia > India > Maharashtra > Mumbai (0.30)
Busca de melhor caminho entre múltiplas origens e múltiplos destinos em redes complexas que representam cidades
This paper investigates the use of a search strategy for the problem of finding the best path among multiple origins and multiple destinations. In this kind of problem, one must decide, among many combinations, which origin and which destination are best, as well as the best path between these two regions. A notable difficulty in this kind of problem is performing the search in a short time. This monograph extends previous research in which the problem described here was studied only on a bus network in the city of Fortaleza. The extension explores the search strategy on graphs that represent public roads in cities such as Fortaleza, Mumbai and Tokyo. Using this strategy with a heuristic algorithm based on the Haversine distance, the search time was found to drop substantially, at the cost of an error introduced because the heuristic function applied is no longer admissible.
- South America > Brazil > Ceará > Fortaleza (0.46)
- Asia > India > Maharashtra > Mumbai (0.24)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.24)
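To make the Haversine heuristic mentioned in the abstract concrete, here is a minimal sketch of an A*-style search that starts from several origins at once and stops at the first destination reached. It is not the authors' implementation: the function names, the graph representation (`node -> [(neighbor, cost_km)]`), and the choice of taking the minimum Haversine distance over all destinations as the heuristic are assumptions made for illustration. The sketch also assumes edge costs are lengths in kilometers that are never shorter than the great-circle distance between their endpoints; under that assumption the heuristic stays admissible, which the abstract reports is not the case in the authors' setting.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def multi_astar(graph, coords, origins, destinations):
    """Return (cost, path) for the cheapest origin-to-destination route, or None."""
    goals = set(destinations)

    def heuristic(node):
        # Distance to the nearest destination (hypothetical design choice).
        return min(haversine_km(coords[node], coords[g]) for g in goals)

    best = {o: 0.0 for o in origins}      # cheapest known cost to each node
    parent = {o: None for o in origins}   # back-pointers for path recovery
    frontier = [(heuristic(o), o) for o in origins]
    heapq.heapify(frontier)
    closed = set()

    while frontier:
        _, node = heapq.heappop(frontier)
        if node in closed:
            continue
        closed.add(node)
        if node in goals:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return best[path[0]], path[::-1]
        for neighbor, cost in graph.get(node, []):
            candidate = best[node] + cost
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(frontier, (candidate + heuristic(neighbor), neighbor))
    return None
```

Seeding the frontier with every origin at cost zero lets a single search pick the best origin, destination, and path together, instead of running one search per origin-destination pair.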
Burstiness Scale: a highly parsimonious model for characterizing random series of events
Alves, Rodrigo A S, Assunção, Renato, de Melo, Pedro O S Vaz
Accurately and parsimoniously characterizing random series of events (RSEs) present on the Web, such as e-mail conversations or Twitter hashtags, is not trivial. Reports found in the literature reveal two apparently conflicting views of how RSEs should be modeled. On one side are Poissonian processes, in which consecutive events follow each other at relatively regular intervals and should not be correlated. On the other side are self-exciting processes, which are able to generate bursts of correlated events and periods of inactivity. The existence of many, sometimes conflicting, approaches to modeling RSEs is a consequence of the unpredictability of the aggregated dynamics of our individual and routine activities, which sometimes show simple patterns but sometimes result in irregular rising and falling trends. In this paper we propose a highly parsimonious way to characterize general RSEs, namely the Burstiness Scale (BuSca) model. BuSca views each RSE as a mix of two independent processes: a Poissonian one and a self-exciting one. Here we describe a fast method to extract the two parameters of BuSca that, together, give the burstiness scale, which represents how much of the RSE is due to bursty and viral effects. We validated our method on eight diverse and large datasets containing real random series of events seen in Twitter, Yelp, e-mail conversations, Digg, and online forums. Results showed that, even using only two parameters, BuSca is able to accurately describe RSEs seen in these diverse systems, which can benefit many applications.
- North America > United States > New York > New York County > New York City (0.16)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota (0.04)
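The idea of an event series built from two independent components, one Poissonian and one self-exciting, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not specify the self-exciting process, so a Hawkes process with an exponential kernel is used here as a stand-in, and the "bursty share" computed at the end (the fraction of events coming from the self-exciting component) is a hypothetical illustration, not the paper's definition of the burstiness scale or its estimation method. All function and parameter names are invented for this example.

```python
import random


def poisson_process(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def poisson_count(mean, rng):
    """Poisson-distributed count: unit-rate arrivals falling in [0, mean)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def hawkes_process(base_rate, branching, decay, horizon, rng):
    """Hawkes process with exponential kernel, simulated via its branching form.

    Each event spawns Poisson(branching) children after Exp(decay) delays;
    keep branching < 1 so that cascades die out.
    """
    events = []
    pending = poisson_process(base_rate, horizon, rng)  # spontaneous "immigrant" events
    while pending:
        t = pending.pop()
        events.append(t)
        for _ in range(poisson_count(branching, rng)):
            child = t + rng.expovariate(decay)
            if child < horizon:
                pending.append(child)
    return sorted(events)


def mixed_series(poisson_rate, base_rate, branching, decay, horizon, seed=0):
    """Superimpose both components; return merged times and the bursty share."""
    rng = random.Random(seed)
    background = poisson_process(poisson_rate, horizon, rng)
    bursty = hawkes_process(base_rate, branching, decay, horizon, rng)
    merged = sorted(background + bursty)
    share = len(bursty) / len(merged) if merged else 0.0
    return merged, share
```

In a simulation the component labels are known, so the share can be read off directly; on real data only the merged series is observed, which is why a fitting method such as the one the abstract describes is needed.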