Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking

Chung, Yi-Ling; Cobo, Aurora; Serna, Pablo

Feb-21-2025–arXiv.org Artificial Intelligence

Robust automatic fact-checking systems have the potential to combat online misinformation at scale. However, most existing research focuses on English. In this paper, we introduce MultiSynFact, the first large-scale multilingual fact-checking dataset containing 2.2M claim-source pairs designed to support Spanish, German, English, and other low-resource languages. Our dataset generation pipeline leverages Large Language Models (LLMs), integrating external knowledge from Wikipedia and incorporating rigorous claim validation steps to ensure data quality. We evaluate the effectiveness of MultiSynFact across multiple models and experimental settings. Additionally, we open-source a user-friendly framework to facilitate further research in multilingual fact-checking and dataset generation.

computational linguistic, dataset, synthetic data

Country:
- South America > Colombia (0.04)
- North America
  - United States
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
    - Colorado > Denver County
      - Denver (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - United Kingdom (0.04)
  - Netherlands (0.04)
  - Sweden > Östergötland County
    - Linköping (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.04)
- Asia
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)

Genre:
- Research Report (1.00)

Industry:
- Media > News (0.48)
- Health & Medicine (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
