HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
Michiel van der Meer, Pavel Korshunov, Sébastien Marcel, Lonneke van der Plas
arXiv.org Artificial Intelligence
Misinformation can be countered with fact-checking, but the process is costly and slow. Identifying checkworthy claims is the first step, where automation can help scale fact-checkers' efforts. However, detection methods struggle with content that is 1) multimodal, 2) from diverse domains, and 3) synthetic. We introduce HintsOfTruth, a public dataset for multimodal checkworthiness detection with 27K real-world and synthetic image/claim pairs. The mix of real and synthetic data makes this dataset unique and ideal for benchmarking detection methods. We compare fine-tuned and prompted Large Language Models (LLMs). We find that well-configured lightweight text-based encoders perform comparably to multimodal models, but the former focus only on identifying non-claim-like content. Multimodal LLMs can be more accurate but come at a significant computational cost, making them impractical for large-scale applications. When faced with synthetic data, multimodal models perform more robustly.
Feb-17-2025
- Country:
  - Africa
    - Kenya (0.04)
    - Nigeria (0.04)
    - South Sudan (0.04)
  - Asia
  - Europe
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Germany (0.04)
    - Greece > Central Macedonia
      - Thessaloniki (0.04)
    - Netherlands > South Holland
      - Leiden (0.04)
    - Switzerland (0.04)
  - North America
    - Canada
      - Ontario > National Capital Region
        - Ottawa (0.04)
      - Quebec > Montreal (0.04)
    - Dominican Republic (0.04)
    - Mexico (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Virginia (0.04)
  - Oceania > Australia
- Genre:
  - Research Report > New Finding (0.93)
- Industry:
  - Government > Regional Government
  - Media > News (1.00)
- Technology: