Annotation Tool and Dataset for Fact-Checking Podcasts
Setty, Vinay, Becker, Adam James
arXiv.org Artificial Intelligence
Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is challenging: it requires transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool addresses these challenges by enabling real-time annotation of podcasts during playback, so that users can listen to a podcast and simultaneously annotate key elements such as check-worthy claims, claim spans, and contextual errors. By integrating advanced transcription models such as OpenAI's Whisper and leveraging crowdsourced annotations, we create high-quality datasets for fine-tuning multilingual transformer models such as XLM-RoBERTa on tasks like claim detection and stance classification. We also release the annotated podcast transcripts and sample annotations, together with preliminary experiments.
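As a rough illustration of annotating during playback, the sketch below maps a playback timestamp to the transcript segment being spoken at that moment and attaches a label to it. It is not the authors' implementation: the `Segment` class, the `segment_at` and `annotate` helpers, the label string, and the example sentences are all hypothetical, and the only assumption about the transcript is that each segment carries a start time, an end time, and text, as timestamped speech-to-text output typically does.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical shape)."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def segment_at(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment playing at time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return i
    return None  # t is before the first segment or inside a silent gap


def annotate(segments: List[Segment], playback_time: float, label: str) -> Optional[dict]:
    """Attach a label to the segment heard at playback_time, if any."""
    i = segment_at(segments, playback_time)
    if i is None:
        return None
    return {"segment": i, "label": label, "text": segments[i].text}


# Made-up example transcript, for illustration only.
demo_segments = [
    Segment(0.0, 4.2, "Welcome to the show."),
    Segment(4.2, 9.8, "Unemployment fell by half last year."),
    Segment(11.0, 15.0, "Let's take a short break."),
]

if __name__ == "__main__":
    print(annotate(demo_segments, 5.0, "check-worthy"))
```

Keying annotations to segment indices rather than raw timestamps keeps them attached to the transcript text, which is the form a claim detection model would later consume.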
Feb-3-2025
- Country:
  - Europe > Norway (0.16)
  - North America > United States (0.14)
- Genre:
  - Research Report (0.70)
- Industry:
  - Media (0.48)
- Technology: