Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
Ali, Hashim, Subramani, Surya, Bollinani, Lekha, Adupa, Nithin Sai, El-Loh, Sali, Malik, Hafiz
arXiv.org Artificial Intelligence
The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach combines a WavLM Large front-end with RawBoost augmentation and is trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems, drawn from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.
Oct-8-2025
- Country:
- Asia > China
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > United Kingdom (0.14)
- North America > United States > Michigan > Wayne County > Dearborn (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: