Can Emotion Fool Anti-spoofing?
Mahapatra, Aurosweta; Ulgen, Ismail Rasim; Naini, Abinay Reddy; Busso, Carlos; Sisman, Berrak
arXiv.org Artificial Intelligence
Traditional anti-spoofing focuses on models and datasets built on synthetic speech in a mostly neutral state, neglecting diverse emotional variations. As a result, the robustness of these models against high-quality, emotionally expressive synthetic speech is uncertain. We address this by introducing EmoSpoof-TTS, a corpus of emotional text-to-speech samples. Our analysis shows that existing anti-spoofing models struggle with emotional synthetic speech, exposing the risk of emotion-targeted attacks. Even when trained on emotional data, the models underperform because they give limited attention to the emotional aspect, and their performance varies across emotions. This highlights the need for an emotion-focused anti-spoofing paradigm in both datasets and methodology. We propose GEM, a gated ensemble of emotion-specialized models with a speech emotion recognition gating network. GEM performs effectively across all emotions and the neutral state, improving defenses against spoofing attacks.
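The gated-ensemble idea can be illustrated with a minimal sketch. The abstract does not say how the gating network combines the emotion-specialized models, so the soft (softmax-weighted) combination below is an assumption, and every name in it (`softmax`, `gated_ensemble_score`, `ser_gate`, `experts`) is hypothetical rather than taken from the paper. Hard routing to the single most likely emotion would be an equally plausible reading.

```python
import math
from typing import Callable, Dict, Mapping, Sequence

Features = Sequence[float]
# Hypothetical interfaces (not from the paper):
#   an expert maps features to a spoof probability in [0, 1];
#   the gate maps features to one unnormalized score per emotion.
Expert = Callable[[Features], float]
Gate = Callable[[Features], Mapping[str, float]]


def softmax(logits: Mapping[str, float]) -> Dict[str, float]:
    """Turn per-emotion gate scores into weights that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def gated_ensemble_score(
    features: Features,
    ser_gate: Gate,
    experts: Mapping[str, Expert],
) -> float:
    """Weight each emotion expert's spoof score by the gate's emotion posterior.

    Assumption: soft combination. The source only states that a speech
    emotion recognition network gates emotion-specialized models.
    """
    weights = softmax(ser_gate(features))
    missing = set(weights) - set(experts)
    if missing:
        raise KeyError(f"no expert for emotions: {sorted(missing)}")
    return sum(weights[name] * experts[name](features) for name in weights)
```

In use, `ser_gate` would be a trained speech emotion recognition model and each entry of `experts` an anti-spoofing model specialized for one emotion (or the neutral state); constant-valued stand-ins are enough to check the weighting logic.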
Jun-2-2025
- Country:
- North America > United States (0.15)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Information Technology > Security & Privacy (0.49)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language (1.00)
- Cognitive Science > Emotion (0.67)
- Speech > Speech Synthesis (0.51)
- Machine Learning > Neural Networks (0.47)