SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
Staněk, Vojtěch, Srna, Karel, Firc, Anton, Malinka, Kamil
arXiv.org Artificial Intelligence
Despite growing attention to deepfake speech detection, bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances with a balanced representation of male and female speakers, spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.
Aug-12-2025
- Country:
- Europe > Czechia > Olomouc Region > Olomouc (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.05)
- Europe > Switzerland (0.04)
- Oceania > Australia
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: