SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

Staněk, Vojtěch; Srna, Karel; Firc, Anton; Malinka, Kamil

Aug-12-2025–arXiv.org Artificial Intelligence

Despite growing attention to deepfake speech detection, bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances with a balanced representation of male and female speakers, spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.
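The abstract does not say which metric underlies the detector comparison. As a hedged illustration only, assuming the commonly used equal error rate (EER) and a score convention where higher means "more likely bona fide", disparities across speaker attributes could be quantified by computing EER separately for each attribute value (for example, per sex or per language) and comparing the results. The function names `equal_error_rate`, `eer_by_group`, and `max_eer_gap` and the `(group, score, label)` record layout are hypothetical and not taken from the SCDF release.

```python
from collections import defaultdict


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumed (hypothetical) convention: label 1 = bona fide, 0 = spoof,
    and a higher score means the detector considers the utterance bona fide.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each group needs both bona fide and spoof samples")
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(scores)):
        # False rejection: bona fide utterance scored below the threshold.
        frr = sum(s < t for s in pos) / len(pos)
        # False acceptance: spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in neg) / len(neg)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def eer_by_group(records):
    """records: iterable of (group, score, label); returns {group: EER}."""
    grouped = defaultdict(lambda: ([], []))
    for group, score, label in records:
        grouped[group][0].append(score)
        grouped[group][1].append(label)
    return {g: equal_error_rate(s, l) for g, (s, l) in grouped.items()}


def max_eer_gap(per_group):
    """Largest EER difference between any two groups (a simple disparity measure)."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    toy = [
        ("female", 0.9, 1), ("female", 0.4, 1), ("female", 0.6, 0), ("female", 0.2, 0),
        ("male", 0.9, 1), ("male", 0.8, 1), ("male", 0.1, 0), ("male", 0.2, 0),
    ]
    per_group = eer_by_group(toy)
    print(per_group, max_eer_gap(per_group))
```

On the toy records above (made-up scores, not SCDF results), the detector separates the "male" group perfectly but not the "female" group, so the per-group EERs differ; the same grouping could be repeated for language, age band, or synthesizer type. A threshold sweep over observed scores is the simplest EER approximation; interpolating the ROC curve gives a smoother estimate on small groups.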

artificial intelligence, dataset, machine learning

Country:
- Europe > Czechia (0.29)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (1.00)