Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits
Huang, Sung-Feng; Kuo, Heng-Cheng; Chen, Zhehuai; Yang, Xuesong; Yang, Chao-Han Huck; Tsao, Yu; Wang, Yu-Chiang Frank; Lee, Hung-yi; Fu, Szu-Wei
arXiv.org Artificial Intelligence
Advances in neural speech editing have raised concerns about its misuse in spoofing attacks. Traditional partially edited speech corpora focus primarily on cut-and-paste edits, which preserve speaker consistency but often introduce detectable discontinuities at the edit boundaries. Recent methods such as A³T and Voicebox produce smoother transitions by leveraging the surrounding context. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detail the process of re-implementing Voicebox training and creating the dataset. Subjective evaluations confirm that speech edited with this infilling technique is harder to detect than speech edited with conventional cut-and-paste methods. Although humans struggle to detect these edits, experimental results show that detectors built on self-supervised models achieve strong performance in detection, localization, and generalization across different edit methods. The dataset and related models will be made publicly available.
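To make the point about cut-and-paste discontinuities concrete, the following is a toy sketch, not part of the SINE pipeline or the paper's detectors. It assumes pure sine tones in place of real speech, a hypothetical 16 kHz sample rate, and hypothetical helper names (`tone`, `cut_and_paste`, `max_boundary_jump`). It splices a phase-shifted segment into a host signal with no smoothing and compares the sample-to-sample jump at the splice points against the jump at the same positions in the unedited signal.

```python
import math

SR = 16000  # hypothetical sample rate in Hz (assumption for this toy example)


def tone(freq_hz, n_samples, phase=0.0):
    """Generate a pure sine tone as a list of floats (stand-in for speech)."""
    return [math.sin(2 * math.pi * freq_hz * t / SR + phase) for t in range(n_samples)]


def cut_and_paste(host, insert, start, end):
    """Replace host[start:end] with insert, with no smoothing at the joins."""
    return host[:start] + insert + host[end:]


def max_boundary_jump(signal, boundaries):
    """Largest absolute sample-to-sample change at the given boundary indices."""
    return max(abs(signal[b] - signal[b - 1]) for b in boundaries)


if __name__ == "__main__":
    host = tone(200, 1600)
    # A segment whose phase does not line up with the host at the splice points.
    segment = tone(200, 400, phase=math.pi / 2)
    spliced = cut_and_paste(host, segment, 600, 1000)
    print("jump in original at splice points:", max_boundary_jump(host, [600, 1000]))
    print("jump in cut-and-paste edit:       ", max_boundary_jump(spliced, [600, 1000]))
```

The naive splice produces a jump more than ten times larger than the natural step of the unedited tone, which is the kind of artifact that makes cut-and-paste edits easier to spot. Context-aware infilling methods such as Voicebox generate the edited region conditioned on its surroundings, so no such simple boundary cue is expected; the paper evaluates learned self-supervised detectors rather than a heuristic like this one.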
Jan-7-2025
- Country:
- Asia (0.14)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks (0.71)
- Natural Language (1.00)
- Speech (1.00)
- Security & Privacy (1.00)