Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
Daniyal Kabir Dar, Qiben Yan, Li Xiao, Arun Ross
arXiv.org Artificial Intelligence
Abstract: Adversarial perturbations in speech pose a serious threat to automatic speech recognition (ASR) and speaker verification by introducing subtle waveform modifications that remain imperceptible to humans but can significantly alter system outputs. While targeted attacks on end-to-end ASR models have been widely studied, the phonetic basis of these perturbations and their effect on speaker identity remain underexplored. In this work, we analyze adversarial audio at the phonetic level and show that perturbations are associated with systematic phonetic tendencies, such as vowel centralization and consonant substitutions. Using the DeepSpeech ASR model as our target, we generate targeted adversarial examples and evaluate their impact on speaker identity embeddings across genuine and impostor samples. Results across 16 phonetically diverse target phrases demonstrate that adversarial audio induces both transcription errors and identity drift, highlighting the need for phonetic-aware defenses to ensure the robustness of ASR and speaker recognition systems.
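To make "identity drift" concrete, the following is a minimal sketch of one way such drift could be quantified: as the drop in cosine similarity between an enrolled speaker embedding and the embedding of the same utterance before and after perturbation. The abstract does not state which metric or embedding model the authors use, so cosine similarity, the function names, and the toy 3-dimensional vectors here are all assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_drift(enrolled, clean, adversarial):
    """Hypothetical drift score: how much similarity to the enrolled
    speaker is lost when the clean audio is replaced by its adversarial
    counterpart. Positive values mean the adversarial embedding moved
    away from the enrolled identity."""
    return cosine_similarity(enrolled, clean) - cosine_similarity(enrolled, adversarial)


# Toy embeddings, made up for illustration (real speaker embeddings
# typically have hundreds of dimensions).
enrolled = [1.0, 0.0, 0.0]
clean = [1.0, 0.0, 0.0]
adversarial = [0.0, 1.0, 0.0]

drift = identity_drift(enrolled, clean, adversarial)
```

In this toy case the clean embedding matches the enrolled one exactly and the adversarial embedding is orthogonal to it, so the drift score takes its largest value for non-negatively correlated embeddings. In practice the embeddings would come from a speaker verification model, and drift would be compared across genuine and impostor pairs.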
Sep-22-2025
- Country:
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Information Technology > Security & Privacy (1.00)