Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Chen, Xuanjun, Lin, I-Ming, Zhang, Lin, Wu, Haibin, Lee, Hung-yi, Jang, Jyh-Shing Roger

Aug-19-2025–arXiv.org Artificial Intelligence

Recent attempts at source tracing for codec-based deepfake speech (CodecFake), generated by neural audio codec-based speech generation (CoSG) models, have exhibited suboptimal performance. In particular, how to train source tracing models on simulated CoSG data while maintaining strong performance on real CoSG-generated audio remains an open challenge. In this paper, we show that models trained solely on codec-resynthesized data tend to overfit to non-speech regions and struggle to generalize to unseen content. To mitigate these challenges, we introduce the Semantic-Acoustic Source Tracing Network (SASTNet), which jointly leverages Whisper for semantic feature encoding and Wav2vec2 with AudioMAE for acoustic feature encoding. Our proposed SASTNet achieves state-of-the-art performance on the CoSG test set of the CodecFake+ dataset, demonstrating its effectiveness for reliable source tracing.

Deepfake detection determines whether a given utterance is bona fide or deepfake speech. Recently, attention has shifted from merely detecting deepfake speech to tracing its source.
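The abstract attributes part of the generalization gap to overfitting on non-speech regions. One simple way to probe such reliance (an illustration, not a procedure described in the paper) is to score a clip with and without its leading and trailing silence and compare the predicted sources. The sketch below trims silence with a plain energy threshold; the 25 ms frame, 10 ms hop, and -40 dB relative threshold are assumed settings, and `trim_silence` and `frame_energy_db` are hypothetical helpers.

```python
import numpy as np


def frame_energy_db(wave: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Per-frame mean energy in dB for overlapping frames."""
    if len(wave) < frame_len:  # pad very short clips to one full frame
        wave = np.pad(wave, (0, frame_len - len(wave)))
    n_frames = 1 + (len(wave) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    energy = np.mean(wave[idx] ** 2, axis=1)
    return 10.0 * np.log10(energy + 1e-12)


def trim_silence(wave: np.ndarray, sr: int = 16000, threshold_db: float = -40.0) -> np.ndarray:
    """Drop leading/trailing frames quieter than (loudest frame + threshold_db).

    Hypothetical helper with assumed settings; not taken from the paper.
    """
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)  # 25 ms frames, 10 ms hop
    db = frame_energy_db(wave, frame_len, hop)
    active = db > db.max() + threshold_db  # the loudest frame is always active
    first = int(np.argmax(active))
    last = len(active) - 1 - int(np.argmax(active[::-1]))
    return wave[first * hop : last * hop + frame_len]


# Synthetic clip: 0.5 s near-silence, 1 s tone, 0.5 s near-silence.
rng = np.random.default_rng(0)
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
wave = np.concatenate([rng.normal(0, 1e-4, sr // 2), tone, rng.normal(0, 1e-4, sr // 2)])
trimmed = trim_silence(wave, sr)
print(len(wave) / sr, len(trimmed) / sr)
```

A tracer whose top prediction flips between `wave` and `trimmed` is likely keying on non-speech cues. Energy thresholds are crude; a trained voice activity detector would be a more robust choice for real recordings.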

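To make the semantic-acoustic split concrete, here is a minimal late-fusion sketch in Python with NumPy. It assumes each encoder (Whisper, Wav2vec2, AudioMAE) has already produced frame-level features; random arrays stand in for them, and the feature sizes, the hypothetical `FusionSourceTracer` class, mean pooling, and concatenation fusion are all illustrative assumptions rather than SASTNet's actual design. Weights are untrained, so the output only demonstrates the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


class FusionSourceTracer:
    """Hypothetical late-fusion head (not SASTNet's published architecture).

    Each encoder branch is mean-pooled over time, projected to a shared
    hidden size, and the projections are concatenated before a linear
    classifier over candidate source (CoSG) labels. Weights are random and
    untrained; only the data flow is illustrated.
    """

    def __init__(self, dims: dict[str, int], hidden: int, num_sources: int):
        self.order = sorted(dims)  # fixed branch order for concatenation
        self.proj = {n: rng.normal(0.0, 0.02, (dims[n], hidden)) for n in self.order}
        self.cls = rng.normal(0.0, 0.02, (hidden * len(self.order), num_sources))

    def __call__(self, feats: dict[str, np.ndarray]) -> np.ndarray:
        # Mean pooling lets branches with different frame counts be fused.
        pooled = [np.tanh(feats[n].mean(axis=0) @ self.proj[n]) for n in self.order]
        fused = np.concatenate(pooled)
        return softmax(fused @ self.cls)  # probability per candidate source


# Stand-in features: "whisper" plays the semantic branch,
# "wav2vec2" and "audiomae" play the acoustic branch (sizes are assumptions).
dims = {"whisper": 512, "wav2vec2": 768, "audiomae": 768}
feats = {
    "whisper": rng.normal(size=(150, 512)),
    "wav2vec2": rng.normal(size=(149, 768)),
    "audiomae": rng.normal(size=(64, 768)),
}
model = FusionSourceTracer(dims, hidden=128, num_sources=4)
probs = model(feats)
print(probs.shape, float(probs.sum()))
```

Concatenation keeps the semantic and acoustic views separate until the classifier, which is one simple way to combine them; attention-based or gated fusion would be other options.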
artificial intelligence, arxiv preprint arxiv, machine learning

Genre:
- Research Report > New Finding (0.46)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (1.00)