AITopics | deepfake audio

Collaborating Authors

deepfake audio

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform

Xie, Yuankun, Fu, Ruibo, Wang, Xiaopeng, Wang, Zhiyong, Li, Ya, Wen, Zhengqi, Cheng, Haonnan, Ye, Long

arXiv.org Artificial IntelligenceAug-15-2025

The rapid advancement of speech generation technology has led to the widespread proliferation of deepfake speech across social media platforms. While deepfake audio countermeasures (CMs) achieve promising results on public datasets, their performance degrades significantly in cross-domain scenarios. To advance CMs for real-world deepfake detection, we first propose the Fake Speech Wild (FSW) dataset, which includes 254 hours of real and deepfake audio from four different media platforms, focusing on social media. As CMs, we establish a benchmark using public datasets and advanced selfsupervised learning (SSL)-based CMs to evaluate current CMs in real-world scenarios. We also assess the effectiveness of data augmentation strategies in enhancing CM robustness for detecting deepfake speech on social media. Finally, by augmenting public datasets and incorporating the FSW training set, we significantly advanced real-world deepfake audio detection performance, achieving an average equal error rate (EER) of 3.54% across all evaluation sets.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2508.10559

Country: Asia > China (0.16)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks

Uddin, Kutub, Khan, Awais, Farooq, Muhammad Umar, Malik, Khalid

arXiv.org Artificial IntelligenceJul-18-2025

Audio plays a crucial role in applications like speaker verification, voice-enabled smart devices, and audio conferencing. However, audio manipulations, such as deepfakes, pose significant risks by enabling the spread of misinformation. Our empirical analysis reveals that existing methods for detecting deepfake audio are often vulnerable to anti-forensic (AF) attacks, particularly those attacked using generative adversarial networks. In this article, we propose a novel collaborative learning method called SHIELD to defend against generative AF attacks. T o expose AF signatures, we integrate an auxiliary generative model, called the defense (DF) generative model, which facilitates collaborative learning by combining input and output. Furthermore, we design a triplet model to capture correlations for real and AF attacked audios with real-generated and attacked-generated audios using auxiliary generative models. The proposed SHIELD strengthens the defense against generative AF attacks and achieves robust performance across various generative models. The proposed AF significantly reduces the average detection accuracy from 95.49% to 59.77% for ASVspoof2019, from 99.44% to 38.45% for In-the-Wild, and from 98.41% to 51.18% for HalfTruth for three different generative models. The proposed SHIELD mechanism is robust against AF attacks and achieves an average accuracy of 98.13%, 98.58%, and 99.57% in match, and 98.78%, 98.62%, and 98.85% in mismatch settings for the ASVspoof2019, In-the-Wild, and HalfTruth datasets, respectively.

detection, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.1317

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio

Yan, Xinrui, Yi, Jiangyan, Tao, Jianhua, Chen, Yujie, Gu, Hao, Li, Guanjun, Zhou, Junzuo, Ren, Yong, Xu, Tao

arXiv.org Artificial IntelligenceDec-2-2024

Open environment oriented open set model attribution of deepfake audio is an emerging research topic, aiming to identify the generation models of deepfake audio. Most previous work requires manually setting a rejection threshold for unknown classes to compare with predicted probabilities. However, models often overfit training instances and generate overly confident predictions. Moreover, thresholds that effectively distinguish unknown categories in the current dataset may not be suitable for identifying known and unknown categories in another data distribution. To address the issues, we propose a novel framework for open set model attribution of deepfake audio with rejection threshold adaptation (ReTA). Specifically, the reconstruction error learning module trains by combining the representation of system fingerprints with labels corresponding to either the target class or a randomly chosen other class label. This process generates matching and non-matching reconstructed samples, establishing the reconstruction error distributions for each class and laying the foundation for the reject threshold calculation module. The reject threshold calculation module utilizes gaussian probability estimation to fit the distributions of matching and non-matching reconstruction errors. It then computes adaptive reject thresholds for all classes through probability minimization criteria. The experimental results demonstrate the effectiveness of ReTA in improving the open set model attributes of deepfake audio.

artificial intelligence, machine learning, recognition, (16 more...)

arXiv.org Artificial Intelligence

2412.01425

Country: Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.84)
Instructional Material > Course Syllabus & Notes (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection

Lin, Chu-Hsuan Abraham, Liu, Chen-Yu, Chen, Samuel Yen-Chi, Chen, Kuan-Cheng

arXiv.org Artificial IntelligenceOct-11-2024

The rise of deepfake technologies has posed significant challenges to privacy, security, and information integrity, particularly in audio and multimedia content. This paper introduces a Quantum-Trained Convolutional Neural Network (QT-CNN) framework designed to enhance the detection of deepfake audio, leveraging the computational power of quantum machine learning (QML). The QT-CNN employs a hybrid quantum-classical approach, integrating Quantum Neural Networks (QNNs) with classical neural architectures to optimize training efficiency while reducing the number of trainable parameters. Our method incorporates a novel quantum-to-classical parameter mapping that effectively utilizes quantum states to enhance the expressive power of the model, achieving up to 70% parameter reduction compared to classical models without compromising accuracy. Data pre-processing involved extracting essential audio features, label encoding, feature scaling, and constructing sequential datasets for robust model evaluation. Experimental results demonstrate that the QT-CNN achieves comparable performance to traditional CNNs, maintaining high accuracy during training and testing phases across varying configurations of QNN blocks. The QT framework's ability to reduce computational overhead while maintaining performance underscores its potential for real-world applications in deepfake detection and other resource-constrained scenarios. This work highlights the practical benefits of integrating quantum computing into artificial intelligence, offering a scalable and efficient approach to advancing deepfake detection technologies.

artificial intelligence, arxiv preprint arxiv, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.0925

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Channing, Georgia, Sock, Juil, Clark, Ronald, Torr, Philip, de Witt, Christian Schroeder

arXiv.org Artificial IntelligenceOct-9-2024

The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection.

audio sample, dataset, detection, (16 more...)

arXiv.org Artificial Intelligence

2410.07436

Country:

North America > United States (0.14)
North America > Mexico (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

Xie, Yuankun, Xiong, Chenxu, Wang, Xiaopeng, Wang, Zhiyong, Lu, Yi, Qi, Xin, Fu, Ruibo, Liu, Yukun, Wen, Zhengqi, Tao, Jianhua, Li, Guanjun, Ye, Long

arXiv.org Artificial IntelligenceAug-20-2024

Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based audio have become increasingly critical. This paper investigate the effectiveness of current countermeasure (CM) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and utilizing the latest CMs to evaluate. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.

deepfake audio, detection, speech sample, (12 more...)

arXiv.org Artificial Intelligence

2408.10853

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Lu, Yi, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Zhiyong, Qi, Xin, Liu, Xuefei, Li, Yongwei, Liu, Yukun, Wang, Xiaopeng, Shi, Shuchen

arXiv.org Artificial IntelligenceJun-12-2024

With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose Codecfake dataset, which is generated by seven representative neural codec methods. Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.

audio, dataset, language model, (12 more...)

arXiv.org Artificial Intelligence

2406.08112

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Xie, Yuankun, Lu, Yi, Fu, Ruibo, Wen, Zhengqi, Wang, Zhiyong, Tao, Jianhua, Qi, Xin, Wang, Xiaopeng, Liu, Yukun, Cheng, Haonan, Ye, Long, Sun, Yi

arXiv.org Artificial IntelligenceMay-15-2024

With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset, including 2 languages, over 1M audio samples, and various test conditions, focus on ALM-based audio detection. As countermeasure, to achieve universal detection of deepfake audio and tackle domain ascent bias issue of original SAM, we propose the CSAM strategy to learn a domain balanced and generalized minima. In our experiments, we first demonstrate that ADD model training with the Codecfake dataset can effectively detects ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.

arxiv preprint arxiv, audio, dataset, (14 more...)

arXiv.org Artificial Intelligence

2405.0488

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Guangdong Province > Zhuhai (0.04)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Deepfake audio of Biden alarms experts in lead-up to U.S. elections

The Japan TimesJan-23-2024, 07:10:00 GMT

No political deepfake has alarmed the world's disinformation experts more than the doctored audio message of U.S. President Joe Biden that began circulating over the weekend. In the phone message, a voice edited to sound like Biden urged voters in New Hampshire not to cast their ballots in Tuesday's Democratic primary. "Save your vote for the November election," the phone message went.

artificial intelligence, biden alarm expert, machine learning, (2 more...)

The Japan Times

Country: North America > United States > New Hampshire (0.39)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.75)

Add feedback

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

Yan, Xinrui, Yi, Jiangyan, Wang, Chenglong, Tao, Jianhua, Zhou, Junzuo, Gu, Hao, Fu, Ruibo

arXiv.org Artificial IntelligenceSep-15-2023

The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to know what tool or model generated the deepfake audio to explain the decision. This motivates us to ask: Can we recognize the system fingerprints of deepfake audio? In this paper, we present the first deepfake audio dataset for system fingerprint recognition (SFR) and conduct an initial investigation. We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies, including both clean and compressed sets. In addition, to facilitate the further development of system fingerprint recognition methods, we provide extensive benchmarks that can be compared and research findings. The dataset will be publicly available. .

dataset, fingerprint recognition, recognition, (13 more...)

arXiv.org Artificial Intelligence

2208.10489

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Zhejiang Province (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Fingerprint Recognition (0.83)

Add feedback