Fake Audio


I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Zhang, Zirui, Hao, Wei, Sankoh, Aroon, Lin, William, Mendiola-Ortiz, Emanuel, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial Intelligence

Recent advances in AI-generated voices have intensified the challenge of detecting deepfake audio, posing risks for scams and the spread of disinformation. To tackle this issue, we establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples, including 270,000 high-quality deepfake samples from 14 diverse sources. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset, and their detection success rates drop even further under realistic corruptions and adversarial attacks. We conduct a holistic investigation into factors that enhance model robustness and show that incorporating a diversified set of voice augmentations is beneficial. Moreover, we find that the best detection models often rely on high-frequency features, which are imperceptible to humans and can be easily manipulated by an attacker. To address this, we propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components. Empirical results demonstrate that using our training dataset boosts baseline model performance (without robust training) by 33%, and our robust training further improves accuracy by 7.7% on clean samples and by 29.3% on corrupted and attacked samples, over the state-of-the-art RawNet3 model.
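
To make the frequency-selective idea concrete, here is a minimal sketch of an adversarial training step whose perturbation is projected onto high frequencies before being applied. It assumes a PyTorch waveform classifier `model`; the cutoff frequency, step sizes, and bound are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def high_pass(delta, sample_rate=16000, cutoff_hz=4000):
    """Keep only the high-frequency content of a perturbation via rFFT masking."""
    spec = torch.fft.rfft(delta, dim=-1)
    freqs = torch.fft.rfftfreq(delta.shape[-1], d=1.0 / sample_rate)
    spec = spec * (freqs >= cutoff_hz).to(device=spec.device)
    return torch.fft.irfft(spec, n=delta.shape[-1], dim=-1)

def fsat_loss(model, x, y, eps=0.002, alpha=0.0005, steps=5):
    """Craft a high-frequency-only adversarial example, then return the loss on it."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()                    # PGD-style ascent step
            delta.copy_(high_pass(delta).clamp(-eps, eps))  # project: high-freq only, bounded
    return F.cross_entropy(model(x + delta.detach()), y)    # train on the adversarial example
```

Restricting the perturbation to high frequencies in this way targets exactly the imperceptible features the abstract says the best detectors tend to over-rely on.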


Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence

Salehi, Mahsa, Stefanov, Kalin, Shareghi, Ehsan

arXiv.org Artificial Intelligence

In this paper we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm do not exhibit clearly distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are exposed to fake versus real audio. This preliminary evidence opens future research directions in areas such as deepfake audio detection.
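
As a rough illustration of that kind of comparison (not the paper's protocol), one can quantify how cleanly two sets of per-trial features separate real from fake. The feature matrices below are random stand-ins, and the t-SNE-plus-silhouette choice is an illustrative measure of separability.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def separability(features, labels):
    """Project to 2-D with t-SNE and score how distinctly real/fake cluster."""
    proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    return silhouette_score(proj, labels)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                    # 0 = real, 1 = fake
det_emb = rng.normal(size=(200, 128))                    # stand-in detector embeddings
eeg_feat = rng.normal(size=(200, 64)) + labels[:, None]  # stand-in EEG features with signal

print("detector embeddings:", separability(det_emb, labels))
print("EEG features:       ", separability(eeg_feat, labels))
```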


An Alleged Deepfake of UK Opposition Leader Keir Starmer Shows the Dangers of Fake Audio

WIRED

As members of the UK's largest opposition party gathered in Liverpool for their party conference--probably their last before the UK holds a general election--a potentially explosive audio file started circulating on X, formerly known as Twitter. The 25-second recording was posted by an X account with the handle "@Leo_Hutz" that was set up in January 2023. In the clip, Sir Keir Starmer, the Labour Party leader, is apparently heard swearing repeatedly at a staffer. "I have obtained audio of Keir Starmer verbally abusing his staffers at [the Labour Party] conference," the X account posted. "This disgusting bully is about to become our next PM."


Adaptive Fake Audio Detection with Low-Rank Model Squeezing

Zhang, Xiaohui, Yi, Jiangyan, Tao, Jianhua, Wang, Chenlong, Xu, Le, Fu, Ruibo

arXiv.org Artificial Intelligence

Novel audio spoofing algorithms emerge continually, and detection models must adapt to the fake audio types they produce. Traditional approaches, such as finetuning on new datasets containing these novel spoofing algorithms, are computationally intensive and pose a risk of impairing the knowledge already acquired about known fake audio types. To address these challenges, this paper proposes an innovative approach that mitigates the limitations associated with finetuning. We introduce the concept of training low-rank adaptation matrices tailored specifically to newly emerging fake audio types. During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output. Extensive experimentation is conducted to evaluate the efficacy of the proposed method. The results demonstrate that our approach effectively preserves the prediction accuracy of the existing model on known fake audio types. Furthermore, our approach offers several advantages, including reduced storage requirements and lower equal error rates than conventional finetuning, particularly on specific spoofing algorithms.
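
A minimal sketch of the low-rank adaptation idea, assuming a frozen linear layer from an existing PyTorch detector; the rank, scaling, and placement of the matrices are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # preserve knowledge of known fake types
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(256, 2))      # only ~2k trainable params per new spoof type
out = layer(torch.randn(8, 256))
```

Because only the small A and B matrices need to be stored per newly emerging fake audio type, the frozen base model keeps its accuracy on the known types.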


Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Liu, Rui, Zhang, Jinhua, Gao, Guanglai, Li, Haizhou

arXiv.org Artificial Intelligence

Audio Deepfake Detection (ADD), an emerging topic, aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay, and similar techniques. Traditional work takes the mono signal as input and focuses on robust feature extraction and effective classifier design. However, the dual-channel stereo information in the audio signal also carries important cues for deepfakes, which prior work has not studied. In this paper, we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process. We first project the mono signal to stereo using a pretrained stereo synthesizer, then employ a dual-branch neural architecture to process the left and right channel signals, respectively. In this way, we effectively reveal the artifacts in fake audio and thus improve ADD performance. Experiments on the ASVspoof2019 database show that M2S-ADD outperforms all baselines that take mono input. We release the source code at \url{https://github.com/AI-S2-Lab/M2S-ADD}.
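
A minimal sketch of the dual-branch idea, with a placeholder standing in for the pretrained stereo synthesizer; the encoder and fusion head are illustrative, not the released M2S-ADD architecture (see the repository above for the real model).

```python
import torch
import torch.nn as nn

def mono_to_stereo(mono):                        # placeholder for the pretrained synthesizer
    return torch.stack([mono, mono.roll(1, dims=-1)], dim=1)   # (B, 2, T)

class DualBranchADD(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv1d(1, hidden, kernel_size=16, stride=8), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.left, self.right = branch(), branch()
        self.head = nn.Linear(2 * hidden, 2)     # bona fide vs. fake

    def forward(self, mono):
        stereo = mono_to_stereo(mono)
        l = self.left(stereo[:, 0:1])            # left-channel branch
        r = self.right(stereo[:, 1:2])           # right-channel branch
        return self.head(torch.cat([l, r], dim=-1))

logits = DualBranchADD()(torch.randn(4, 16000))  # 1 s of 16 kHz mono audio
```

Processing the two synthesized channels separately lets inconsistencies introduced by the mono-to-stereo projection surface as authenticity cues.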


I Cloned My Voice and My Mother Couldn't Tell the Difference

Slate

This article is from Understanding AI, a newsletter that explores how A.I. works and how it's changing our world. A couple of weeks ago, I used A.I. software to clone my voice. The resulting audio sounded pretty convincing to me, but I wanted to see what others thought. So I created a test audio file based on the first 12 paragraphs of this article. Seven randomly chosen paragraphs were my real voice, while the other five were generated by A.I. Then I asked members of my family whether they could tell the difference.


An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

Yan, Xinrui, Yi, Jiangyan, Tao, Jianhua, Wang, Chenglong, Ma, Haoxin, Wang, Tao, Wang, Shiming, Fu, Ruibo

arXiv.org Artificial Intelligence

Many effective attempts have been made at fake audio detection. However, they only provide detection results and offer no countermeasures to curb the harm. Many related practical applications also need to know which model or algorithm generated the fake audio. We therefore propose a new problem: detecting the vocoder fingerprints of fake audio. Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders, and we preliminarily explore features and model architectures. The t-SNE visualization shows that different vocoders generate distinct vocoder fingerprints.
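
A minimal sketch of the t-SNE inspection described above, run on synthetic stand-in embeddings in place of real vocoder features; the per-vocoder offsets simply mimic the distinct "fingerprints" the paper reports.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_vocoders, per = 8, 50
vocoder_ids = np.repeat(np.arange(n_vocoders), per)
# stand-in: each vocoder leaves a slightly different offset ("fingerprint")
embeddings = rng.normal(size=(n_vocoders * per, 128)) + vocoder_ids[:, None] * 0.5

proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=vocoder_ids, cmap="tab10", s=8)
plt.title("t-SNE of vocoder fingerprints (synthetic stand-in)")
plt.show()
```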


Half-Truth: A Partially Fake Audio Detection Dataset

Yi, Jiangyan, Bai, Ye, Tao, Jianhua, Tian, Zhengkun, Wang, Chenglong, Wang, Tao, Fu, Ruibo

arXiv.org Artificial Intelligence

Diverse promising datasets, such as the ASVspoof databases, have been designed to advance the development of fake audio detection. However, previous datasets ignore an attack scenario in which the attacker hides a few small fake clips inside real speech audio. This poses a serious threat, since it is difficult to distinguish a small fake clip from the whole speech utterance. This paper therefore develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance; the audio of those words is generated with the latest state-of-the-art speech synthesis technology. With this dataset we can not only detect fake utterances but also localize the manipulated regions within a speech signal. Some benchmark results are presented on this dataset, and they show that partially fake audio is much more challenging for fake audio detection than fully fake audio.
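
A minimal sketch of how a half-truth sample could be assembled, assuming a real utterance, a synthesized clip for the replaced words, and a chosen insertion point; the cross-fade and frame-level labels are illustrative, not the HAD construction pipeline.

```python
import numpy as np

def splice_fake(real_wav, fake_clip, start_s, sr=16000, xfade_ms=10):
    """Replace a segment of real speech with a synthesized clip, cross-fading the edges."""
    start = int(start_s * sr)
    end = start + len(fake_clip)
    out = real_wav.copy()
    out[start:end] = fake_clip
    n = int(sr * xfade_ms / 1000)                 # short cross-fade to hide the seams
    ramp = np.linspace(0.0, 1.0, n)
    out[start:start + n] = (1 - ramp) * real_wav[start:start + n] + ramp * fake_clip[:n]
    out[end - n:end] = ramp[::-1] * fake_clip[-n:] + (1 - ramp[::-1]) * real_wav[end - n:end]
    labels = np.zeros(len(out), dtype=np.int64)   # frame-level labels: 1 where audio is fake
    labels[start:end] = 1
    return out, labels

real = np.random.randn(3 * 16000).astype(np.float32)    # 3 s of real speech (stand-in)
fake = np.random.randn(16000).astype(np.float32) * 0.1  # 1 s of synthesized words (stand-in)
mixed, frame_labels = splice_fake(real, fake, start_s=1.0)
```

Frame-level labels of this kind are what make it possible to localize manipulated regions rather than only classify whole utterances.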


It's not just phishing emails, now we have to worry about fake calls, too

USATODAY - Tech Top Stories

When your boss calls and tells you to wire $100,000 to a supplier, be on your toes. It could be a fake call. As if phony "phishing" emails weren't enough, "deep fake" audio is now on the rise: voices can be cloned with near perfection, and the clones are easy for hackers to create. "It's on the rise, and something to watch out for," says Vijay Balasubramaniyan, the CEO of Pindrop, a company that offers biometric authentication for enterprises. Balasubramaniyan demonstrated during a security conference how easy it is to take audio from the internet and use machine learning to stitch recorded phrases into sentences the speaker probably never said.