fake audio
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zhang, Zirui, Hao, Wei, Sankoh, Aroon, Lin, William, Mendiola-Ortiz, Emanuel, Yang, Junfeng, Mao, Chengzhi
Recent advances in AI-generated voices have intensified the challenge of detecting deepfake audio, posing risks for scams and the spread of disinformation. To tackle this issue, we establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples, including 270,000 high-quality deepfake samples from 14 diverse sources. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset, and their detection success rates drop even further under realistic corruptions and adversarial attacks. We conduct a holistic investigation into factors that enhance model robustness and show that incorporating a diversified set of voice augmentations is beneficial. Moreover, we find that the best detection models often rely on high-frequency features, which are imperceptible to humans and can be easily manipulated by an attacker. To address this, we propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components. Empirical results demonstrate that using our training dataset boosts baseline model performance (without robust training) by 33%, and our robust training further improves accuracy by 7.7% on clean samples and by 29.3% on corrupted and attacked samples, over the state-of-the-art RawNet3 model.
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
- Asia > China > Hong Kong (0.04)
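The core F-SAT idea above, restricting adversarial perturbations to high-frequency components, can be sketched as follows. This is a minimal illustration rather than the paper's implementation; the function names, the 4 kHz cutoff, and the FGSM-style signed step are all assumptions introduced here.

```python
import numpy as np

def high_freq_mask(n_samples, sr, cutoff_hz):
    """Boolean mask over rFFT bins, True for bins at or above cutoff_hz."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
    return freqs >= cutoff_hz

def frequency_selective_perturbation(grad, sr, cutoff_hz=4000.0, eps=0.002):
    """Project a gradient-sign perturbation onto high frequencies only.

    The adversarial update is restricted to spectral components above
    `cutoff_hz`, leaving the lower, more audible band untouched.
    """
    delta = eps * np.sign(grad)                  # FGSM-style step
    spectrum = np.fft.rfft(delta)
    spectrum[~high_freq_mask(len(delta), sr, cutoff_hz)] = 0.0  # zero low bins
    return np.fft.irfft(spectrum, n=len(delta))

# usage: one second of audio, with random noise standing in for a loss gradient
sr = 16000
rng = np.random.default_rng(0)
grad = rng.standard_normal(sr)
delta_hf = frequency_selective_perturbation(grad, sr)
```

Training against perturbations of this shape is what would make a detector stop leaning on easily manipulated high-frequency cues.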
Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence
Salehi, Mahsa, Stefanov, Kalin, Shareghi, Ehsan
In this paper we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm do not exhibit clearly distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are exposed to fake versus real audio. This preliminary evidence opens future research directions in areas such as deepfake audio detection.
- Oceania > Australia > Victoria > Melbourne (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > UAE (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.70)
An Alleged Deepfake of UK Opposition Leader Keir Starmer Shows the Dangers of Fake Audio
As members of the UK's largest opposition party gathered in Liverpool for their party conference--probably their last before the UK holds a general election--a potentially explosive audio file started circulating on X, formerly known as Twitter. The 25-second recording was posted by an X account with the handle "@Leo_Hutz" that was set up in January 2023. In the clip, Sir Keir Starmer, the Labour Party leader, is apparently heard swearing repeatedly at a staffer. "I have obtained audio of Keir Starmer verbally abusing his staffers at [the Labour Party] conference," the X account posted. "This disgusting bully is about to become our next PM."
- Government (1.00)
- Information Technology > Security & Privacy (0.48)
Adaptive Fake Audio Detection with Low-Rank Model Squeezing
Zhang, Xiaohui, Yi, Jiangyan, Tao, Jianhua, Wang, Chenlong, Xu, Le, Fu, Ruibo
Traditional approaches to handling newly emerging spoofing algorithms, such as finetuning on new datasets that contain them, are computationally intensive and risk impairing the model's acquired knowledge of known fake audio types. To address these challenges, this paper proposes an approach that mitigates the limitations of finetuning. We introduce the concept of training low-rank adaptation matrices tailored specifically to newly emerging fake audio types. During inference, these adaptation matrices are combined with the existing model to generate the final prediction. Extensive experiments evaluate the efficacy of the proposed method. The results demonstrate that our approach effectively preserves the prediction accuracy of the existing model on known fake audio types. It also offers several advantages, including reduced storage requirements and lower equal error rates compared with conventional finetuning, particularly on specific spoofing algorithms.
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- (11 more...)
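The adapter scheme described in the abstract, where small low-rank matrices are merged with a frozen model at inference time, can be sketched roughly as follows. The layer sizes, the rank, and the merge rule `W + A @ B` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 256, 256, 4                 # illustrative sizes

W = rng.standard_normal((d_out, d_in))          # frozen base weight
A = rng.standard_normal((d_out, rank)) * 0.01   # low-rank factors trained
B = rng.standard_normal((rank, d_in)) * 0.01    # for one new spoofing type

def adapted_forward(x, W, A, B):
    """Inference with the base layer plus one low-rank adapter merged in."""
    return (W + A @ B) @ x

x = rng.standard_normal(d_in)
y = adapted_forward(x, W, A, B)

# storage: the adapter needs rank * (d_out + d_in) values, versus
# d_out * d_in for a fully finetuned copy of the layer
adapter_params = rank * (d_out + d_in)
full_params = d_out * d_in
```

Because the base weight `W` is never modified, the knowledge of known fake audio types is preserved by construction; each new spoofing type only adds one small `(A, B)` pair.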
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion
Liu, Rui, Zhang, Jinhua, Gao, Guanglai, Li, Haizhou
Audio Deepfake Detection (ADD), an emerging topic, aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay, etc. Traditional methods take the mono signal as input and focus on robust feature extraction and effective classifier design. However, dual-channel stereo information in the audio signal also carries important cues for deepfake detection, which has not been studied in prior work. In this paper, we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process. We first project the mono signal to stereo using a pretrained stereo synthesizer, then employ a dual-branch neural architecture to process the left and right channel signals, respectively. In this way, we effectively reveal the artifacts in fake audio and thereby improve ADD performance. Experiments on the ASVspoof2019 database show that M2S-ADD outperforms all mono-input baselines. We release the source code at \url{https://github.com/AI-S2-Lab/M2S-ADD}.
- Asia > Mongolia (0.04)
- Asia > China > Inner Mongolia (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (2 more...)
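The mono-to-stereo, dual-branch pipeline described above can be sketched in a few lines. This is a toy stand-in, not M2S-ADD itself: the "synthesizer" here is just two slightly different filters, and the per-branch feature is a crude log-spectrum summary, where the real system would use a pretrained stereo synthesizer and learned branches.

```python
import numpy as np

def toy_stereo_synth(mono):
    """Stand-in for the pretrained mono-to-stereo synthesizer: a real
    system predicts distinct left/right channels; here we mimic that
    with two slightly different FIR filters."""
    left = np.convolve(mono, [0.9, 0.1], mode="same")
    right = np.convolve(mono, [0.1, 0.9], mode="same")
    return left, right

def branch_features(channel, n_bins=64):
    """One branch: a log-magnitude spectrum summary of one channel."""
    spec = np.abs(np.fft.rfft(channel))[:n_bins]
    return np.log1p(spec)

def m2s_features(mono):
    """Dual-branch processing: featurize each channel, fuse by concat."""
    left, right = toy_stereo_synth(mono)
    return np.concatenate([branch_features(left), branch_features(right)])

feats = m2s_features(np.random.default_rng(0).standard_normal(16000))
```

The fused feature vector would then feed a classifier; the premise of the paper is that conversion artifacts differ between real and fake inputs, so the two branches expose cues a mono pipeline misses.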
I Cloned My Voice and My Mother Couldn't Tell the Difference
This article is from Understanding AI, a newsletter that explores how A.I. works and how it's changing our world. A couple of weeks ago, I used A.I. software to clone my voice. The resulting audio sounded pretty convincing to me, but I wanted to see what others thought. So I created a test audio file based on the first 12 paragraphs of this article that I wrote. Seven randomly chosen paragraphs were my real voice, while the other five were generated by A.I. I asked members of my family to see if they could tell the difference.
- North America > United States (0.14)
- Europe > Ireland (0.05)
- North America > Canada > Saskatchewan > Regina (0.04)
- Information Technology > Security & Privacy (1.00)
- Media (0.95)
- Government (0.95)
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Yan, Xinrui, Yi, Jiangyan, Tao, Jianhua, Wang, Chenglong, Ma, Haoxin, Wang, Tao, Wang, Shiming, Fu, Ruibo
Many effective attempts have been made at fake audio detection. However, they provide only detection results, with no countermeasures to curb the harm at its source. For many practical applications, knowing which model or algorithm generated the fake audio is also needed. We therefore propose a new problem: detecting vocoder fingerprints of fake audio. Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders, and we preliminarily explore suitable features and model architectures. t-SNE visualization shows that different vocoders generate distinct fingerprints.
- Europe > Portugal > Lisbon > Lisbon (0.05)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Security & Privacy (0.93)
- Media (0.68)
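The notion of a vocoder fingerprint can be illustrated with a deliberately crude sketch: treat the average log-magnitude spectrum of clips from one vocoder as its "fingerprint," and attribute a new clip to the nearest one. The real paper explores learned features and model architectures; the nearest-mean-spectrum rule and the two simulated "vocoders" below (white noise vs. 1/f-like noise) are assumptions made purely for illustration.

```python
import numpy as np

def fingerprint(clips):
    """Crude 'vocoder fingerprint': mean log-magnitude spectrum
    over clips generated by one vocoder."""
    specs = [np.log1p(np.abs(np.fft.rfft(c))) for c in clips]
    return np.mean(specs, axis=0)

def attribute(clip, fingerprints):
    """Assign a clip to the nearest known vocoder fingerprint."""
    q = np.log1p(np.abs(np.fft.rfft(clip)))
    dists = {name: np.linalg.norm(q - fp) for name, fp in fingerprints.items()}
    return min(dists, key=dists.get)

# two simulated 'vocoders' with very different spectral signatures
rng = np.random.default_rng(0)
voc_a = [rng.standard_normal(1024) for _ in range(5)]                    # flat
voc_b = [np.cumsum(rng.standard_normal(1024)) * 0.1 for _ in range(5)]   # 1/f-ish

fps = {"voc_a": fingerprint(voc_a), "voc_b": fingerprint(voc_b)}
test_clip = np.cumsum(rng.standard_normal(1024)) * 0.1
print(attribute(test_clip, fps))   # prints voc_b
```

Clusters of such fingerprints are exactly the kind of separation the paper's t-SNE visualization makes visible.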
Half-Truth: A Partially Fake Audio Detection Dataset
Yi, Jiangyan, Bai, Ye, Tao, Jianhua, Tian, Zhengkun, Wang, Chenglong, Wang, Tao, Fu, Ruibo
Diverse promising datasets, such as the ASVspoof databases, have been designed to advance the development of fake audio detection. However, previous datasets ignore an attack scenario in which a hacker hides small fake clips inside real speech audio. This poses a serious threat, since it is difficult to distinguish a small fake clip within a whole speech utterance. This paper therefore develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance; the audio for those words is generated with the latest state-of-the-art speech synthesis technology. With this dataset, we can not only detect fake utterances but also localize manipulated regions within a speech signal. Benchmark results on this dataset show that partially fake audio is much more challenging to detect than fully fake audio.
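The localization task that HAD enables, turning frame-level decisions into manipulated regions, can be sketched as follows. The per-frame fake probabilities would come from a frame-level classifier trained on partially fake data; the threshold, hop size, and function name here are illustrative assumptions.

```python
import numpy as np

def locate_fake_regions(frame_scores, threshold=0.5, frame_hop_s=0.02):
    """Turn per-frame fake probabilities into (start_s, end_s) regions.

    Consecutive frames whose score exceeds `threshold` are merged into
    one region, reported in seconds via the frame hop.
    """
    fake = frame_scores > threshold
    regions, start = [], None
    for i, f in enumerate(fake):
        if f and start is None:
            start = i                         # region opens
        elif not f and start is not None:
            regions.append((start * frame_hop_s, i * frame_hop_s))
            start = None                      # region closes
    if start is not None:                     # region runs to the end
        regions.append((start * frame_hop_s, len(fake) * frame_hop_s))
    return regions

# usage: a 1 s utterance where frames 10-19 (0.2 s to 0.4 s) were swapped
scores = np.zeros(50)
scores[10:20] = 0.9
regions = locate_fake_regions(scores)
print(regions)   # prints [(0.2, 0.4)]
```

Evaluating predicted regions against HAD's ground-truth manipulated spans is what distinguishes localization from plain utterance-level detection.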
It's not just phishing emails, now we have to worry about fake calls, too
When your boss calls and tells you to wire $100,000 to a supplier, be on your toes. It could be a fake call. As if "phishing" emails weren't enough, on the rise now are "deepfake" audios: voices cloned with near perfection, and easy for hackers to create. "It's on the rise, and something to watch out for," says Vijay Balasubramaniyan, the CEO of Pindrop, a company that offers biometric authentication for enterprise. Balasubramaniyan demonstrated during a security conference how easy it is to take audio from the internet and use machine learning to stitch recorded phrases into sentences the speaker probably never said.
- North America > United States > California > San Francisco County > San Francisco (0.06)
- Europe > United Kingdom (0.06)
- Asia > North Korea (0.06)