Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer, Moritz Wolter
Today's generative neural networks allow the creation of high-quality synthetic speech at scale. While we welcome the creative use of this new technology, we must also recognize its risks. Because synthetic speech is abused for monetary and identity theft, we require a broad set of deepfake identification tools. Moreover, previous work reported that deep classifiers generalize poorly to unseen audio generators. We study the frequency-domain fingerprints of current audio generators. Building on these fingerprints, we train lightweight detectors that generalize. We report improved results on the WaveFake dataset and on an extended version of it. To account for the rapid progress in the field, the extension adds samples drawn from the recent Avocodo and BigVGAN networks.
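The abstract does not say how a frequency-domain fingerprint is computed. A minimal sketch, assuming the fingerprint is taken to be the per-frequency difference between the mean log-magnitude short-time spectra of generated and real clips (our assumption, not necessarily the transform used in the paper), could look like this. The function names `mean_log_spectrum` and `spectral_fingerprint`, the frame sizes, and the synthetic "fake" clips (noise plus a pure tone) are hypothetical illustrations.

```python
import numpy as np

def mean_log_spectrum(clips, n_fft=512, hop=256):
    """Average log-magnitude STFT spectrum over all frames of all clips.

    Assumes every clip is a 1-D float array at the same sampling rate
    and at least n_fft samples long. Returns one value per rfft bin.
    """
    window = np.hanning(n_fft)
    spectra = []
    for x in clips:
        # Slice the clip into overlapping frames of n_fft samples.
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
        # Magnitude spectrum of each windowed frame.
        mag = np.abs(np.fft.rfft(frames * window, axis=-1))
        spectra.append(np.log(mag + 1e-8))  # small offset avoids log(0)
    return np.concatenate(spectra, axis=0).mean(axis=0)

def spectral_fingerprint(real_clips, fake_clips, **kw):
    """Hypothetical fingerprint: mean log spectrum of fakes minus that of reals."""
    return mean_log_spectrum(fake_clips, **kw) - mean_log_spectrum(real_clips, **kw)

# Toy data (assumption): "real" = white noise, "fake" = noise plus a 6250 Hz tone,
# which falls exactly on rfft bin 200 for n_fft=512 at 16 kHz.
rng = np.random.default_rng(0)
sr = 16_000
t = np.arange(sr) / sr
real = [0.1 * rng.standard_normal(sr) for _ in range(5)]
fake = [0.1 * rng.standard_normal(sr) + np.sin(2 * np.pi * 6250 * t) for _ in range(5)]

fp = spectral_fingerprint(real, fake)
print(int(np.argmax(fp)))  # → 200
```

Bins where such a fingerprint departs clearly from zero mark frequencies at which a generator's output differs systematically from real recordings; with real data the same comparison would be run over real and vocoded speech at a common sampling rate.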
Machines can use artificial intelligence to create photos or voice recordings that look or sound like real ones. Researchers at the Horst Görtz Institute for IT Security at Ruhr-Universität Bochum are studying how such artificially generated data, known as deepfakes, can be distinguished from real data. They found that real and fake voice recordings differ in the high frequencies. Until now, deepfakes had mainly been analysed in image files. The new findings should help to recognise fake speech recordings in the future.
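The press release states the high-frequency difference only qualitatively. As a toy illustration of the kind of statistic this finding suggests, one could measure what fraction of a clip's spectral energy lies above a cutoff. The function `high_band_energy_ratio` and the 4 kHz cutoff are hypothetical choices made here; this is not the Bochum researchers' detector.

```python
import numpy as np

def high_band_energy_ratio(x: np.ndarray, sr: int, cutoff_hz: float = 4000.0) -> float:
    """Fraction of a clip's spectral energy at or above cutoff_hz (toy feature)."""
    power = np.abs(np.fft.rfft(x)) ** 2          # power spectrum of the whole clip
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)  # frequency in Hz of each rfft bin
    total = power.sum()
    if total == 0:
        return 0.0                               # silent clip: no energy anywhere
    return float(power[freqs >= cutoff_hz].sum() / total)

sr = 16_000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 440 * t)                # energy far below the cutoff
high_tone = np.sin(2 * np.pi * 6000 * t)              # energy above the cutoff
noise = np.random.default_rng(0).standard_normal(sr)  # roughly flat spectrum

print(round(high_band_energy_ratio(low_tone, sr), 3))   # → 0.0
print(round(high_band_energy_ratio(high_tone, sr), 3))  # → 1.0
```

A single ratio like this is a weak cue on its own; it only shows how a high-frequency difference can be turned into a number a classifier could use.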