Human Perception of Audio Deepfakes

Müller, Nicolas M., Markert, Karla, Böttinger, Konstantin

arXiv.org Artificial Intelligence 

The recent emergence of deepfakes, computerized realistic multimedia fakes, has brought the detection of manipulated and generated content to the forefront. While many machine learning models for deepfake detection have been proposed, human detection capabilities remain far less explored. This is of special importance because human perception differs from machine perception and deepfakes are generally designed to fool humans. So far, this issue has only been addressed in the areas of images and video. To compare the ability of humans and machines to detect audio deepfakes, we conducted an online gamified experiment in which we asked users to distinguish bona fide audio samples from spoofed audio generated with a variety of algorithms. In total, 200 users competed over 8976 game rounds against an artificial intelligence (AI) algorithm trained for audio deepfake detection. From the collected data, we found that the machine generally outperforms humans in detecting audio deepfakes, but that the converse holds for a certain attack type, for which humans are still more accurate. Furthermore, we found that younger participants are on average better at detecting audio deepfakes than older participants, while IT professionals hold no advantage over laypeople. We conclude that it is important to combine human and machine knowledge in order to improve audio deepfake detection.