
Collaborating Authors

 Todisco, Massimiliano


The VoicePrivacy 2024 Challenge Evaluation Plan

arXiv.org Artificial Intelligence

The task of the challenge is to develop a voice anonymization system for speech data that conceals the speaker's voice identity while preserving the linguistic content and emotional states. The organizers provide development and evaluation datasets, evaluation scripts, baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Participants apply their developed anonymization systems, run the evaluation scripts and submit the evaluation results and anonymized speech data to the organizers. Results will be presented at a workshop held in conjunction with Interspeech 2024, to which all participants are invited to present their challenge systems and to submit additional workshop papers.


Vocoder drift compensation by x-vector alignment in speaker anonymisation

arXiv.org Artificial Intelligence

For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function, which substitutes the original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since it implies a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of this so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for the design of better anonymisation functions in the future.
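
To make the measurement concrete, the sketch below (not the authors' code) shows one way vocoder drift could be quantified as the distance between the substituted x-vector and the x-vector re-extracted from the vocoded output, together with a simple alignment-style compensation loop; `extract_xvector` and `synthesize` are assumed placeholder components.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two x-vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def vocoder_drift(target_xvec, anonymised_wav, extract_xvector):
    """Drift = distance between the substituted (target) x-vector and the
    x-vector re-extracted from the vocoded, anonymised waveform."""
    resynth_xvec = extract_xvector(anonymised_wav)
    return cosine_distance(target_xvec, resynth_xvec)

def compensate_by_alignment(target_xvec, linguistic_feats, pitch, synthesize,
                            extract_xvector, n_iter=10, lr=0.5):
    """Iteratively adjust the x-vector fed to the vocoder so that the x-vector
    measured at the synthesiser output aligns with the intended pseudo-speaker
    target (an illustrative, gradient-free update, not the paper's exact method)."""
    x = np.copy(target_xvec)
    for _ in range(n_iter):
        wav = synthesize(linguistic_feats, pitch, x)
        measured = extract_xvector(wav)
        # Move the input x-vector against the observed drift direction.
        x = x + lr * (target_xvec - measured)
    return x
```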


Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

arXiv.org Artificial Intelligence

We present Malafide, a universal adversarial attack against automatic speaker verification (ASV) spoofing countermeasures (CMs). By introducing convolutional noise using an optimised linear time-invariant filter, Malafide attacks can be used to compromise CM reliability while preserving other speech attributes such as quality and the speaker's voice. In contrast to other adversarial attacks proposed recently, Malafide filters are optimised independently of the input utterance and duration, are tuned instead to the underlying spoofing attack, and require the optimisation of only a small number of filter coefficients. Even so, they degrade CM performance estimates by an order of magnitude, even in black-box settings, and can also be configured to overcome integrated CM and ASV subsystems. Integrated solutions that use self-supervised learning CMs, however, are more robust.

In short, Malafide is designed for the strength-testing of spoofing and deepfake detection solutions against adversarial, convolutive noise attacks. Malafide attacks involve the optimisation of a linear time-invariant filter which is applied to deepfake or spoofed utterances in order to provoke their misclassification as bona fide utterances. Convolutive noise is independent of an utterance and its duration and is naturally robust to time-domain shifts, unlike additive adversarial noise. Malafide attacks require the optimisation of only a small number of filter coefficients, far fewer than the number of waveform samples or, equivalently, the number of samples that would need to be generated in an additive noise attack. They are also more universal than additive noise attacks in that they are not optimised for each utterance or speaker, but are instead optimised for a given spoofing attack, with the latter acting to ensure they compromise both CM and ASV subsystems.
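
A minimal PyTorch-style sketch of the core idea follows: a single FIR filter, shared by all utterances of a given spoofing attack, is convolved with each spoofed waveform and its coefficients are optimised so that a countermeasure scores the filtered audio as bona fide. The `cm_model` interface and optimisation settings are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def optimise_malafide_filter(spoofed_utterances, cm_model, filter_len=257,
                             n_steps=200, lr=1e-3):
    """Optimise one linear time-invariant (FIR) filter per spoofing attack.

    spoofed_utterances: list of (batch, samples) waveform tensors, all
                        generated by the same spoofing attack.
    cm_model:           placeholder countermeasure returning one bona-fide
                        score per utterance (higher = more bona fide).
    """
    # Start from an identity (unit impulse) filter, i.e. no modification.
    fir = torch.zeros(filter_len)
    fir[filter_len // 2] = 1.0
    fir.requires_grad_(True)
    optimiser = torch.optim.Adam([fir], lr=lr)

    for _ in range(n_steps):
        for wav in spoofed_utterances:
            # conv1d applies the same filter to every utterance in the batch
            # (cross-correlation, equivalent to FIR filtering up to a flip).
            filtered = F.conv1d(wav.unsqueeze(1), fir.view(1, 1, -1),
                                padding=filter_len // 2).squeeze(1)
            scores = cm_model(filtered)        # bona-fide scores
            loss = -scores.mean()              # push decisions towards bona fide
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()

    return fir.detach()
```

Because the filter has far fewer coefficients than the waveform has samples, the same optimised filter can be reused across all utterances produced by that spoofing attack.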


Towards single integrated spoofing-aware speaker verification embeddings

arXiv.org Artificial Intelligence

This study aims to develop single, integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two requirements. First, they should reject non-target speakers' inputs as well as target speakers' spoofed inputs. Second, they should deliver performance competitive with the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single-embedding solutions by a large margin in the SASV2022 challenge. Our analysis shows that the inferior performance of single SASV embeddings stems from an insufficient amount of training data and the distinct nature of the ASV and CM tasks. To address these issues, we propose a novel framework that includes multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also exploited to address the lack of spoofed data. Experimental results show dramatic improvements, achieving an SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
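
For reference, the SASV-EER quoted above treats target trials as positives and pools both zero-effort impostor and spoofed trials as negatives. A minimal numpy sketch of that metric (an illustration, not the official SASV2022 scoring script) is shown below; scores are assumed to be 1-D numpy arrays.

```python
import numpy as np

def compute_eer(positive_scores, negative_scores):
    """Equal error rate via a simple threshold sweep over the pooled scores."""
    scores = np.concatenate([positive_scores, negative_scores])
    labels = np.concatenate([np.ones_like(positive_scores),
                             np.zeros_like(negative_scores)])
    order = np.argsort(scores)
    scores, labels = scores[order], labels[order]
    # False rejection / false acceptance rates for a threshold placed just
    # above each sorted score.
    frr = np.cumsum(labels) / labels.sum()
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0

def sasv_eer(target_scores, nontarget_scores, spoof_scores):
    """SASV-EER: target trials are positives; both zero-effort non-target and
    spoofed trials count as negatives, so one embedding must handle both."""
    negatives = np.concatenate([nontarget_scores, spoof_scores])
    return compute_eer(target_scores, negatives)
```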


The VoicePrivacy 2020 Challenge: Results and findings

arXiv.org Artificial Intelligence

This paper presents the results and analyses stemming from the first VoicePrivacy Challenge (VoicePrivacy 2020), which focuses on developing anonymization solutions for speech technology. We provide a systematic overview of the challenge design, together with an analysis of submitted systems and evaluation results. In particular, we describe the voice anonymization task and the datasets used for system development and evaluation. We also present the different attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and provide a summary description of the anonymization systems developed by the challenge participants. We report objective and subjective evaluation results for the baseline and submitted systems. In addition, we present experimental results for alternative privacy metrics and attack models developed as part of the post-evaluation analysis. Finally, we summarize the insights and observations that will influence the design of the next edition of the VoicePrivacy Challenge, along with some directions for future voice anonymization research.


Speaker Anonymization Using X-vector and Neural Waveform Models

arXiv.org Machine Learning

The social media revolution has produced a plethora of web services to which users can easily upload and share multimedia documents. Despite the popularity and convenience of such services, the sharing of such inherently personal data, including speech data, raises obvious security and privacy concerns. In particular, a user's speech data may be acquired and used with speech synthesis systems to produce high-quality speech utterances which reflect the same user's speaker identity. These utterances may then be used to attack speaker verification systems. One solution to mitigate these concerns involves the concealing of speaker identities before the sharing of speech data. For this purpose, we present a new approach to speaker anonymization. The idea is to extract linguistic and speaker identity features from an utterance and then to use these with neural acoustic and waveform models to synthesize anonymized speech. The original speaker identity, in the form of timbre, is suppressed and replaced with that of an anonymous pseudo identity. The approach exploits state-of-the-art x-vector speaker representations. These are used to derive anonymized pseudo speaker identities through the combination of multiple, random speaker x-vectors. Experimental results show that the proposed approach is effective in concealing speaker identities. It increases the equal error rate of a speaker verification system while maintaining the high quality of the anonymized speech.
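
As a rough illustration of the pseudo-speaker idea described above, the sketch below builds an anonymized identity by averaging several x-vectors drawn from an external speaker pool, biased towards speakers far from the original; the distance measure and selection sizes are assumptions rather than the paper's exact recipe.

```python
import numpy as np

def pseudo_xvector(original_xvec, pool_xvecs, n_candidates=200, n_select=100,
                   rng=None):
    """Build an anonymized pseudo-speaker x-vector.

    original_xvec: (D,) x-vector of the source speaker.
    pool_xvecs:    (N, D) x-vectors from an external pool of other speakers.

    Illustrative recipe: rank pool speakers by cosine similarity to the
    original, keep the n_candidates least similar, randomly pick n_select
    of them and average their x-vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Cosine similarity of every pool speaker to the original speaker.
    sims = pool_xvecs @ original_xvec / (
        np.linalg.norm(pool_xvecs, axis=1) * np.linalg.norm(original_xvec))
    farthest = np.argsort(sims)[:n_candidates]        # least similar first
    chosen = rng.choice(farthest, size=n_select, replace=False)
    return pool_xvecs[chosen].mean(axis=0)
```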


t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

arXiv.org Machine Learning

The ASVspoof challenge series was born to spearhead research in anti-spoofing for automatic speaker verification (ASV). The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric. While a strategic approach to assessment at the time, it has certain shortcomings. First, the CM EER is not necessarily a reliable predictor of performance when ASV and CMs are combined. Second, the EER operating point is ill-suited to user authentication applications, e.g. telephone banking, characterised by a high target user prior but a low spoofing attack prior. We aim to migrate from CM-centric to ASV-centric assessment with the aid of a new tandem detection cost function (t-DCF) metric. It extends the conventional DCF used in ASV research to scenarios involving spoofing attacks. The t-DCF metric has six parameters: (i) false alarm and miss costs for both systems, and (ii) prior probabilities of target and spoof trials (with an implied third, nontarget prior). The study is intended to serve as a self-contained, tutorial-like presentation. Using the t-DCF, we analyse a selection of top-performing CM submissions to the 2015 and 2017 editions of ASVspoof, with a focus on the spoofing attack prior. Whereas there is little to choose between countermeasure systems at lower priors, system rankings derived with the EER and the t-DCF differ at higher priors. These findings support the adoption of the DCF-based metric into the roadmap for future ASVspoof challenges, and possibly for other biometric anti-spoofing evaluations.
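
As a rough illustration of the tandem principle (not the paper's closed-form t-DCF expression), the sketch below simulates a CM gate followed by ASV on labelled trials and weights the resulting error rates by the six parameters listed above; the cost assignment and parameter values are illustrative assumptions only.

```python
import numpy as np

def empirical_tandem_cost(cm_scores, asv_scores, labels, s_cm, s_asv,
                          c_miss_asv=1.0, c_fa_asv=10.0,
                          c_miss_cm=1.0, c_fa_cm=10.0,
                          pi_tar=0.9, pi_non=0.05, pi_spoof=0.05):
    """Illustrative tandem cost for a CM gate followed by ASV.

    labels: array of 'target', 'nontarget' or 'spoof', one per trial.
    The costs, priors and the assignment of costs to tandem errors are
    example assumptions; see the paper for the exact t-DCF definition.
    """
    labels = np.asarray(labels)
    cm_accept = cm_scores >= s_cm           # CM says "bona fide"
    asv_accept = asv_scores >= s_asv        # ASV says "target"
    tandem_accept = cm_accept & asv_accept  # accepted by the cascade

    tar, non, spf = (labels == k for k in ('target', 'nontarget', 'spoof'))
    p_miss_tar = np.mean(~tandem_accept[tar])   # target wrongly rejected
    p_fa_non = np.mean(tandem_accept[non])      # impostor wrongly accepted
    p_miss_cm = np.mean(~cm_accept[tar])        # bona fide rejected by CM
    p_fa_spoof = np.mean(tandem_accept[spf])    # spoof accepted end-to-end

    return (c_miss_asv * pi_tar * p_miss_tar
            + c_fa_asv * pi_non * p_fa_non
            + c_miss_cm * pi_tar * p_miss_cm
            + c_fa_cm * pi_spoof * p_fa_spoof)
```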