Collaborating Authors

 Sholokhov, Alexey


Probabilistic Back-ends for Online Speaker Recognition and Clustering

arXiv.org Artificial Intelligence

This paper focuses on multi-enrollment speaker recognition, which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that the popular cosine scoring suffers from poor score calibration when the number of enrollment utterances varies. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves over cosine scoring for multi-enrollment recognition while keeping the same performance in the case of one-to-one comparisons. Finally, we consider an online speaker clustering task in which each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm that benefits from properties of the PLDA model such as the ability to handle uncertainty and better score calibration. Our experiments demonstrate the effectiveness of the proposed algorithm.
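To make the multi-enrollment setting concrete, here is a minimal sketch of cosine scoring in which several enrollment embeddings are averaged into a single centroid before comparison with the test embedding. Averaging is one common convention and is an assumption here; the abstract does not say how the enrollment utterances are combined, and the function names `l2_normalize` and `cosine_score` are hypothetical. The sketch shows the baseline only, not the proposed PLDA model.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_score(enroll_embeddings, test_embedding):
    """Cosine similarity between an enrollment centroid and a test embedding.

    Assumption: the enrollment embeddings are length-normalized and then
    averaged. This is an illustrative convention, not the paper's method.
    """
    normalized = [l2_normalize(e) for e in enroll_embeddings]
    dim = len(test_embedding)
    centroid = [
        sum(e[i] for e in normalized) / len(normalized) for i in range(dim)
    ]
    centroid = l2_normalize(centroid)
    test = l2_normalize(test_embedding)
    return sum(c * t for c, t in zip(centroid, test))
```

The score always lies in [-1, 1] regardless of how many enrollment utterances are averaged, so it carries no information about how reliable the enrollment centroid is. That is consistent with the calibration issue the abstract raises for a varying number of enrollment utterances.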


Extrapolating false alarm rates in automatic speaker verification

arXiv.org Machine Learning

Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers. We address false alarm rate extrapolation under a worst-case model whereby an adversary identifies the closest impostor for a given target speaker from a large population.

In this study we improve upon the generative model presented in [3]. Despite demonstrating expected overall trends, the predicted false alarm rates were substantially overestimated, particularly at high ASV thresholds (proxies of high-security applications). To tackle this shortcoming, we propose a discriminative
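As a rough illustration of why the worst-case false alarm rate grows with population size, the sketch below assumes that impostor scores are independent and that each impostor exceeds the threshold with the same probability `p`. Under that simplifying assumption, which is not stated in the text above and is not the model studied in the paper, the closest of `n` impostors is accepted with probability 1 - (1 - p)^n. The function name `worst_case_false_alarm` is hypothetical.

```python
import math


def worst_case_false_alarm(p, n):
    """Probability that at least one of n impostors exceeds the threshold.

    Assumption: impostor trials are independent, each with false alarm
    probability p. Computes 1 - (1 - p)**n via log1p/expm1 so that very
    small p values do not lose precision.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        # log1p(-1.0) is undefined; the result is 1 unless there are no impostors.
        return 1.0 if n > 0 else 0.0
    return -math.expm1(n * math.log1p(-p))
```

Under this toy model, even a small per-trial false alarm rate compounds quickly as the impostor population grows.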