Silnova, Anna
Toroidal Probabilistic Spherical Discriminant Analysis
Silnova, Anna, Brümmer, Niko, Swart, Albert, Burget, Lukáš
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used: cosine scoring and PLDA. We have recently proposed PSDA, an analog of PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
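The cosine-scoring baseline and the von Mises-Fisher (VMF) density that PSDA builds on can be sketched in a few lines. This is an illustrative sketch, not the T-PSDA scoring rule: the function names are hypothetical, and the VMF density is written only for the 3-dimensional sphere, where the normalizer has the closed form kappa / (4 pi sinh kappa). Note that the VMF log-density is linear in the cosine between the observation and the mean direction.

```python
import math
from typing import Sequence


def to_unit(v: Sequence[float]) -> list[float]:
    """Project a vector onto the unit hypersphere (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_score(e1: Sequence[float], e2: Sequence[float]) -> float:
    """Cosine scoring: dot product of the two length-normalized embeddings."""
    u1, u2 = to_unit(e1), to_unit(e2)
    return sum(a * b for a, b in zip(u1, u2))


def log_vmf_density_3d(x: Sequence[float], mu: Sequence[float], kappa: float) -> float:
    """Log-density of a VMF distribution on the 2-sphere (3-D unit vectors only).

    Assumes x and mu are unit vectors and kappa > 0. The normalizer is
    C_3(kappa) = kappa / (4 * pi * sinh(kappa)); log sinh is computed in a
    numerically stable form so that large kappa does not overflow.
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    log_norm = math.log(kappa) - math.log(4.0 * math.pi) - log_sinh
    # The data-dependent term is kappa times the cosine between mu and x.
    return log_norm + kappa * sum(a * b for a, b in zip(mu, x))
```

For example, `cosine_score([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])` is 0.0 because the embeddings are orthogonal, and scaling either embedding leaves the score unchanged.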
Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors
Silnova, Anna, Brümmer, Niko, Garcia-Romero, Daniel, Snyder, David, Burget, Lukáš
The standard state-of-the-art backend for text-independent speaker recognizers that use i-vectors or x-vectors is Gaussian PLDA (G-PLDA), assisted by a Gaussianization step involving length normalization. G-PLDA can be trained with either generative or discriminative methods. It has long been known that heavy-tailed PLDA (HT-PLDA), applied without length normalization, gives similar accuracy, but at considerable extra computational cost. We have recently introduced a fast scoring algorithm for a discriminatively trained HT-PLDA backend. This paper extends that work by introducing a fast, generative, variational Bayes training algorithm. We compare the old and new backends, with and without length normalization, using i-vectors and x-vectors, on SRE'10, SRE'16 and SITW.
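The length-normalization (Gaussianization) step mentioned above can be sketched as centering each embedding with a mean estimated on training data and then scaling it to unit Euclidean length. This is a minimal sketch under stated assumptions: the whitening step that usually precedes the projection is omitted, and the helper names are hypothetical rather than taken from the paper.

```python
import math
from typing import Sequence


def mean_vector(train: Sequence[Sequence[float]]) -> list[float]:
    """Per-dimension mean of the training embeddings (hypothetical helper)."""
    n = len(train)
    if n == 0:
        raise ValueError("need at least one training embedding")
    return [sum(column) / n for column in zip(*train)]


def length_normalize(x: Sequence[float], mean: Sequence[float]) -> list[float]:
    """Center x with the training mean, then scale it to unit length.

    Assumption: whitening is skipped; a full pipeline would typically
    whiten after centering and before the projection onto the sphere.
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("embedding coincides with the mean")
    return [c / norm for c in centered]
```

The result always lies on the unit hypersphere, which is what makes G-PLDA's Gaussian assumption more tenable; HT-PLDA is instead applied to the raw, unnormalized embeddings.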
Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model
Brümmer, Niko, Silnova, Anna, Burget, Lukáš, Stafylakis, Themos
Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks. The proposed meta-embeddings are special embeddings that live in more general inner product spaces. They are designed to propagate uncertainty to the final output in speaker recognition and similar applications. The familiar Gaussian PLDA model (GPLDA) can be reformulated as an extractor for Gaussian meta-embeddings (GMEs), such that likelihood-ratio scores are given by Hilbert space inner products between Gaussian likelihood functions. GMEs extracted by the GPLDA model have fixed precisions and therefore do not propagate uncertainty. We show that a generalization to heavy-tailed PLDA gives GMEs with variable precisions, which do propagate uncertainty. Experiments on NIST SRE 2010 and 2016 show that the proposed method, applied to i-vectors without length normalization, is up to 20% more accurate than GPLDA applied to length-normalized i-vectors.
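The idea that a likelihood-ratio score is an inner product between Gaussian likelihood functions can be illustrated numerically. In this sketch, which rests on assumptions not stated in the abstract, a Gaussian meta-embedding is the function f(z) = exp(a·z - ½ zᵀBz) of a hidden speaker variable z, the prior on z is standard normal, and B is restricted to a diagonal precision for simplicity. The inner product is taken as the expectation E[f(z)g(z)] under that prior, and the score normalizes it by each meta-embedding's own expectation. The function names are hypothetical.

```python
import math
from typing import Sequence


def log_expectation(a: Sequence[float], b: Sequence[float]) -> float:
    """log E[exp(a.z - 0.5 * z^T diag(b) z)] for z ~ N(0, I).

    Assumes a diagonal, non-negative precision b. Each dimension contributes
    a_i^2 / (2 (1 + b_i)) - 0.5 * log(1 + b_i).
    """
    return sum(ai * ai / (2.0 * (1.0 + bi)) - 0.5 * math.log(1.0 + bi)
               for ai, bi in zip(a, b))


def gme_llr(a1: Sequence[float], b1: Sequence[float],
            a2: Sequence[float], b2: Sequence[float]) -> float:
    """Same-speaker vs. different-speaker log-likelihood ratio.

    The product of two GMEs is again a GME with summed natural parameters,
    so the score is log E[f1 f2] - log E[f1] - log E[f2].
    """
    pooled_a = [x + y for x, y in zip(a1, a2)]
    pooled_b = [x + y for x, y in zip(b1, b2)]
    return (log_expectation(pooled_a, pooled_b)
            - log_expectation(a1, b1) - log_expectation(a2, b2))
```

A meta-embedding with a = 0 and b = 0 carries no information and always scores 0. In this parameterization a larger precision b expresses more confidence about z; with fixed precisions, as in GPLDA, every input is treated as equally certain, whereas data-dependent precisions let uncertain inputs pull the score toward 0.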