AITopics | speaker and language recognition workshop

Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors

Silnova, Anna, Brummer, Niko, Garcia-Romero, Daniel, Snyder, David, Burget, Lukas

arXiv.org Machine Learning, Mar-24-2018

The standard state-of-the-art backend for text-independent speaker recognizers that use i-vectors or x-vectors is Gaussian PLDA (G-PLDA), assisted by a Gaussianization step involving length normalization. G-PLDA can be trained with either generative or discriminative methods. It has long been known that heavy-tailed PLDA (HT-PLDA), applied without length normalization, gives similar accuracy, but at considerable extra computational cost. We have recently introduced a fast scoring algorithm for a discriminatively trained HT-PLDA backend. This paper extends that work by introducing a fast, variational Bayes, generative training algorithm. We compare old and new backends, with and without length normalization, with i-vectors and x-vectors, on SRE'10, SRE'16 and SITW.
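The length normalization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `length_normalize` and the toy vector are assumptions for illustration, and the centering and whitening that often precede this step in practice are omitted.

```python
import math

def length_normalize(vec):
    """Scale a vector to unit Euclidean length and return it as a new list."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in vec]

# Toy 3-dimensional stand-in (hypothetical data); real i-vectors and
# x-vectors typically have hundreds of dimensions.
ivector = [3.0, 4.0, 0.0]
normalized = length_normalize(ivector)
```

After this step every vector lies on the unit sphere, which is what makes a Gaussian backend a better fit to the data.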

artificial intelligence, ht-plda, machine learning, (16 more...)

arXiv.org Machine Learning

arXiv:1803.09153

Country: Europe > Czechia (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

Brummer, Niko, Silnova, Anna, Burget, Lukas, Stafylakis, Themos

arXiv.org Machine Learning, Feb-27-2018

Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks. The proposed meta-embeddings are special embeddings that live in more general inner product spaces. They are designed to propagate uncertainty to the final output in speaker recognition and similar applications. The familiar Gaussian PLDA model (GPLDA) can be re-formulated as an extractor for Gaussian meta-embeddings (GMEs), such that likelihood ratio scores are given by Hilbert space inner products between Gaussian likelihood functions. GMEs extracted by the GPLDA model have fixed precisions and do not propagate uncertainty. We show that a generalization to heavy-tailed PLDA gives GMEs with variable precisions, which do propagate uncertainty. Experiments on NIST SRE 2010 and 2016 show that the proposed method applied to i-vectors without length normalization is up to 20% more accurate than GPLDA applied to length-normalized i-vectors.
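A rough sketch of how a likelihood ratio can be computed as a normalized inner product between two Gaussian likelihood functions. It rests on assumptions that are not taken from the abstract: a standard normal prior on the hidden speaker variable z, meta-embeddings of the form f(z) = exp(a·z − ½ Σ b_i z_i²), and diagonal precisions b (a simplification chosen to keep the example dependency-free). The function names are hypothetical and this is not the authors' implementation.

```python
import math

def log_expectation(a, b):
    """log E[f(z)] for z ~ N(0, I), with f(z) = exp(a.z - 0.5 * sum(b_i * z_i**2)).

    a: natural mean parameters; b: diagonal precisions (each b_i >= 0).
    """
    return sum(0.5 * ai * ai / (1.0 + bi) - 0.5 * math.log1p(bi)
               for ai, bi in zip(a, b))

def log_likelihood_ratio(a1, b1, a2, b2):
    """log of E[f1 * f2] / (E[f1] * E[f2]): same-speaker vs different-speaker."""
    # The product of two such Gaussian functions has summed natural parameters.
    a12 = [x + y for x, y in zip(a1, a2)]
    b12 = [x + y for x, y in zip(b1, b2)]
    return (log_expectation(a12, b12)
            - log_expectation(a1, b1)
            - log_expectation(a2, b2))

# Toy one-dimensional examples (hypothetical values).
same = log_likelihood_ratio([2.0], [1.0], [2.0], [1.0])           # agreeing recordings
different = log_likelihood_ratio([2.0], [1.0], [-2.0], [1.0])     # conflicting recordings
uninformative = log_likelihood_ratio([2.0], [1.0], [0.0], [0.0])  # zero precision
```

In this sketch a recording with zero precision carries no information and yields a log-likelihood ratio of exactly zero, which is one way to picture how variable precisions propagate uncertainty to the score.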

artificial intelligence, machine learning, pattern recognition, (15 more...)

arXiv.org Machine Learning

arXiv:1802.09777

Country:

Europe > Czechia (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

A Generative Model for Score Normalization in Speaker Recognition

Swart, Albert, Brummer, Niko

arXiv.org Machine Learning, Sep-28-2017

We propose a theoretical framework for thinking about score normalization, which confirms that normalization is not needed under (admittedly fragile) ideal conditions. If, however, these conditions are not met, e.g. under data-set shift between training and runtime, our theory reveals dependencies between scores that could be exploited by strategies such as score normalization. Indeed, it has been demonstrated experimentally, over and over, that various ad hoc score normalization recipes do work. We present a first attempt at using probability theory to design a generative score-space normalization model, which gives similar improvements to ZT-norm on the text-dependent RSR 2015 database.
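For context, the ad hoc recipes the abstract refers to generally shift and scale a raw trial score using statistics of cohort scores. The sketch below shows that basic operation only; it is not the generative model proposed in the paper, and the function name and toy scores are assumptions. In Z-norm the cohort scores come from scoring the enrollment model against impostor utterances, in T-norm from scoring the test utterance against cohort models, and ZT-norm combines the two (details vary between recipes).

```python
import statistics

def cohort_normalize(raw_score, cohort_scores):
    """Shift and scale a trial score by the mean and standard deviation
    of a set of cohort (impostor) scores."""
    mean = statistics.mean(cohort_scores)
    std = statistics.pstdev(cohort_scores)
    if std == 0.0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mean) / std

# Hypothetical raw trial score and cohort scores.
normalized_score = cohort_normalize(2.0, [-1.0, 0.0, 1.0])
```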

machine learning, normalization, pattern recognition, (18 more...)

arXiv.org Machine Learning

arXiv:1709.09868

Country: Europe > Czechia (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.43)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.40)

The Intelligent Voice 2016 Speaker Recognition System

Khosravani, Abbas, Glackin, Cornelius, Dugan, Nazim, Chollet, Gérard, Cannings, Nigel

arXiv.org Machine Learning, Nov-2-2016

We trained on each acoustic feature a full-covariance, gender-independent UBM with 2048 Gaussians, followed by a 600-dimensional i-vector extractor, to establish our MFCC- and PLP-based i-vector systems. The unlabeled set of development data was used in the training of both the UBM and the i-vector extractor. The open-source Kaldi software was used for all these processing steps [20]. It has been shown that successive acoustic observation vectors tend to be highly correlated. This may be problematic for maximum a posteriori (MAP) estimation of i-vectors. To investigate this issue, scaling the zero- and first-order Baum-Welch statistics before presenting them to the i-vector extractor has been proposed. It turns out that a scale factor of 0.33 gives a slight edge, resulting in a better decision cost function [10]. This scaling was applied both in training the i-vector extractor and in testing.
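The scaling of Baum-Welch statistics described above can be sketched as follows. This is an illustrative toy in plain Python, not the Kaldi pipeline the authors used: the function names, the two-component posteriors, and the one-dimensional frames are all assumptions. Only the scale factor 0.33 comes from the text.

```python
def baum_welch_stats(posteriors, frames):
    """Accumulate zero- and first-order statistics.

    posteriors[t][c]: responsibility of UBM component c for frame t.
    frames[t]: feature vector of frame t.
    Returns (zero, first) with zero[c] = sum_t posteriors[t][c] and
    first[c][d] = sum_t posteriors[t][c] * frames[t][d].
    """
    num_components = len(posteriors[0])
    dim = len(frames[0])
    zero = [0.0] * num_components
    first = [[0.0] * dim for _ in range(num_components)]
    for gamma_t, x_t in zip(posteriors, frames):
        for c, gamma in enumerate(gamma_t):
            zero[c] += gamma
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x
    return zero, first

def scale_stats(zero, first, factor=0.33):
    """Scale both statistics by the same factor before i-vector extraction."""
    return ([factor * n for n in zero],
            [[factor * f for f in row] for row in first])

# Hypothetical toy input: 2 frames, 2 components, 1-dimensional features.
zero, first = baum_welch_stats([[1.0, 0.0], [0.5, 0.5]], [[2.0], [4.0]])
scaled_zero, scaled_first = scale_stats(zero, first)
```

Scaling both statistics by the same factor below one reduces the effective frame count, which discounts the evidence from correlated neighboring frames.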

machine learning, pattern recognition, speaker and language recognition workshop, (12 more...)

arXiv.org Machine Learning

arXiv:1611.00514

Country: Europe > Finland (0.16)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.47)
