AITopics

1709.09868

Country: Europe > Czechia (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.43)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.40)

Khosravani, Abbas, Glackin, Cornelius, Dugan, Nazim, Chollet, Gérard, Cannings, Nigel

The Intelligent Voice 2016 Speaker Recognition System

arXiv.org Machine LearningNov-2-2016

We trained on each acoustic feature a full covariance, genderindependent UBM model with 2048 Gaussians followed by a 600-dimensional i-vector extractor to establish our MFCCand PLP-based i-vector systems. The unlabeled set of development data was used in the training of both the UBM and the i-vector extractor. The open-source Kaldi software has been used for all these processing steps [20]. It has been shown that successive acoustic observation vectors tend to be highly correlated. This may be problematic for maximum a posteriori (MAP) estimation of i-vectors. To investigating this issue, scaling the zero and first order Baum-Welch statistics before presenting them to the i-vector extractor has been proposed. It turns out that a scale factor of 0.33 gives a slight edge, resulting in a better decision cost function [10]. This scaling factor has been performed in training the i-vector extractor as well as in the testing.

machine learning, pattern recognition, speaker and language recognition workshop, (12 more...)

1611.00514

Country: Europe > Finland (0.16)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.47)

arXiv.org Machine LearningSep-27-2016

Multi-task Recurrent Model for Speech and Speaker Recognition

Tang, Zhiyuan, Li, Lantian, Wang, Dong

Abstract--Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not the way that people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model to perform speech and speaker recognition simultaneously and altogether . The model is based on a unified neural network where the output of one task is fed to the input of the other, leading to a multi-task recurrent network. Experiments show that the joint model outperforms the task-specific models on both the two tasks.

machine learning, pattern recognition, recognition, (20 more...)

1603.09643

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.85)

Sadjadi, Seyed Omid, Pelecanos, Jason, Ganapathy, Sriram

The IBM Speaker Recognition System: Recent Advances and Error Analysis

arXiv.org Machine LearningMay-5-2016

We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (as opposed to LDA) for intersession variability compensation in the i-vector space, the application of speaker and channel-adapted features derived from an automatic speech recognition (ASR) system for speaker recognition, and the use of a DNN acoustic model with a very large number of output units ( 10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 SRE extended core conditions (C1-C9), as well as the 10sec-10sec condition. To our knowledge, results achieved by our system represent the best performances published to date on these conditions. For example, on the extended tel-tel condition (C5) the system achieves an EER of 0.59%. To garner further understanding of the remaining errors (on C5), we examine the recordings associated with the low scoring target trials, where various issues are identified for the problematic recordings/trials. Interestingly, it is observed that correcting the pathological recordings not only improves the scores for the target trials but also for the nontarget trials.

machine learning, pattern recognition, recognition, (16 more...)

1605.01635

Country:

Europe (1.00)
Asia (0.94)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Information Technology (0.62)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.88)

Sadjadi, Seyed Omid, Ganapathy, Sriram, Pelecanos, Jason W.

The IBM 2016 Speaker Recognition System

arXiv.org Machine LearningFeb-23-2016

In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is formulated to alleviate some of the limitations associated with the conventional linear discriminant analysis (LDA) that assumes Gaussian class-conditional distributions, 2) the application of speaker- and channel-adapted features, which are derived from an automatic speech recognition (ASR) system, for speaker recognition, and 3) the use of a deep neural network (DNN) acoustic model with a large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 speaker recognition evaluation (SRE) extended core conditions involving telephone and microphone trials. Experimental results indicate that: 1) the NDA is more effective (up to 35% relative improvement in terms of EER) than the traditional parametric LDA for speaker recognition, 2) when compared to raw acoustic features (e.g., MFCCs), the ASR speaker-adapted features provide gains in speaker recognition performance, and 3) increasing the number of output units in the DNN acoustic model (i.e., increasing the senone set size from 2k to 10k) provides consistent improvements in performance (for example from 37% to 57% relative EER gains over our baseline GMM i-vector system). To our knowledge, results reported in this paper represent the best performances published to date on the NIST SRE 2010 extended core tasks.

machine learning, pattern recognition, recognition, (18 more...)

1602.07291

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Information Technology (0.63)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsDec-31-2002

A Sequence Kernel and its Application to Speaker Recognition

Campbell, William M.

A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.

kernel, polynomial classifier, recognition, (14 more...)

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.60)

Neural Information Processing SystemsDec-31-2002

A Sequence Kernel and its Application to Speaker Recognition

Campbell, William M.

A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.

kernel, polynomial classifier, recognition, (14 more...)

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.60)

Neural Information Processing SystemsDec-31-2002

A Sequence Kernel and its Application to Speaker Recognition

Campbell, William M.

A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion.

artificial intelligence, machine learning, pattern recognition, (16 more...)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.43)

Farrell, Kevin R., Mammone, Richard J.

Speaker Recognition Using Neural Tree Networks

Neural Information Processing SystemsDec-31-1994

A new classifier is presented for text-independent speaker recognition. The new classifier is called the modified neural tree network (MNTN). The NTN is a hierarchical classifier that combines the properties of decision trees and feed-forward neural networks. The MNTN differs from the standard NTN in that a new learning rule based on discriminant learning is used, which minimizes the classification error as opposed to a norm of the approximation error. The MNTN also uses leaf probability measures in addition to the class labels.

classifier, recognition, speaker recognition, (11 more...)

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.68)

Farrell, Kevin R., Mammone, Richard J.

Speaker Recognition Using Neural Tree Networks

Neural Information Processing SystemsDec-31-1994

A new classifier is presented for text-independent speaker recognition. The new classifier is called the modified neural tree network (MNTN). The NTN is a hierarchical classifier that combines the properties of decision trees and feed-forward neural networks. The MNTN differs from the standard NTN in that a new learning rule based on discriminant learning is used, which minimizes the classification error as opposed to a norm of the approximation error. The MNTN also uses leaf probability measures in addition to the class labels.

classifier, recognition, speaker recognition, (11 more...)