Unified Hypersphere Embedding for Speaker Recognition

Jul-22-2018–arXiv.org Artificial Intelligence

ABSTRACT

Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without using extra data or deeper and more complex models. We do so by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions.

Index Terms: speaker recognition, speaker verification, augmentation, discriminative loss function, convolutional neural networks

1. INTRODUCTION

Speaker recognition is an area of research with more than 50 years of history and applications ranging from forensics and security to human-computer interaction in consumer electronics. Speaker recognition can be divided into two tasks, text-dependent and text-independent speaker recognition, according to how similar the uttered content is between utterances.

accuracy, machine learning, pattern recognition

Country:
- Europe > Switzerland > Zürich > Zürich (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Pattern Recognition > Speech Recognition (1.00)
  - Neural Networks (1.00)
