Unified Hypersphere Embedding for Speaker Recognition
Mahdi Hajibabaei, Dengxin Dai
arXiv.org Artificial Intelligence
ABSTRACT
Incremental improvements in the accuracy of convolutional neural networks (CNNs) usually come from deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot continue indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without extra data or deeper, more complex models. We do this by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions.

Index Terms-- speaker recognition, speaker verification, augmentation, discriminative loss function, convolutional neural networks

1. INTRODUCTION
Speaker recognition is an area of research with more than 50 years of history. Its applications range from forensics and security to human-computer interaction in consumer electronics. Speaker recognition divides into two tasks, text-dependent and text-independent, depending on whether the spoken content is the same across utterances.
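The abstract names more discriminative loss functions as one of the levers, and the title refers to a hypersphere embedding. This sketch does not reproduce the authors' method. It substitutes a CosFace-style additive-margin softmax, a well-known margin-based loss, computed over L2-normalized (unit-hypersphere) embeddings. The function names, the pure-Python implementation, and the hyperparameters `scale=30.0` and `margin=0.35` are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two vectors after normalization."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def additive_margin_softmax_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    label: int,
    scale: float = 30.0,   # assumed value, for illustration only
    margin: float = 0.35,  # assumed value, for illustration only
) -> float:
    """CosFace-style loss for one example (a stand-in, not the paper's loss).

    Every class j gets the logit scale * cos(theta_j). The target class y
    instead gets scale * (cos(theta_y) - margin), which forces the embedding
    to sit closer in angle to its own class weight vector.
    """
    cosines = [cosine(embedding, w) for w in class_weights]
    logits = [scale * c for c in cosines]
    logits[label] = scale * (cosines[label] - margin)
    # Numerically stable cross-entropy: logsumexp(logits) - logits[label].
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    emb = [0.6, 0.5, 0.2]
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(additive_margin_softmax_loss(emb, weights, 0, margin=0.0))
    print(additive_margin_softmax_loss(emb, weights, 0))
```

Subtracting the margin from the target cosine raises the loss for the same embedding. Training therefore has to pull each embedding closer in angle to its own speaker's weight vector. Because every embedding lies on the unit hypersphere, a verification trial can then be scored with `cosine` between two embeddings and compared against a threshold. Any such threshold would be tuned on held-out data.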
Jul-22-2018