Centroid-based deep metric learning for speaker recognition

Wang, Jixuan, Wang, Kuan-Chieh, Law, Marc, Rudzicz, Frank, Brudno, Michael

arXiv.org Machine Learning 

Then, a PLDA model is trained to measure the similarity of i-vectors. Replacing traditional i-vectors with speaker embedding models based on deep neural networks has led to improvements in SV [4, 3]. Nonetheless, a PLDA classifier is still needed to compare the similarity of embeddings. More recently, end-to-end training of an embedding network that makes decisions by comparing distances in the embedding space to a cross-validated threshold has outperformed traditional methods. For a detailed comparison between embedding networks and i-vector based methods, we refer the reader to [6, 4, 3]. Building on these studies, our work focuses on the comparison between two different approaches to deep metric learning (TL [5, 6, 7, 8] and PNL [10]) for end-to-end speaker embedding models.

Deep metric learning: End-to-end speaker embedding models can be seen as a form of deep metric learning, which has been widely studied in the machine learning literature. Early examples of metric learning with neural networks include signature [11] and face verification [12]. Both compare pairs of examples with standard similarity functions (e.g.
