Centroid-based deep metric learning for speaker recognition
Wang, Jixuan; Wang, Kuan-Chieh; Law, Marc; Rudzicz, Frank; Brudno, Michael
Then, a PLDA model is trained to measure the similarity of i-vectors. Replacing traditional i-vectors with speaker embedding models based on deep neural networks has led to improvements in SV [4, 3]. Nonetheless, a PLDA classifier is still needed to compare the similarity of embeddings. More recently, end-to-end training of an embedding network that makes decisions by comparing distances in the embedding space to a cross-validated threshold has outperformed traditional methods. For a detailed comparison between embedding networks and i-vector based methods, we refer the reader to [6, 4, 3]. Building on these studies, our work focuses on the comparison between two approaches to deep metric learning (TL [5, 6, 7, 8] and PNL [10]) for end-to-end speaker embedding models.

Deep metric learning: End-to-end speaker embedding models can be seen as a form of deep metric learning, which has been widely studied in the machine learning literature. Early examples of metric learning with neural networks include signature [11] and face verification [12]. Both compare pairs of examples with standard similarity functions (e.g.
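
To make the two objectives named above concrete, the following is a minimal sketch, assuming TL and PNL stand for the common textbook forms of the triplet loss and the prototypical network loss. The excerpt does not give the exact formulation used in this work, so the squared Euclidean distance, the margin value, and every function name here are illustrative assumptions. The last function sketches the verification decision described in the text: a distance in the embedding space compared against a threshold that would be chosen by cross-validation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook hinge-style triplet loss on squared distances (assumed form).

    Zero once the negative is farther from the anchor than the positive
    by at least `margin` (in squared distance).
    """
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)


def centroid(embeddings):
    """Per-dimension mean of a list of embeddings (a speaker 'prototype')."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def prototypical_loss(query, support_by_speaker, true_speaker):
    """Textbook prototypical loss (assumed form): negative log-softmax over
    negative squared distances from the query to each speaker centroid."""
    logits = {
        spk: -(euclidean(query, centroid(embs)) ** 2)
        for spk, embs in support_by_speaker.items()
    }
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[true_speaker]


def same_speaker(emb_a, emb_b, threshold):
    """Accept the pair if the embedding distance is within the threshold.

    `threshold` is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return euclidean(emb_a, emb_b) <= threshold
```

In this sketch the triplet loss looks at one positive and one negative example at a time, whereas the prototypical loss compares a query against the centroids of all speakers in the batch at once, which is the contrast the comparison above turns on.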
Feb-6-2019