Disambiguating Music Artists at Scale with Audio Metric Learning
Jimena Royo-Letelier, Romain Hennequin, Viet-Anh Tran, Manuel Moussallam
ABSTRACT

We address the problem of disambiguating large-scale catalogs through the definition of an unknown artist clustering task. We explore the use of metric learning techniques to learn artist embeddings directly from audio. Using a dedicated dataset of homonymous artists, we compare our method with a recent approach that learns similar embeddings using artist classifiers. While both systems can disambiguate unknown artists relying exclusively on audio, we show that our system is more suitable when enough audio data is available for each artist in the training dataset. We also propose a new negative sampling method for metric learning that takes advantage of side information, such as music genre, during the learning phase and shows promising results for the artist clustering task.

1. INTRODUCTION

1.1 Motivation

With contemporary online music catalogs typically offering tens of millions of recordings, a major problem is the lack of a universal and reliable means of identifying music artists. In contrast to albums' and tracks' ISRC [...]. As a direct consequence, the name of an artist remains its de facto identifier in practice, although this leads to frequent ambiguity. For example, artist name collisions (e.g., Bill Evans is the name of a jazz pianist, but also of a jazz saxophonist and of a bluegrass banjo player) and artist aliases (e.g., Youssou N'Dour vs. Youssou Ndour, Simon & Garfunkel vs. Paul Simon and Art Garfunkel, Cat Stevens vs. Yusuf Islam) are common.
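To make the unknown artist clustering task concrete, the following is a minimal sketch, not the authors' method. It assumes that every track credited to one ambiguous name (such as "Bill Evans") has already been mapped to an audio embedding by a trained model. The tracks are then grouped with single-linkage clustering under a distance threshold. The clustering procedure, the function names (`euclidean`, `cluster_by_threshold`), the track ids, the 2-D embedding values and the threshold are all hypothetical illustrations. Real learned embeddings would have many more dimensions.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_by_threshold(embeddings: Dict[str, Sequence[float]],
                         threshold: float) -> List[List[str]]:
    """Hypothetical single-linkage clustering of track embeddings.

    Two tracks share a cluster if they are closer than `threshold`,
    directly or through a chain of close tracks. A union-find structure
    over track ids tracks the merges.
    """
    ids = list(embeddings)
    parent = {t: t for t in ids}

    def find(t: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge every pair of tracks whose embeddings are close enough.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if euclidean(embeddings[a], embeddings[b]) < threshold:
                parent[find(a)] = find(b)

    # Group track ids by their root; sort for deterministic output.
    clusters: Dict[str, List[str]] = {}
    for t in ids:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical embeddings for tracks all credited to "Bill Evans".
embeddings = {
    "piano_trio_1": (0.9, 0.1),
    "piano_trio_2": (0.85, 0.15),
    "sax_quartet_1": (0.1, 0.9),
    "banjo_reel_1": (-0.8, -0.7),
    "banjo_reel_2": (-0.75, -0.8),
}
print(cluster_by_threshold(embeddings, threshold=0.3))
# → [['banjo_reel_1', 'banjo_reel_2'], ['piano_trio_1', 'piano_trio_2'], ['sax_quartet_1']]
```

A threshold-based procedure is used here because it does not require the number of artists behind a name to be known in advance, which is the situation for an unknown artist. Its usefulness depends entirely on the embedding space placing tracks by the same artist close together, which is what metric learning aims to achieve.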
Oct-3-2018