AITopics | Bogdanov, Dmitry

Collaborating Authors

Bogdanov, Dmitry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supervised contrastive learning from weakly-labeled audio segments for musical version matching

Serrà, Joan, Araz, R. Oguz, Bogdanov, Dmitry, Mitsufuji, Yuki

arXiv.org Machine LearningFeb-24-2025

Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because of the ground truth nature, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require to match them at the segment level (e.g., 20s chunks). In addition, existing approaches resort to classification and triplet losses, disregarding more recent losses that could bring meaningful improvements. In this paper, we propose a method to learn from weakly annotated segments, together with a contrastive loss variant that outperforms well-studied alternatives. The former is based on pairwise segment distance reductions, while the latter modifies an existing loss following decoupling, hyper-parameter, and geometric considerations. With these two elements, we do not only achieve state-of-the-art results in the standard track-level evaluation, but we also obtain a breakthrough performance in a segment-level evaluation. We believe that, due to the generality of the challenges addressed here, the proposed methods may find utility in domains beyond audio or musical version matching.

artificial intelligence, evaluation, machine learning, (12 more...)

arXiv.org Machine Learning

2502.16936

Genre: Research Report (0.83)

Industry:

Media > Music (0.93)
Leisure & Entertainment (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

Alonso-Jiménez, Pablo, Pepino, Leonardo, Batlle-Roca, Roser, Zinemanas, Pablo, Bogdanov, Dmitry, Serra, Xavier, Rocamora, Martín

arXiv.org Artificial IntelligenceFeb-14-2024

We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple both training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not addressed with prototypical networks before. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes benefits understanding the behavior of the classifier.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2402.09318

Country: South America (0.14)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Manco, Ilaria, Weck, Benno, Doh, SeungHeon, Won, Minz, Zhang, Yixiao, Bogdanov, Dmitry, Wu, Yusong, Chen, Ke, Tovstogan, Philip, Benetos, Emmanouil, Quinton, Elio, Fazekas, György, Nam, Juhan

arXiv.org Artificial IntelligenceNov-22-2023

We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Common licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2311.10057

Country:

Europe (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback