Contrastive timbre representations for musical instrument and synthesizer retrieval
Gwendal Le Vaillant, Yannick Molle
arXiv.org Artificial Intelligence
Efficiently retrieving specific instrument timbres from audio mixtures remains a challenge in digital music production. This paper introduces a contrastive learning framework for musical instrument retrieval, enabling direct querying of instrument databases using a single model for both single- and multi-instrument sounds. We propose techniques to generate realistic positive/negative pairs of sounds for virtual musical instruments, such as samplers and synthesizers, addressing limitations in common audio data augmentation methods. The first experiment focuses on instrument retrieval from a dataset of 3,884 instruments, using single-instrument audio as input. Contrastive approaches are competitive with previous works based on classification pre-training. The second experiment considers multi-instrument retrieval with a mixture of instruments as audio input. In this case, the proposed contrastive framework outperforms related works, achieving 81.7% top-1 and 95.7% top-5 accuracies for three-instrument mixtures.
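To make the reported top-1 and top-5 figures concrete, the following is a minimal sketch of how top-k retrieval accuracy can be computed over an embedding database. It assumes cosine similarity as the ranking score and uses tiny hand-written 2-D vectors; the function names, the similarity choice, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_accuracy(queries, targets, database, k):
    # Fraction of queries whose target instrument index appears
    # among the k database entries most similar to the query.
    hits = 0
    for query, target in zip(queries, targets):
        ranked = sorted(
            range(len(database)),
            key=lambda i: cosine(query, database[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Hypothetical toy embeddings: three "instruments" and three query sounds.
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.3]]
targets = [0, 1, 2]

top1 = top_k_accuracy(queries, targets, database, k=1)
top2 = top_k_accuracy(queries, targets, database, k=2)
```

In this toy setup the third query ranks its target second, so it counts as a miss at k=1 and a hit at k=2, which is the same distinction the top-1 and top-5 accuracies express at larger scale.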
Sep-17-2025
- Country:
- Europe > Belgium
- Brussels-Capital Region > Brussels (0.04)
- Wallonia (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Leisure & Entertainment (1.00)
- Media > Music (1.00)
- Technology: