Translation-Equivariant Self-Supervised Learning for Pitch Estimation with Optimal Transport
Torres, Bernardo, Riou, Alain, Richard, Gaël, Peeters, Geoffroy
–arXiv.org Artificial Intelligence
ABSTRACT

In this paper, we propose an Optimal Transport objective for learning one-dimensional translation-equivariant systems and demonstrate its applicability to single pitch estimation. Our method provides a theoretically grounded, more numerically stable, and simpler alternative for training state-of-the-art self-supervised pitch estimators.

1. INTRODUCTION

Pitch estimation is a core task in audio analysis, long studied in the speech and Music Information Retrieval (MIR) communities [1]. It involves estimating the fundamental frequency of harmonic or quasi-harmonic signals. Traditional methods either rely on signal processing techniques to extract harmonicity cues [2-4] or match the input spectrum to that of a synthetic waveform [5]. Recently, supervised deep learning approaches (such as CREPE [6]) trained on large annotated datasets have achieved impressive accuracy, but come with notable challenges. In particular, labeling audio with the temporal precision needed for training (typically within a few milliseconds) is labor-intensive and prone to errors.
Oct-28-2025