Translation-Equivariant Self-Supervised Learning for Pitch Estimation with Optimal Transport

Bernardo Torres, Alain Riou, Gaël Richard, Geoffroy Peeters

Oct-28-2025, arXiv.org Artificial Intelligence

ABSTRACT

In this paper, we propose an Optimal Transport objective for learning one-dimensional translation-equivariant systems and demonstrate its applicability to single pitch estimation. Our method provides a theoretically grounded, more numerically stable, and simpler alternative for training state-of-the-art self-supervised pitch estimators.

1. INTRODUCTION

Pitch estimation is a core task in audio analysis, long studied in the speech and Music Information Retrieval (MIR) communities [1]. It involves estimating the fundamental frequency of harmonic or quasi-harmonic signals. Traditional methods either rely on signal processing techniques to extract harmonicity cues [2-4] or match the input spectrum to that of a synthetic waveform [5]. Recently, supervised deep learning approaches (such as CREPE [6]) that leverage large annotated datasets have achieved impressive accuracy, but they come with notable challenges. In particular, labeling audio with the temporal precision needed for training (typically within a few milliseconds) is labor-intensive and prone to errors.


