TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Ke Chen, Shuai Yu, Cheng-i Wang, Wei Li, Taylor Berg-Kirkpatrick, Shlomo Dubnov

February 2, 2022, arXiv.org Artificial Intelligence

Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design falls short of human-level performance in perceiving both the tone (pitch class) and the octave of a melody. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perception by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics via a rearrangement of frequency bins. Second, we introduce an encoder-decoder architecture that is designed to obtain a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. We run experiments to verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP can significantly improve singing melody extraction performance across various datasets, with substantial gains in octave and tone accuracy.
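
The idea of grouping harmonics by rearranging frequency bins can be illustrated with a minimal sketch. It assumes (the abstract does not state this) a log-frequency axis laid out octave by octave, so that bin index = octave * bins_per_octave + tone, and it regroups the bins so that all octaves of the same tone become adjacent. The function names `split_bin` and `to_tone_major` and the parameters `n_octaves` and `bins_per_octave` are hypothetical, and this reordering only brings octave-related bins together; it is not the paper's actual Tone-CFP implementation.

```python
import numpy as np


def split_bin(bin_index, bins_per_octave):
    """Split a bin index into (octave, tone), assuming an octave-major layout."""
    return divmod(bin_index, bins_per_octave)


def to_tone_major(spec, n_octaves, bins_per_octave):
    """Reorder a (freq, time) map from octave-major to tone-major bin order.

    Input row index:  octave * bins_per_octave + tone   (assumed layout)
    Output row index: tone * n_octaves + octave
    """
    spec = np.asarray(spec)
    n_bins, n_frames = spec.shape
    if n_bins != n_octaves * bins_per_octave:
        raise ValueError("frequency axis must have n_octaves * bins_per_octave bins")
    # View the frequency axis as (octave, tone), then swap the two sub-axes.
    grouped = spec.reshape(n_octaves, bins_per_octave, n_frames)
    return grouped.transpose(1, 0, 2).reshape(n_bins, n_frames)


# Toy example: 2 octaves, 3 tones per octave, 1 time frame.
demo = np.arange(6).reshape(6, 1)
reordered = to_tone_major(demo, n_octaves=2, bins_per_octave=3)
print(reordered.ravel().tolist())  # [0, 3, 1, 4, 2, 5]
```

In the toy output, rows 0 and 3 (the same tone in two octaves) end up next to each other, which is the kind of adjacency a convolutional backbone could exploit. A real front end would use far more bins per octave and more octaves.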
