Investigation on the use of Hidden-Markov Models in automatic transcription of music
Work on Automatic Music Transcription (AMT) dates back more than 30 years and has found numerous applications in music information retrieval, interactive computer systems, and automated musicological analysis (Klapuri, 2004). Because producing all the information required for a complete musical score is difficult, AMT is commonly defined as the computer-assisted process of analyzing an acoustic musical signal in order to write down the musical parameters of the sounds that occur in it, essentially the pitch, onset time, and duration of each note. Despite considerable enthusiasm for AMT challenges and the commercial availability of several audio-to-MIDI converters, perfect polyphonic AMT remains out of reach of today's technology (Klapuri, 2004; Benetos et al., 2013b). A practical engineering response to these limitations has been to use computational techniques from statistics and digital signal processing, which allow more complex modeling of the musical signal. In this paper, we investigate the use of different Hidden Markov Models (HMMs) in AMT and evaluate their impact on transcription performance. HMMs are a ubiquitous tool for modeling time-series data and have been widely used in Music Information Retrieval tasks, notably music structure analysis through the characterization of repetitive patterns (Logan and Chu, 2000), harmonic analysis (Raphael and Stoddard, 2003), chord estimation (Lee and Slaney, 2008), and musicological modeling of note transitions (Ryynanen and Klapuri, 2008). For AMT, HMMs offer a useful way to integrate into a transcription system the sequential structure that can be inferred from musical signals.
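As a rough illustration of how an HMM can bring sequential structure to frame-wise estimates, the sketch below decodes a two-state (note off / note on) model for a single pitch with the Viterbi algorithm. It is not the configuration studied in this paper: the two-state topology, the function name `viterbi`, the frame-wise activation probabilities, and the transition values are all hypothetical choices made for the example.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path of an HMM (log-domain Viterbi).

    log_init[s]     : log prior of starting in state s
    log_trans[i][j] : log probability of moving from state i to state j
    log_emit[t][s]  : log likelihood of the observation at frame t in state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emit[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical frame-wise probabilities that one pitch is active
# (assumed values, e.g. the output of some frame-level pitch detector).
p_on = [0.1, 0.2, 0.8, 0.9, 0.3, 0.9, 0.8, 0.2, 0.1, 0.1]

# State 0 = note off, state 1 = note on.
log_emit = [[math.log(1.0 - p), math.log(p)] for p in p_on]
log_init = [math.log(0.5), math.log(0.5)]
# Assumed "sticky" transitions: a state tends to persist across frames.
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]

thresholded = [1 if p > 0.5 else 0 for p in p_on]   # frame-by-frame decision
smoothed = viterbi(log_init, log_trans, log_emit)   # HMM-decoded decision
```

With these assumed numbers, independent thresholding splits the note at the fifth frame, where the activation briefly drops to 0.3, whereas the Viterbi path keeps frames 3 to 7 as a single note event. The onset and duration of the note can then be read from the first frame and the length of each run of "on" states.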
Apr-12-2017