AnthemScore - Music Transcription Software
I'm going to make the case that note detection in music is essentially image recognition with a few small differences. I'll also describe some techniques I used to adapt neural networks from computer vision to produce sheet music transcriptions of polyphonic music that are actually quite playable. Convolutional neural networks (CNNs) have produced the most accurate results in computer vision for several years. In a typical CNN you start with an image as a three-dimensional array (width, height, and 3 color channels) and pass that data through several layers of convolution, max pooling, and some kind of non-linearity, such as a ReLU. Backpropagation is used to iteratively update the convolution parameters from a set of labeled training data (pairs of input and desired output). This process builds up a sophisticated function composed of many simpler functions, primarily convolutions.
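To make the layer structure concrete, here is a minimal sketch of the forward pass through one CNN stage (convolution, then ReLU, then max pooling) in plain Python. It is an illustration only, not the network used in AnthemScore: the function names, the [row][column][channel] array layout, the "valid" (no padding) convolution, and the 2x2 pooling window are all assumptions chosen for brevity. Training by backpropagation is not shown.

```python
def conv2d(image, kernels, bias):
    """Valid (no padding) convolution, computed as cross-correlation.

    image:   nested list indexed [row][col][channel]
    kernels: nested list indexed [filter][row][col][channel]
    bias:    one number per filter
    Returns a feature map indexed [row][col][filter].
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            pixel = []
            for k, kernel in enumerate(kernels):
                # Weighted sum over the kernel window and all input channels.
                total = bias[k]
                for dy in range(kh):
                    for dx in range(kw):
                        for ch in range(c):
                            total += kernel[dy][dx][ch] * image[y + dy][x + dx][ch]
                pixel.append(total)
            row.append(pixel)
        out.append(row)
    return out


def relu(fmap):
    """Element-wise non-linearity: negative values become zero."""
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for y in range(0, h - size + 1, size):
        row = []
        for x in range(0, w - size + 1, size):
            row.append([
                max(fmap[y + dy][x + dx][ch]
                    for dy in range(size) for dx in range(size))
                for ch in range(c)
            ])
        out.append(row)
    return out


def cnn_layer(image, kernels, bias):
    """One stage of a CNN: convolution, then ReLU, then max pooling."""
    return max_pool(relu(conv2d(image, kernels, bias)))
```

Calling `cnn_layer` on a 5x5 image with 3 channels and a single 2x2 kernel gives a 4x4 feature map after the convolution and a 2x2 map after pooling. A full network stacks several such stages, and the kernel weights and biases are the parameters that backpropagation adjusts.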
Jul-25-2016, 05:29:35 GMT