Generating Piano Music with Dilated Convolutional Neural Networks
A considerable amount of research has been devoted to training deep neural networks that can compose piano music. For example, OpenAI's MuseNet project trained large-scale transformer models capable of composing realistic piano pieces that are many minutes in length. MuseNet adopts many of the technologies, such as attention layers, that were originally developed for NLP tasks. See this previous TDS post for more details on applying attention-based models to music generation.

Although NLP-based methods are a fantastic fit for machine-based music generation (after all, music is like a language), the transformer architecture is somewhat involved, and proper data preparation and training can require great care and experience. In this post, I explore a simpler alternative. In particular, I'll focus on fully convolutional neural networks based on dilated convolutions, which require only a handful of lines of code to define, need minimal data preparation, and are easy to train.

In 2016, DeepMind researchers introduced the WaveNet model architecture,¹ which yielded state-of-the-art performance in speech synthesis. Their research demonstrated that stacked 1D convolutional layers with exponentially growing dilation rates can model sequences of raw audio waveforms very efficiently, leading to generative models that can synthesize convincing audio from a variety of sources, including piano music. In this post, I build upon DeepMind's research, with an explicit focus on generating piano music.
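To make the "handful of lines of code" claim concrete, here is a minimal sketch of a WaveNet-style stack of dilated 1D convolutions in Keras. This is not the exact model used in this post: the layer count, filter count, and `num_classes` output size are illustrative placeholders, and a full WaveNet also includes gated activations and residual connections that are omitted here for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_dilated_conv_model(num_classes=256, num_layers=8, filters=64):
    """A bare-bones stack of causal dilated 1D convolutions (illustrative only)."""
    inputs = layers.Input(shape=(None, 1))  # variable-length 1D sequence
    x = inputs
    for i in range(num_layers):
        x = layers.Conv1D(
            filters,
            kernel_size=2,
            dilation_rate=2 ** i,  # 1, 2, 4, 8, ... doubles at each layer
            padding="causal",      # each output depends only on past samples
            activation="relu",
        )(x)
    # Predict a distribution over the next sample value at every time step
    outputs = layers.Conv1D(num_classes, kernel_size=1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_dilated_conv_model()
model.summary()
```

Because the dilation rate doubles at each layer, the receptive field grows exponentially with depth: eight layers with kernel size 2 already cover 256 past time steps, which is what lets such a compact model capture long-range structure in audio.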