Modelling the real world complexity of music is a challenge for machine learning. We address the task of modeling melodic sequences from the same music genre. We perform a comparative analysis of two probabilistic models; a Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns descriptive music features, such as underlying chords and typical melody transitions and dynamics. We assess the models for future prediction and compare their performance to a VMM, which is the current state of the art in melody generation. We show that both models perform significantly better than the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally, we evaluate the short order statistics of the models, using the Kullback-Leibler divergence between test sequences and model samples, and show that our proposed methods match the statistics of the music genre significantly better than the VMM.
Generating plausible and fluent sentence with desired properties has long been a challenge. Most of the recent works use recurrent neural networks (RNNs) and their variants to predict following words given previous sequence and target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method to generate plausible and diverse sentences.
Restricted Boltzman Machines (RBMs) have been successfully used in recommender systems. However, as with most of other collaborative filtering techniques, it cannot solve cold start problems for there is no rating for a new item. In this paper, we first apply conditional RBM (CRBM) which could take extra information into account and show that CRBM could solve cold start problem very well, especially for rating prediction task. CRBM naturally combine the content and collaborative data under a single framework which could be fitted effectively. Experiments show that CRBM can be compared favourably with matrix factorization models, while hidden features learned from the former models are more easy to be interpreted.
Deep dynamic generative models are developed to learn sequential dependencies in time-series data. The multi-layered model is designed by constructing a hierarchy of temporal sigmoid belief networks (TSBNs), defined as a sequential stack of sigmoid belief networks (SBNs). Each SBN has a contextual hidden state, inherited from the previous SBNs in the sequence, and is used to regulate its hidden bias. Scalable learning and inference algorithms are derived by introducing a recognition model that yields fast sampling from the variational posterior. This recognition model is trained jointly with the generative model, by maximizing its variational lower bound on the log-likelihood. Experimental results on bouncing balls, polyphonic music, motion capture, and text streams show that the proposed approach achieves state-of-the-art predictive performance, and has the capacity to synthesize various sequences.
We propose an automatic music generation demo based on artificial neural networks, which integrates the ability of Long Short-Term Memory (LSTM) in memorizing and retrieving useful history information, together with the advantage of Restricted Boltzmann Machine (RBM) in high dimensional data modelling. Our model can generalize to different musical styles and generate polyphonic music better than previous models.