We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus which was used to train it. Instead of using the $n-$body interactions of $(n-1)-$order Markov models, traditionally used in automatic music generation, we use a $k-$nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid over-fitting problems typical of Markov models. We show that long-range musical phrases don't need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by contrasting how much the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme opens the possibility to generate musically sensible alterations of the original phrases, providing a way to generate innovation.
The field of machine learning underwent massive changes in the 2010's. At the beginning, the field saw diverse approaches applied to a variety of topics and data structures. Then Alexnet blew away the competition for the Imagenet challenge with his CNN, and the field was forever changed. However, there was a warming up phase. Caffe's first release was in 2013.
Barbara started by introducing machine learning (ML), gave a brief overview of R and then discussed three examples; classifying hand written digits, estimating values in a socio-economic dataset and clustering crimes in Chicago. ML is statistics in steroids. ML uses data to find that pattern then uses that pattern (model) to predict results from similar data. Barbra uses the example of classifying film genres into either action or romance based on the number of kicks and kisses. Barbara described supervised and unsupervised. Unsupervised is the "wild, wild west" we can't train the model and it is much more difficult to understand how effective these are. Back to supervised learning, it's important to choose good predicting factors – in the movie example perhaps the title, actors, script may have been better predictors that the number of kicks and kisses. Then you must choose the algorithm and then tune it and finally make it useful and visible and get it into production - it's a hard job especially when data scientists and software developer seem to be different tribes.
A model for hit song prediction can be used in the pop music industry to identify emerging trends and potential artists or songs before they are marketed to the public. While most previous work formulates hit song prediction as a regression or classification problem, we present in this paper a convolutional neural network (CNN) model that treats it as a ranking problem. Specifically, we use a commercial dataset with daily play-counts to train a multi-objective Siamese CNN model with Euclidean loss and pairwise ranking loss to learn from audio the relative ranking relations among songs. Besides, we devise a number of pair sampling methods according to some empirical observation of the data. Our experiment shows that the proposed model with a sampling method called A/B sampling leads to much higher accuracy in hit song prediction than the baseline regression model. Moreover, we can further improve the accuracy by using a neural attention mechanism to extract the highlights of songs and by using a separate CNN model to offer high-level features of songs.
My data: I am using random forest to essentially predict which price each person should get to increase revenue uplift. I then run 4 models to predict how much a customer would spend on each price (IE I separate the data by the price the customer gets, so model is run on 4 separate datasets). I then use the 4 models on the validation/test data to see how much the new customers would spend for each price. I then take the max of those 4 predicted prices and use that as the predicted price we should give that customer. I then compare the predicted price point with the actual price the customer was given and calculate the mean revenue for those where predicted actual.