We believe that collaborative filtering is well described by a probabilistic model in which people and the items they view or buy are each divided into (unknown) clusters and there are link probabilities between these clusters. EM is an obvious method for estimating these models, but does not work because it cannot be efficiently constructed to recognize the constraint that a movie liked by two different people must be in the same movie class each time. K-means clustering is fast but ad hoc. Repeated clustering using K-means or a "soft clustering" version of K-means may be useful, but usually does not improve accuracy. Clustering movies or people on other relevant attributes can help, and does help in the case of CD purchase data. Gibbs sampling works well and has the virtue of being easily extended to much more complex models, but is computationally expensive. We are currently developing more efficient Gibbs sampling methods for collaborative filtering problems, extending our repeated clustering and Gibbs sampling code to incorporate multiple attributes, and applying them to more real data sets.
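The cluster model described above can be illustrated with a minimal sketch. It assumes binary "liked" links and known cluster assignments; the function names `sample_links` and `estimate_link_prob` are hypothetical, and the counting estimator is only a stand-in for the EM, K-means, and Gibbs sampling procedures discussed, none of which is implemented here. Each movie carries a single cluster label shared by all people, which is the constraint mentioned above.

```python
import random


def sample_links(person_cluster, movie_cluster, link_prob, rng):
    """Sample a binary person-by-movie 'liked' matrix from the cluster model.

    person_cluster[i] and movie_cluster[j] are cluster indices (assumed known
    here); link_prob[a][b] is the probability that a person in cluster a
    likes a movie in cluster b.
    """
    return [
        [1 if rng.random() < link_prob[pc][mc] else 0 for mc in movie_cluster]
        for pc in person_cluster
    ]


def estimate_link_prob(links, person_cluster, movie_cluster, n_pc, n_mc):
    """Estimate link probabilities by counting, given fixed assignments."""
    hits = [[0] * n_mc for _ in range(n_pc)]
    totals = [[0] * n_mc for _ in range(n_pc)]
    for i, pc in enumerate(person_cluster):
        for j, mc in enumerate(movie_cluster):
            hits[pc][mc] += links[i][j]
            totals[pc][mc] += 1
    return [
        [hits[a][b] / totals[a][b] if totals[a][b] else 0.0 for b in range(n_mc)]
        for a in range(n_pc)
    ]
```

In practice the cluster assignments are unknown, which is exactly what makes estimation hard; the sketch only shows the structure of the model once the assignments are fixed.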
We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus used to train it. Instead of using the $n$-body interactions of $(n-1)$-order Markov models, traditionally used in automatic music generation, we use a $k$-nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid the over-fitting problems typical of Markov models. We show that long-range musical phrases do not need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by assessing how well the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme makes it possible to generate musically sensible alterations of the original phrases, providing a way to generate innovation.
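The claim about keeping the number of parameters low can be made concrete with a rough count. The sketch below is not taken from the paper: it assumes an alphabet of `q` symbols, and it assumes the pairwise model has one local field per symbol plus one `q` by `q` coupling table for each distance from 1 to `k`. Both function names are hypothetical, and redundancies among the pairwise parameters are ignored.

```python
def markov_param_count(q, order):
    """Free parameters of a Markov chain of the given order over q symbols.

    Each of the q**order contexts has its own distribution over the next
    symbol, which has q - 1 free values.
    """
    return q ** order * (q - 1)


def pairwise_param_count(q, k):
    """Parameters of an assumed pairwise model with interactions up to distance k.

    Counts q local fields plus one q-by-q coupling table per distance 1..k.
    """
    return q + k * q * q


# Example: 12 symbols, a fourth-order Markov chain versus pairwise
# interactions reaching 10 positions away.
print(markov_param_count(12, 4))     # 228096
print(pairwise_param_count(12, 10))  # 1452
```

The Markov count grows exponentially with the order, while the pairwise count grows only linearly with the interaction range `k`.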
Artificial intelligence (AI) systems, powered by massive data and sophisticated algorithms, including but not limited to deep neural networks and statistical machine learning (ML) methods (support vector machines, clustering, random forests, etc.), are having a profound and transformative impact on our daily lives as they make their way into everything from finance to healthcare, from retail to transportation. Examples include Netflix's movie recommender, Amazon's product prediction, Facebook's uncanny ability to show what you may like, Google's assistant, DeepMind's AlphaGo, and Stanford's AI beating human doctors. Machine learning is eating software. However, one common feature of these powerful algorithms is that they use sophisticated mathematics to do their job: to classify and segment an image, to arrive at key decisions, to make a product recommendation, to model a complex phenomenon, or to extract and visualize a hidden pattern from a deluge of data. All of these mathematical processes are, quite simply, beyond the scope of a single human (or a team) to perform manually (even on a computer) or in their head.
The field of machine learning underwent massive changes in the 2010s. At the beginning of the decade, the field saw diverse approaches applied to a variety of topics and data structures. Then AlexNet, a convolutional neural network (CNN), blew away the competition in the ImageNet challenge, and the field was forever changed. However, there was a warming-up phase. Caffe's first release was in 2013.
Barbara started by introducing machine learning (ML), gave a brief overview of R and then discussed three examples: classifying handwritten digits, estimating values in a socio-economic dataset and clustering crimes in Chicago. ML is statistics on steroids. ML uses data to find a pattern, then uses that pattern (the model) to predict results from similar data. Barbara used the example of classifying films as either action or romance based on the number of kicks and kisses. Barbara described supervised and unsupervised learning. Unsupervised learning is the "wild, wild west": we can't train the model against known answers, and it is much more difficult to understand how effective these methods are. Back to supervised learning, it's important to choose good predicting factors; in the movie example, perhaps the title, actors and script would have been better predictors than the number of kicks and kisses. Then you must choose the algorithm, tune it, and finally make it useful and visible and get it into production. It's a hard job, especially when data scientists and software developers seem to be different tribes.
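The kicks-and-kisses example can be sketched as a small supervised classifier. The talk summary does not say which algorithm was used, so the k-nearest-neighbour vote below is an assumption, the training data is invented purely for illustration, and the sketch is in Python rather than the R used in the talk.

```python
import math
from collections import Counter

# Hypothetical training data, invented for illustration:
# (kicks, kisses) -> genre label.
TRAINING = [
    ((3, 104), "romance"),
    ((2, 100), "romance"),
    ((1, 81), "romance"),
    ((101, 10), "action"),
    ((99, 5), "action"),
    ((98, 2), "action"),
]


def classify(kicks, kisses, training=TRAINING, k=3):
    """Label a film by majority vote among its k nearest training films."""
    point = (kicks, kisses)
    # Sort labelled films by Euclidean distance to the new film.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify(18, 90))  # romance
print(classify(90, 8))   # action
```

Swapping the two counts for other predictors, such as features derived from the title or cast, would only change the tuples in the training data, which is where the choice of good predicting factors comes in.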