Markov Models
Imposing higher-level Structure in Polyphonic Music Generation using Convolutional Restricted Boltzmann Machines and Constraints
Lattner, Stefan, Grachten, Maarten, Widmer, Gerhard
Since computers can automate such processes, automatic music generation has become a small, but steadily emerging field in Artificial Intelligence and Machine Learning. Nevertheless, automatic music generation as a problem is far from solved: musical outputs created by artificial systems are regarded as a curiosity by human listeners at best, but all too often they are taken as a direct offense to our sense of musical aesthetics. This sensitivity to violations of even the most subtle musical norms illustrates how complex the problem of (especially polyphonic) music generation is. In addition, there are hardly any objective evaluation criteria to rigorously test and compare music generation systems. This is lamentable, not least since successful methods for automatic music generation would be of considerable commercial interest to the music, gaming, and film industries.
Microsoft tests AI powerless aircraft that mimics birds
Microsoft is working on an autonomous aircraft that can fly for long periods of time without any power. Rather than a motor, the glider relies on artificial intelligence that mimics how birds fly, autonomously finding thermals, or invisible columns of air that rise due to heat, to carry it for long distances. A recent test of the 16.5-foot aircraft - called a sailplane - conducted in the middle of the desert in Hawthorne, Nevada proved successful - the algorithms researchers developed to help it predict where thermals would appear next worked and kept it aloft. Microsoft is working on an autonomous aircraft that can fly for long periods of time without any power. 'Birds do this seamlessly, and all they're doing is harnessing nature,' Ashish Kapoor, a principal researcher at Microsoft, said.
Procedural Content Generation via Machine Learning (PCGML)
Summerville, Adam, Snodgrass, Sam, Guzdial, Matthew, Holmgård, Christoffer, Hoover, Amy K., Isaksen, Aaron, Nealen, Andy, Togelius, Julian
This survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content. As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style-transfer, parameter tuning, and PCG as a game mechanic.
Deep Learning the Ising Model Near Criticality
Morningstar, Alan, Melko, Roger G.
It is well established that neural networks with deep architectures perform better than shallow networks for many tasks in machine learning. In statistical physics, while there has been recent interest in representing physical data with generative modelling, the focus has been on shallow neural networks. A natural question to ask is whether deep neural networks hold any advantage over shallow networks in representing such data. We investigate this question by using unsupervised, generative graphical models to learn the probability distribution of a two-dimensional Ising system. Deep Boltzmann machines, deep belief networks, and deep restricted Boltzmann networks are trained on thermal spin configurations from this system, and compared to the shallow architecture of the restricted Boltzmann machine. We benchmark the models, focussing on the accuracy of generating energetic observables near the phase transition, where these quantities are most difficult to approximate. Interestingly, after training the generative networks, we observe that the accuracy essentially depends only on the number of neurons in the first hidden layer of the network, and not on other model details such as network depth or model type. This is evidence that shallow networks are more efficient than deep networks at representing physical probability distributions associated with Ising systems near criticality.
Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Magnusson, Måns, Jonsson, Leif, Villani, Mattias, Broman, David
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing
Goel, Karan, Rajpal, Shreya, Mausam, null
Task control of workflows over micro-task crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), has received significant attention in AI literature (Weld et al. 2015). Typically, a requester needs to balance three competing objectives - (1) total cost, owing to payments made to workers for their responses (or ballots), (2) overall quality, usually evaluated as accuracy of the final output, and (3) the total time for completing the task. These criteria are interrelated: increasing the pay per task attracts more workers to the task, thereby reducing completion time. However, it also exhausts the budget sooner, so requesters can afford fewer ballots per task, likely reducing the overall quality. Most prior work on crowd controllers has focused on the tradeoff between cost (or no. of ballots) and quality (Dai et al. 2013; Lin, Mausam, and Weld 2012; Bragg, Mausam, and Weld 2013; Kamar et al. 2013; Parameswaran et al. 2012). A common approach is to define a Partially Observable Markov Decision Process (POMDP) per task, which decides on whether to get another ballot or submit the best answer for that task. However, this work is time-agnostic, and assumes that pay per ballot is given as input.
Hierarchically Compositional Kernels for Scalable Nonparametric Learning
Chen, Jie, Avron, Haim, Sindhwani, Vikas
We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nystr\"om method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression or a classifier are reduced from $O(n^2)$ and $O(n^3)$ to $O(nr)$ and $O(nr^2)$, respectively, where $n$ is the number of training examples and $r$ is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves a matching performance with a smaller $r$. We demonstrate comprehensive experiments to show the effective use of the proposed kernel on data sizes up to the order of millions.
A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue
Eric, Mihail, Manning, Christopher D.
Task-oriented dialogue focuses on conversational agents that participate in user-initiated dialogues on domain-specific topics. In contrast to chatbots, which simply seek to sustain open-ended meaningful discourse, existing task-oriented agents usually explicitly model user intent and belief states. This paper examines bypassing such an explicit representation by depending on a latent neural embedding of state and learning selective attention to dialogue history together with copying to incorporate relevant prior context. We complement recent work by showing the effectiveness of simple sequence-to-sequence neural architectures with a copy mechanism. Our model outperforms more complex memory-augmented models by 7% in per-response generation and is on par with the current state-of-the-art on DSTC2.
Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion
Mukherjee, Subhabrata, Guennemann, Stephan, Weikum, Gerhard
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.
Learning non-parametric Markov networks with mutual information
Leppä-aho, Janne, Räisänen, Santeri, Yang, Xiao, Roos, Teemu
We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.