Undirected Networks
Pseudo-extended Markov chain Monte Carlo
Nemeth, Christopher, Lindsten, Fredrik, Filippone, Maurizio, Hensman, James
Sampling from the posterior distribution using Markov chain Monte Carlo (MCMC) methods can require an exhaustive number of iterations to fully explore the correct posterior. This is often the case when the posterior of interest is multi-modal, as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as an approach for improving the mixing of the MCMC sampler in complex posterior distributions. The pseudo-extended method augments the state-space of the posterior using pseudo-samples as auxiliary variables, where on the extended space, the MCMC sampler is able to easily move between the well-separated modes of the posterior. We apply the pseudo-extended method within an Hamiltonian Monte Carlo sampler and show that by using the No U-turn algorithm (Hoffman and Gelman, 2014), our proposed sampler is completely tuning free. We compare the pseudo-extended method against well-known tempered MCMC algorithms and show the advantages of the new sampler on a number of challenging examples from the statistics literature.
A Review of Methodologies for Natural-Language-Facilitated Human-Robot Cooperation
Natural-language-facilitated human-robot cooperation (NLC) refers to using natural language (NL) to facilitate interactive information sharing and task executions with a common goal constraint between robots and humans. Recently, NLC research has received increasing attention. Typical NLC scenarios include robotic daily assistance, robotic health caregiving, intelligent manufacturing, autonomous navigation, and robot social accompany. However, a thorough review, that can reveal latest methodologies to use NL to facilitate human-robot cooperation, is missing. In this review, a comprehensive summary about methodologies for NLC is presented. NLC research includes three main research focuses: NL instruction understanding, NL-based execution plan generation, and knowledge-world mapping. In-depth analyses on theoretical methods, applications, and model advantages and disadvantages are made. Based on our paper review and perspective, potential research directions of NLC are summarized.
Imposing higher-level Structure in Polyphonic Music Generation using Convolutional Restricted Boltzmann Machines and Constraints
Lattner, Stefan, Grachten, Maarten, Widmer, Gerhard
Since computers can automate such processes, automatic music generation has become a small, but steadily emerging field in Artificial Intelligence and Machine Learning. Nevertheless, automatic music generation as a problem is far from solved: musical outputs created by artificial systems are regarded as a curiosity by human listeners at best, but all too often they are taken as a direct offense to our sense of musical aesthetics. This sensitivity to violations of even the most subtle musical norms illustrates how complex the problem of (especially polyphonic) music generation is. In addition, there are hardly any objective evaluation criteria to rigorously test and compare music generation systems. This is lamentable, not least since successful methods for automatic music generation would be of considerable commercial interest to the music, gaming, and film industries.
Microsoft tests AI powerless aircraft that mimics birds
Microsoft is working on an autonomous aircraft that can fly for long periods of time without any power. Rather than a motor, the glider relies on artificial intelligence that mimics how birds fly, autonomously finding thermals, or invisible columns of air that rise due to heat, to carry it for long distances. A recent test of the 16.5-foot aircraft - called a sailplane - conducted in the middle of the desert in Hawthorne, Nevada proved successful - the algorithms researchers developed to help it predict where thermals would appear next worked and kept it aloft. Microsoft is working on an autonomous aircraft that can fly for long periods of time without any power. 'Birds do this seamlessly, and all they're doing is harnessing nature,' Ashish Kapoor, a principal researcher at Microsoft, said.
Procedural Content Generation via Machine Learning (PCGML)
Summerville, Adam, Snodgrass, Sam, Guzdial, Matthew, Holmgård, Christoffer, Hoover, Amy K., Isaksen, Aaron, Nealen, Andy, Togelius, Julian
This survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content. As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style-transfer, parameter tuning, and PCG as a game mechanic.
Deep Learning the Ising Model Near Criticality
Morningstar, Alan, Melko, Roger G.
It is well established that neural networks with deep architectures perform better than shallow networks for many tasks in machine learning. In statistical physics, while there has been recent interest in representing physical data with generative modelling, the focus has been on shallow neural networks. A natural question to ask is whether deep neural networks hold any advantage over shallow networks in representing such data. We investigate this question by using unsupervised, generative graphical models to learn the probability distribution of a two-dimensional Ising system. Deep Boltzmann machines, deep belief networks, and deep restricted Boltzmann networks are trained on thermal spin configurations from this system, and compared to the shallow architecture of the restricted Boltzmann machine. We benchmark the models, focussing on the accuracy of generating energetic observables near the phase transition, where these quantities are most difficult to approximate. Interestingly, after training the generative networks, we observe that the accuracy essentially depends only on the number of neurons in the first hidden layer of the network, and not on other model details such as network depth or model type. This is evidence that shallow networks are more efficient than deep networks at representing physical probability distributions associated with Ising systems near criticality.
Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Magnusson, Måns, Jonsson, Leif, Villani, Mattias, Broman, David
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing
Goel, Karan, Rajpal, Shreya, Mausam, null
Task control of workflows over micro-task crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), has received significant attention in AI literature (Weld et al. 2015). Typically, a requester needs to balance three competing objectives - (1) total cost, owing to payments made to workers for their responses (or ballots), (2) overall quality, usually evaluated as accuracy of the final output, and (3) the total time for completing the task. These criteria are interrelated: increasing the pay per task attracts more workers to the task, thereby reducing completion time. However, it also exhausts the budget sooner, so requesters can afford fewer ballots per task, likely reducing the overall quality. Most prior work on crowd controllers has focused on the tradeoff between cost (or no. of ballots) and quality (Dai et al. 2013; Lin, Mausam, and Weld 2012; Bragg, Mausam, and Weld 2013; Kamar et al. 2013; Parameswaran et al. 2012). A common approach is to define a Partially Observable Markov Decision Process (POMDP) per task, which decides on whether to get another ballot or submit the best answer for that task. However, this work is time-agnostic, and assumes that pay per ballot is given as input.
Hierarchically Compositional Kernels for Scalable Nonparametric Learning
Chen, Jie, Avron, Haim, Sindhwani, Vikas
We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nystr\"om method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression or a classifier are reduced from $O(n^2)$ and $O(n^3)$ to $O(nr)$ and $O(nr^2)$, respectively, where $n$ is the number of training examples and $r$ is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves a matching performance with a smaller $r$. We demonstrate comprehensive experiments to show the effective use of the proposed kernel on data sizes up to the order of millions.
A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue
Eric, Mihail, Manning, Christopher D.
Task-oriented dialogue focuses on conversational agents that participate in user-initiated dialogues on domain-specific topics. In contrast to chatbots, which simply seek to sustain open-ended meaningful discourse, existing task-oriented agents usually explicitly model user intent and belief states. This paper examines bypassing such an explicit representation by depending on a latent neural embedding of state and learning selective attention to dialogue history together with copying to incorporate relevant prior context. We complement recent work by showing the effectiveness of simple sequence-to-sequence neural architectures with a copy mechanism. Our model outperforms more complex memory-augmented models by 7% in per-response generation and is on par with the current state-of-the-art on DSTC2.