Learning Graphical Models
Local Structure Discovery in Bayesian Networks
Niinimaki, Teppo, Parviainen, Pekka
Learning a Bayesian network structure from data is an NP-hard problem and thus exact algorithms are feasible only for small data sets. Therefore, network structures for larger networks are usually learned with various heuristics. Another approach to scaling up the structure learning is local learning. In local learning, the modeler has one or more target variables that are of special interest; he wants to learn the structure near the target variables and is not interested in the rest of the variables. In this paper, we present a score-based local learning algorithm called SLL. We conjecture that our algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results suggest that SLL is competitive when compared to the constraint-based HITON algorithm. We also study the prospects of constructing the network structure for the whole node set based on local results by presenting two algorithms and comparing them to several heuristics.
Hilbert Space Embeddings of POMDPs
Nishiyama, Yu, Boularias, Abdeslam, Gretton, Arthur, Fukumizu, Kenji
A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation.
A Model-Based Approach to Rounding in Spectral Clustering
Poon, Leonard K. M., Liu, April H., Liu, Tengfei, Zhang, Nevin Lianwen
In spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to get the Laplacian matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition of the data using the leading eigenvectors. The last step is sometimes referred to as rounding, where one needs to decide how many leading eigenvectors to use, to determine the number of clusters, and to partition the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding the number of leading eigenvectors to use, we not only rely on information contained in the leading eigenvectors themselves, but also use subsequent eigenvectors. Third, our method is model-based and solves all the three subproblems of rounding using a class of graphical models called latent tree models. We evaluate our method on both synthetic and real-world data. The results show that our method works correctly in the ideal case where between-clusters similarity is 0, and degrades gracefully as one moves away from the ideal case.
Causal Discovery of Linear Cyclic Models from Multiple Experimental Data Sets with Overlapping Variables
Hyttinen, Antti, Eberhardt, Frederick, Hoyer, Patrik O.
Much of scientific data is collected as randomized experiments intervening on some and observing other variables of interest. Quite often, a given phenomenon is investigated in several studies, and different sets of variables are involved in each study. In this article we consider the problem of integrating such knowledge, inferring as much as possible concerning the underlying causal structure with respect to the union of observed variables from such experimental or passive observational overlapping data sets. We do not assume acyclicity or joint causal sufficiency of the underlying data generating model, but we do restrict the causal relationships to be linear and use only second order statistics of the data. We derive conditions for full model identifiability in the most generic case, and provide novel techniques for incorporating an assumption of faithfulness to aid in inference. In each case we seek to establish what is and what is not determined by the data at hand.
Exploiting compositionality to explore a large space of model structures
Grosse, Roger, Salakhutdinov, Ruslan R, Freeman, William T., Tenenbaum, Joshua B.
The recent proliferation of richly structured probabilistic models raises the question of how to automatically determine an appropriate model for a dataset. We investigate this question for a space of matrix decomposition models which can express a variety of widely used models from unsupervised learning. To enable model selection, we organize these models into a context-free grammar which generates a wide variety of structures through the compositional application of a few simple rules. We use our grammar to generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 structures using a small toolbox of reusable algorithms. Using a greedy search over our grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.
Semi-Supervised Classification Through the Bag-of-Paths Group Betweenness
Lebichot, Bertrand, Kivimäki, Ilkka, Françoisse, Kevin, Saerens, Marco
This paper introduces a novel, well-founded, betweenness measure, called the Bag-of-Paths (BoP) betweenness, as well as its extension, the BoP group betweenness, to tackle semisupervised classification problems on weighted directed graphs. The objective of semi-supervised classification is to assign a label to unlabeled nodes using the whole topology of the graph and the labeled nodes at our disposal. The BoP betweenness relies on a bag-of-paths framework assigning a Boltzmann distribution on the set of all possible paths through the network such that long (high-cost) paths have a low probability of being picked from the bag, while short (low-cost) paths have a high probability of being picked. Within that context, the BoP betweenness of node j is defined as the sum of the a posteriori probabilities that node j lies in-between two arbitrary nodes i, k, when picking a path starting in i and ending in k. Intuitively, a node typically receives a high betweenness if it has a large probability of appearing on paths connecting two arbitrary nodes of the network. This quantity can be computed in closed form by inverting a n x n matrix where n is the number of nodes. For the group betweenness, the paths are constrained to start and end in nodes within the same class, therefore defining a group betweenness for each class. Unlabeled nodes are then classified according to the class showing the highest group betweenness. Experiments on various real-world data sets show that BoP group betweenness outperforms all the tested state of-the-art methods. The benefit of the BoP betweenness is particularly noticeable when only a few labeled nodes are available.
The Kernel Pitman-Yor Process
Chatzis, Sotirios P., Korkinof, Dimitrios, Demiris, Yiannis
Nonparametric Bayesian modeling techniques, especially Dirichlet process mixture (DPM) models, have become very popular in statistics over the last few years, for performing nonparametric density estimation [1], [2], [3]. This theory is based on the observation that an infinite number of component distributions in an ordinary finite mixture model (clustering model) tends on the limit to a Dirichlet process (DP) prior [2], [4]. Eventually, the nonparametric Bayesian inference scheme induced by a DPM model yields a posterior distribution on the proper number of model component densities (inferred clusters) [5], rather than selecting a fixed number of mixture components. Hence, the obtained nonparametric Bayesian formulation eliminates the need of doing inference (or making arbitrary choices) on the number of mixture components (clusters) necessary to represent the modeled data. An interesting alternative to the Dirichlet process prior for nonparametric Bayesian modeling is the Pitman-Yor process (PYP) prior [6]. Pitman-Yor processes produce power-law distributions that allow for better modeling populations comprising a high number of clusters with low popularity and a low number of clusters with high popularity [7]. Indeed, the Pitman-Yor process prior can be viewed as a generalization of the Dirichlet process prior, and reduces to it for a specific selection of its parameter values. In [8], a Gaussian process-based coupled PYP method for joint segmentation of multiple images is proposed.
Distributed Problem Solving
Yeoh, William (Singapore Management University) | Yokoo, Makoto (Kyushu University)
Distributed problem solving is a subfield within multiagent systems, where agents are assumed to be part of a team and collaborate with each other to reach a common goal. In this article, we illustrate the motivations for distributed problem solving and provide an overview of two distributed problem solving models, namely distributed constraint satisfaction problems (DCSPs) and distributed constraint optimization problems (DCOPs), and some of their algorithms.
Adaptive sequential Monte Carlo by means of mixture of experts
Cornebise, J., Moulines, E., Olsson, J.
Appropriately designing the proposal kernel of particle filters is an issue of significant importance, since a bad choice may lead to deterioration of the particle sample and, consequently, waste of computational power. In this paper we introduce a novel algorithm adaptively approximating the so-called optimal proposal kernel by a mixture of integrated curved exponential distributions with logistic weights. This family of distributions, referred to as mixtures of experts, is broad enough to be used in the presence of multi-modality or strongly skewed distributions. The mixtures are fitted, via online-EM methods, to the optimal kernel through minimisation of the Kullback-Leibler divergence between the auxiliary target and instrumental distributions of the particle filter. At each iteration of the particle filter, the algorithm is required to solve only a single optimisation problem for the whole particle sample, yielding an algorithm with only linear complexity. In addition, we illustrate in a simulation study how the method can be successfully applied to optimal filtering in nonlinear state-space models.
Game-Based Data Capture for Player Metrics
Normoyle, Aline (University of Pennsylvania) | Drake, John (University of Pennsylvania) | Likhachev, Maxim (Carnegie Mellon University) | Safonova, Alla (Disney Research Pittsburgh)
Player metrics are an invaluable resource for game designers and QA analysts who wish to understand players, monitor and improve game play, and test design hypotheses. Usually such metrics are collected in a straightforward manner by passively recording players; however, such an approach has several potential drawbacks. First, passive recording might fail to record metrics which correspond to an infrequent player behavior. Secondly, passive recording can be a costly, laborious, and memory intensive process, even with the aid of tools. In this paper, we explore the potential for an active approach to player metric collection which strives to collect data more efficiently, and thus with less cost. We use an online, iterative approach which models the relationship between player metrics and in-game situations probabilistically using a Markov Decision Process (MDP) and solves it for the best game configurations to run. To analyze the benefits and limitations of this approach, we implemented a system, called GAMELAB, for recording player metrics in Second Life.