Bayesian Inference
Infinitely exchangeable random graphs generated from a Poisson point process on monotone sets and applications to cluster analysis for networks
We construct an infinitely exchangeable process on the set $\cate$ of subsets of the power set of the natural numbers $\mathbb{N}$ via a Poisson point process with mean measure $\Lambda$ on the power set of $\mathbb{N}$. Each $E\in\cate$ has a least monotone cover in $\catf$, the collection of monotone subsets of $\cate$, and every monotone subset maps to an undirected graph $G\in\catg$, the space of undirected graphs with vertex set $\mathbb{N}$. We show a natural mapping $\cate\rightarrow\catf\rightarrow\catg$ which induces an infinitely exchangeable measure on the projective system $\catg^{\rest}$ of graphs $\catg$ under permutation and restriction mappings given an infinitely exchangeable family of measures on the projective system $\cate^{\rest}$ of subsets with permutation and restriction maps. We show potential connections of this process to applications in cluster analysis, machine learning, classification and Bayesian inference.
Inferring Networks of Diffusion and Influence
Gomez-Rodriguez, Manuel, Leskovec, Jure, Krause, Andreas
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or adopt the information, observing individual transmissions (i.e., who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
Convergence rates of efficient global optimization algorithms
Efficient global optimization is the problem of minimizing an unknown function f, using as few evaluations f(x) as possible. It can be considered as a continuum-armed bandit problem, with noiseless data and simple regret. Expected improvement is perhaps the most popular method for solving this problem; the algorithm performs well in experiments, but little is known about its theoretical properties. Implementing expected improvement requires a choice of Gaussian process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in the RKHS. We begin by providing convergence rates for this procedure. The rates are optimal for functions of low smoothness, and we modify the algorithm to attain optimal rates for smoother functions. For practitioners, however, these results are somewhat misleading. Priors are typically not held fixed, but depend on parameters estimated from the data. For standard estimators, we show this procedure may never discover the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
Kernel Topic Models
Hennig, Philipp, Stern, David, Herbrich, Ralf, Graepel, Thore
We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.
Gaussian Process Regression Networks
Wilson, Andrew Gordon, Knowles, David A., Ghahramani, Zoubin
We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bayes inference procedures for this model. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on benchmark datasets, including a 1000 dimensional gene expression dataset.
Bayesian Group Factor Analysis
Virtanen, Seppo, Klami, Arto, Khan, Suleiman A., Kaski, Samuel
We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by co-occurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming group-wise sparse factors, active in a subset of the sets, the variation can be decomposed into factors explaining relationships between the sets and factors explaining away set-specific variation. We formulate the assumptions in a Bayesian model which provides the factors, and apply the model to two data analysis tasks, in neuroimaging and chemical systems biology.
Discovering Emerging Topics in Social Streams via Link Anomaly Detection
Takahashi, Toshimitsu, Tomioka, Ryota, Yamanishi, Kenji
Detection of emerging topics are now receiving renewed interest motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged are not only texts but also images, URLs, and videos. We focus on the social aspects of theses networks. That is, the links between users that are generated dynamically intentionally or unintentionally through replies, mentions, and retweets. We propose a probability model of the mentioning behaviour of a social network user, and propose to detect the emergence of a new topic from the anomaly measured through the model. We combine the proposed mention anomaly score with a recently proposed change-point detection technique based on the Sequentially Discounting Normalized Maximum Likelihood (SDNML), or with Kleinberg's burst model. Aggregating anomaly scores from hundreds of users, we show that we can detect emerging topics only based on the reply/mention relationships in social network posts. We demonstrate our technique in a number of real data sets we gathered from Twitter. The experiments show that the proposed mention-anomaly-based approaches can detect new topics at least as early as the conventional term-frequency-based approach, and sometimes much earlier when the keyword is ill-defined.
Cutset Sampling for Bayesian Networks
The paper presents a new sampling methodology for Bayesian networks that samples only a subset of variables and applies exact inference to the rest. Cutset sampling is a network structure-exploiting application of the Rao-Blackwellisation principle to sampling in Bayesian networks. It improves convergence by exploiting memory-based inference algorithms. It can also be viewed as an anytime approximation of the exact cutset-conditioning algorithm developed by Pearl. Cutset sampling can be implemented efficiently when the sampled variables constitute a loop-cutset of the Bayesian network and, more generally, when the induced width of the networks graph conditioned on the observed sampled variables is bounded by a constant w. We demonstrate empirically the benefit of this scheme on a range of benchmarks.
Improving parameter learning of Bayesian nets from incomplete data
Corani, Giorgio, De Campos, Cassio P.
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can be also implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
An Algebraic Graphical Model for Decision with Uncertainties, Feasibilities, and Utilities
Pralet, C., Schiex, T., Verfaillie, G.
Numerous formalisms and dedicated algorithms have been designed in the last decades to model and solve decision making problems. Some formalisms, such as constraint networks, can express "simple" decision problems, while others are designed to take into account uncertainties, unfeasible decisions, and utilities. Even in a single formalism, several variants are often proposed to model different types of uncertainty (probability, possibility...) or utility (additive or not). In this article, we introduce an algebraic graphical model that encompasses a large number of such formalisms: (1) we first adapt previous structures from Friedman, Chu and Halpern for representing uncertainty, utility, and expected utility in order to deal with generic forms of sequential decision making; (2) on these structures, we then introduce composite graphical models that express information via variables linked by "local" functions, thanks to conditional independence; (3) on these graphical models, we finally define a simple class of queries which can represent various scenarios in terms of observabilities and controllabilities. A natural decision-tree semantics for such queries is completed by an equivalent operational semantics, which induces generic algorithms. The proposed framework, called the Plausibility-Feasibility-Utility (PFU) framework, not only provides a better understanding of the links between existing formalisms, but it also covers yet unpublished frameworks (such as possibilistic influence diagrams) and unifies formalisms such as quantified boolean formulas and influence diagrams. Our backtrack and variable elimination generic algorithms are a first step towards unified algorithms.