Bayesian Inference
The Author-Topic Model for Authors and Documents
Rosen-Zvi, Michal, Griffiths, Thomas, Steyvers, Mark, Smyth, Padhraic
We intro duce the author-topic mo del, a generative mo del for do cuments that extends Latent Dirichlet Allo cation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is asso ciated with a multinomial distribution over topics and each topic is asso ciated with a multinomial distribution over words. A do cument with multiple authors is mo deled as a distribution over topics that is a mixture of the distributions asso ci-ated with the authors. We apply the mo del to a collection of 1,700 NIPS conference pap ers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the p erformance with two other generative mo d-els for do cuments, which are sp ecial cases of the author-topic mo del: LDA (a topic mo del) and a simple author mo del in which each author is asso ciated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic mo del, and demonstrate applications to computing similarity b etween authors and entropy of author output.
A Bayesian Approach toward Active Learning for Collaborative Filtering
Collaborative filtering is a useful technique for exploiting the preference patterns of a group of users to predict the utility of items for the active user. In general, the performance of collaborative filtering depends on the number of rated examples given by the active user. The more the number of rated examples given by the active user, the more accurate the predicted ratings will be. Active learning provides an effective way to acquire the most informative rated examples from active users. Previous work on active learning for collaborative filtering only considers the expected loss function based on the estimated model, which can be misleading when the estimated model is inaccurate. This paper takes one step further by taking into account of the posterior distribution of the estimated model, which results in more robust active learning algorithm. Empirical studies with datasets of movie ratings show that when the number of ratings from the active user is restricted to be small, active learning methods only based on the estimated model don't perform well while the active learning method using the model distribution achieves substantially better performance.
A Generative Bayesian Model for Aggregating Experts' Probabilities
In order to improve forecasts, a decisionmaker often combines probabilities given by various sources, such as human experts and machine learning classifiers. When few training data are available, aggregation can be improved by incorporating prior knowledge about the event being forecasted and about salient properties of the experts. To this end, we develop a generative Bayesian aggregation model for probabilistic classi cation. The model includes an event-specific prior, measures of individual experts' bias, calibration, accuracy, and a measure of dependence betweeen experts. Rather than require absolute measures, we show that aggregation may be expressed in terms of relative accuracy between experts. The model results in a weighted logarithmic opinion pool (LogOps) that satis es consistency criteria such as the external Bayesian property. We derive analytic solutions for independent and for exchangeable experts. Empirical tests demonstrate the model's use, comparing its accuracy with other aggregation methods.
Bayesian Learning in Undirected Graphical Models: Approximate MCMC algorithms
Murray, Iain, Ghahramani, Zoubin
Bayesian learning in undirected graphical models--computing posterior distributions over parameters and predictive quantities-- is exceptionally difficult. We conjecture that for general undirected models, there are no tractable MCMC (Markov Chain Monte Carlo) schemes giving the correct equilibrium distribution over parameters. While this intractability, due to the partition function, is familiar to those performing parameter optimisation, Bayesian learning of posterior distributions over undirected model parameters has been unexplored and poses novel challenges. We propose several approximate MCMC schemes and test on fully observed binary models (Boltzmann machines) for a small coronary heart disease data set and larger artificial systems. While approximations must perform well on the model, their interaction with the sampling scheme is also important. Samplers based on variational mean-field approximations generally performed poorly, more advanced methods using loopy propagation, brief sampling and stochastic dynamics lead to acceptable parameter posteriors. Finally, we demonstrate these techniques on a Markov random field with hidden variables.
Applying Discrete PCA in Data Analysis
Buntine, Wray L., Jakulin, Aleks
Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some application of these methods to some common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and to information retrieval.
Algebraic Statistics in Model Selection
We develop the necessary theory in computational algebraic geometry to place Bayesian networks into the realm of algebraic statistics. We present an algebra{statistics dictionary focused on statistical modeling. In particular, we link the notion of effiective dimension of a Bayesian network with the notion of algebraic dimension of a variety. We also obtain the independence and non{independence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables, via a generating set of an ideal of polynomials associated to the network. These results extend previous work on the subject. Finally, the relevance of these results for model selection is discussed.
The Minimum Information Principle for Discriminative Learning
Globerson, Amir, Tishby, Naftali
Exponential models of distributions are widely used in machine learning for classiffication and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we argue that for classiffication tasks, mutual information is a more suitable information theoretic measure to be optimized. We show how the principle of minimum mutual information generalizes that of maximum entropy, and provides a comprehensive framework for building discriminative classiffiers. A game theoretic interpretation of our approach is then given, and several generalization bounds provided. We present iterative algorithms for solving the minimum information problem and its convex dual, and demonstrate their performance on various classiffication tasks. The results show that minimum information classiffiers outperform the corresponding maximum entropy models.
A New Characterization of Probabilities in Bayesian Networks
We characterize probabilities in Bayesian networks in terms of algebraic expressions called quasi-probabilities. These are arrived at by casting Bayesian networks as noisy AND-OR-NOT networks, and viewing the subnetworks that lead to a node as arguments for or against a node. Quasi-probabilities are in a sense the "natural" algebra of Bayesian networks: we can easily compute the marginal quasi-probability of any node recursively, in a compact form; and we can obtain the joint quasi-probability of any set of nodes by multiplying their marginals (using an idempotent product operator). Quasi-probabilities are easily manipulated to improve the efficiency of probabilistic inference. They also turn out to be representable as square-wave pulse trains, and joint and marginal distributions can be computed by multiplication and complementation of pulse trains.
Monotonicity in Bayesian Networks
van der Gaag, Linda C., Bodlaender, Hans L., Feelders, Ad
For many real-life Bayesian networks, common knowledge dictates that the output established for the main variable of interest increases with higher values for the observable variables. We define two concepts of monotonicity to capture this type of knowledge. We say that a network is isotone in distribution if the probability distribution computed for the output variable given specific observations is stochastically dominated by any such distribution given higher-ordered observations; a network is isotone in mode if a probability distribution given higher observations has a higher mode. We show that establishing whether a network exhibits any of these properties of monotonicity is coNPPP-complete in general, and remains coNP-complete for polytrees. We present an approximate algorithm for deciding whether a network is monotone in distribution and illustrate its application to a real-life network in oncology.
Annealed MAP
Yuan, Changhe, Lu, Tsai-Ching, Druzdzel, Marek J.
Maximum a Posteriori assignment (MAP) is the problem of finding the most probable instantiation of a set of variables given the partial evidence on the other variables in a Bayesian network. MAP has been shown to be a NP-hard problem [22], even for constrained networks, such as polytrees [18]. Hence, previous approaches often fail to yield any results for MAP problems in large complex Bayesian networks. To address this problem, we propose AnnealedMAP algorithm, a simulated annealing-based MAP algorithm. The AnnealedMAP algorithm simulates a non-homogeneous Markov chain whose invariant function is a probability density that concentrates itself on the modes of the target density. We tested this algorithm on several real Bayesian networks. The results show that, while maintaining good quality of the MAP solutions, the AnnealedMAP algorithm is also able to solve many problems that are beyond the reach of previous approaches.