Bayesian Inference
Query Expansion in Information Retrieval Systems using a Bayesian Network-Based Thesaurus
de Campos, Luis M., Fernandez-Luna, Juan M., Huete, Juan F.
Information Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increasing the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by an user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific for that collection). We also report the results obtained by our method on three standard test collections.
A Hybrid Algorithm to Compute Marginal and Joint Beliefs in Bayesian Networks and Its Complexity
Bloemeke, Mark, Valtorta, Marco
There exist two general forms of exact algorithms for updating probabilities in Bayesian Networks. The first approach involves using a structure, usually a clique tree, and performing local message based calculation to extract the belief in each variable. The second general class of algorithm involves the use of non-serial dynamic programming techniques to extract the belief in some desired group of variables. In this paper we present a hybrid algorithm based on the latter approach yet possessing the ability to retrieve the belief in all single variables. The technique is advantageous in that it saves a NP-hard computation step over using one algorithm of each type. Furthermore, this technique re-enforces a conjecture of Jensen and Jensen [JJ94] in that it still requires a single NP-hard step to set up the structure on which inference is performed, as we show by confirming Li and D'Ambrosio's [LD94] conjectured NP-hardness of OFP.
Linear-Nonlinear-Poisson Neuron Networks Perform Bayesian Inference On Boltzmann Machines
One conjecture in both deep learning and classical connectionist viewpoint is that the biological brain implements certain kinds of deep networks as its back-end. However, to our knowledge, a detailed correspondence has not yet been set up, which is important if we want to bridge between neuroscience and machine learning. Recent researches emphasized the biological plausibility of Linear-Nonlinear-Poisson (LNP) neuron model. We show that with neurally plausible settings, the whole network is capable of representing any Boltzmann machine and performing a semi-stochastic Bayesian inference algorithm lying between Gibbs sampling and variational inference.
Transfer Topic Modeling with Ease and Scalability
Kang, Jeon-Hyung, Ma, Jun, Liu, Yan
The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal due to its weakness in handling short texts with fast-changing topics and scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections including AP and RCV1 datasets.
Generalized double Pareto shrinkage
Armagan, Artin, Dunson, David, Lee, Jaeyong
We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has a Student's $t$-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We investigate the properties of the maximum a posteriori estimator, as sparse estimation plays an important role in many problems, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.
Mixture Gaussian Process Conditional Heteroscedasticity
Platanios, Emmanouil A., Chatzis, Sotirios P.
Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered as one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. This way, we essentially obtain a mixture Gaussian process conditional heteroscedasticity (MGPCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law nature over the distribution of the model mixture components, namely the Pitman-Yor process prior, to allow for better capturing modeled data distributions with heavy tails and skewness. Finally, we provide a copula- based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated MGPCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios, and compare its performance to state-of-the-art methodologies.
Model-Based Bayesian Exploration
Dearden, Richard, Friedman, Nir, Andre, David
Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.
Mixture Approximations to Bayesian Networks
Tresp, Volker, Haft, Michael, Hofmann, Reimar
Structure and parameters in a Bayesian network uniquely specify the probability distribution of the modeled domain. The locality of both structure and probabilistic information are the great benefits of Bayesian networks and require the modeler to only specify local information. On the other hand this locality of information might prevent the modeler - and even more any other person - from obtaining a general overview of the important relationships within the domain. The goal of the work presented in this paper is to provide an "alternative" view on the knowledge encoded in a Bayesian network which might sometimes be very helpful for providing insights into the underlying domain. The basic idea is to calculate a mixture approximation to the probability distribution represented by the Bayesian network. The mixture component densities can be thought of as representing typical scenarios implied by the Bayesian model, providing intuition about the basic relationships. As an additional benefit, performing inference in the approximate model is very simple and intuitive and can provide additional insights. The computational complexity for the calculation of the mixture approximations criticaly depends on the measure which defines the distance between the probability distribution represented by the Bayesian network and the approximate distribution. Both the KL-divergence and the backward KL-divergence lead to inefficient algorithms. Incidentally, the latter is used in recent work on mixtures of mean field solutions to which the work presented here is closely related. We show, however, that using a mean squared error cost function leads to update equations which can be solved using the junction tree algorithm. We conclude that the mean squared error cost function can be used for Bayesian networks in which inference based on the junction tree is tractable. For large networks, however, one may have to rely on mean field approximations.
On Supervised Selection of Bayesian Networks
Kontkanen, Petri, Myllymaki, Petri, Silander, Tomi, Tirri, Henry
Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more "focused" predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task, and, although frequently used for supervised model selection also, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical crossvalidation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
Approximate Learning in Complex Dynamic Bayesian Networks
Settimi, Raffaella, Smith, Jim Q., Gargoum, A. S.
In this paper we extend the work of Smith and Papamichail (1999) and present fast approximate Bayesian algorithms for learning in complex scenarios where at any time frame, the relationships between explanatory state space variables can be described by a Bayesian network that evolve dynamically over time and the observations taken are not necessarily Gaussian. It uses recent developments in approximate Bayesian forecasting methods in combination with more familiar Gaussian propagation algorithms on junction trees. The procedure for learning state parameters from data is given explicitly for common sampling distributions and the methodology is illustrated through a real application. The efficiency of the dynamic approximation is explored by using the Hellinger divergence measure and theoretical bounds for the efficacy of such a procedure are discussed.