Goto

Collaborating Authors

 Asia


Mixture Approximations to Bayesian Networks

arXiv.org Artificial Intelligence

Structure and parameters in a Bayesian network uniquely specify the probability distribution of the modeled domain. The locality of both structure and probabilistic information are the great benefits of Bayesian networks and require the modeler to only specify local information. On the other hand this locality of information might prevent the modeler - and even more any other person - from obtaining a general overview of the important relationships within the domain. The goal of the work presented in this paper is to provide an "alternative" view on the knowledge encoded in a Bayesian network which might sometimes be very helpful for providing insights into the underlying domain. The basic idea is to calculate a mixture approximation to the probability distribution represented by the Bayesian network. The mixture component densities can be thought of as representing typical scenarios implied by the Bayesian model, providing intuition about the basic relationships. As an additional benefit, performing inference in the approximate model is very simple and intuitive and can provide additional insights. The computational complexity for the calculation of the mixture approximations criticaly depends on the measure which defines the distance between the probability distribution represented by the Bayesian network and the approximate distribution. Both the KL-divergence and the backward KL-divergence lead to inefficient algorithms. Incidentally, the latter is used in recent work on mixtures of mean field solutions to which the work presented here is closely related. We show, however, that using a mean squared error cost function leads to update equations which can be solved using the junction tree algorithm. We conclude that the mean squared error cost function can be used for Bayesian networks in which inference based on the junction tree is tractable. For large networks, however, one may have to rely on mean field approximations.


Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm

arXiv.org Artificial Intelligence

Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.


Discovering the Hidden Structure of Complex Dynamic Systems

arXiv.org Artificial Intelligence

Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.


Financial Portfolio Optimization: Computationally guided agents to investigate, analyse and invest!?

arXiv.org Machine Learning

Financial portfolio optimization is a widely studied problem in mathematics, statistics, financial and computational literature. It adheres to determining an optimal combination of weights associated with financial assets held in a portfolio. In practice, it faces challenges by virtue of varying math. formulations, parameters, business constraints and complex financial instruments. Empirical nature of data is no longer one-sided; thereby reflecting upside and downside trends with repeated yet unidentifiable cyclic behaviours potentially caused due to high frequency volatile movements in asset trades. Portfolio optimization under such circumstances is theoretically and computationally challenging. This work presents a novel mechanism to reach an optimal solution by encoding a variety of optimal solutions in a solution bank to guide the search process for the global investment objective formulation. It conceptualizes the role of individual solver agents that contribute optimal solutions to a bank of solutions, a super-agent solver that learns from the solution bank, and, thus reflects a knowledge-based computationally guided agents approach to investigate, analyse and reach to optimal solution for informed investment decisions. Conceptual understanding of classes of solver agents that represent varying problem formulations and, mathematically oriented deterministic solvers along with stochastic-search driven evolutionary and swarm-intelligence based techniques for optimal weights are discussed. Algorithmic implementation is presented by an enhanced neighbourhood generation mechanism in Simulated Annealing algorithm. A framework for inclusion of heuristic knowledge and human expertise from financial literature related to investment decision making process is reflected via introduction of controlled perturbation strategies using a decision matrix for neighbourhood generation.


A Spectral Algorithm for Latent Dirichlet Allocation

arXiv.org Machine Learning

The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on $k\times k$ matrices, where $k$ is the number of latent factors (e.g. the number of topics), rather than in the $d$-dimensional observed space (typically $d \gg k$).


Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation

arXiv.org Machine Learning

The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm based on non-parametric divergence estimation between time-series samples from two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, and it is accurately and efficiently estimated by a method of direct density-ratio estimation. Through experiments on artificial and real-world datasets including human-activity sensing, speech, and Twitter messages, we demonstrate the usefulness of the proposed method.


Monte Carlo Inference via Greedy Importance Sampling

arXiv.org Machine Learning

We present a new method for conducting Monte Carlo inference in graphical models which combines explicit search with generalized importance sampling. The idea is to reduce the variance of importance sampling by searching for significant points in the target distribution. We prove that it is possible to introduce search and still maintain unbiasedness. We then demonstrate our procedure on a few simple inference tasks and show that it can improve the inference quality of standard MCMC methods, including Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo. This paper extends previous work which showed how greedy importance sampling could be correctly realized in the one-dimensional case.


Being Bayesian about Network Structure

arXiv.org Machine Learning

In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior `landscape'. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.


Gaussian Process Networks

arXiv.org Machine Learning

In this paper we address the problem of learning the structure of a Bayesian network in domains with continuous variables. This task requires a procedure for comparing different candidate structures. In the Bayesian framework, this is done by evaluating the {em marginal likelihood/} of the data given a candidate structure. This term can be computed in closed-form for standard parametric families (e.g., Gaussians), and can be approximated, at some computational cost, for some semi-parametric families (e.g., mixtures of Gaussians). We present a new family of continuous variable probabilistic networks that are based on {em Gaussian Process/} priors. These priors are semi-parametric in nature and can learn almost arbitrary noisy functional relations. Using these priors, we can directly compute marginal likelihoods for structure learning. The resulting method can discover a wide range of functional dependencies in multivariate data. We develop the Bayesian score of Gaussian Process Networks and describe how to learn them from data. We present empirical results on artificial data as well as on real-life domains with non-linear dependencies.


Dynamic Bayesian Multinets

arXiv.org Machine Learning

In this work, dynamic Bayesian multinets are introduced where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters.