AITopics | Asia

Collaborating Authors

Asia

Mixture Approximations to Bayesian Networks

Tresp, Volker, Haft, Michael, Hofmann, Reimar

arXiv.org Artificial IntelligenceJan-23-2013

Structure and parameters in a Bayesian network uniquely specify the probability distribution of the modeled domain. The locality of both structure and probabilistic information are the great benefits of Bayesian networks and require the modeler to only specify local information. On the other hand this locality of information might prevent the modeler - and even more any other person - from obtaining a general overview of the important relationships within the domain. The goal of the work presented in this paper is to provide an "alternative" view on the knowledge encoded in a Bayesian network which might sometimes be very helpful for providing insights into the underlying domain. The basic idea is to calculate a mixture approximation to the probability distribution represented by the Bayesian network. The mixture component densities can be thought of as representing typical scenarios implied by the Bayesian model, providing intuition about the basic relationships. As an additional benefit, performing inference in the approximate model is very simple and intuitive and can provide additional insights. The computational complexity for the calculation of the mixture approximations criticaly depends on the measure which defines the distance between the probability distribution represented by the Bayesian network and the approximate distribution. Both the KL-divergence and the backward KL-divergence lead to inefficient algorithms. Incidentally, the latter is used in recent work on mixtures of mean field solutions to which the work presented here is closely related. We show, however, that using a mean squared error cost function leads to update equations which can be solved using the junction tree algorithm. We conclude that the mean squared error cost function can be used for Bayesian networks in which inference based on the junction tree is tractable. For large networks, however, one may have to rely on mean field approximations.

approximation, bayesian network, probability, (15 more...)

arXiv.org Artificial Intelligence

1301.6744

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.06)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm

Friedman, Nir, Nachman, Iftach, Pe'er, Dana

arXiv.org Artificial IntelligenceJan-23-2013

Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1301.6696

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Discovering the Hidden Structure of Complex Dynamic Systems

Boyen, Xavier, Friedman, Nir, Koller, Daphne

arXiv.org Artificial IntelligenceJan-23-2013

Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1301.6683

Country: Asia > Middle East (0.29)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Financial Portfolio Optimization: Computationally guided agents to investigate, analyse and invest!?

Dangi, Ankit

arXiv.org Machine LearningJan-17-2013

Financial portfolio optimization is a widely studied problem in mathematics, statistics, financial and computational literature. It adheres to determining an optimal combination of weights associated with financial assets held in a portfolio. In practice, it faces challenges by virtue of varying math. formulations, parameters, business constraints and complex financial instruments. Empirical nature of data is no longer one-sided; thereby reflecting upside and downside trends with repeated yet unidentifiable cyclic behaviours potentially caused due to high frequency volatile movements in asset trades. Portfolio optimization under such circumstances is theoretically and computationally challenging. This work presents a novel mechanism to reach an optimal solution by encoding a variety of optimal solutions in a solution bank to guide the search process for the global investment objective formulation. It conceptualizes the role of individual solver agents that contribute optimal solutions to a bank of solutions, a super-agent solver that learns from the solution bank, and, thus reflects a knowledge-based computationally guided agents approach to investigate, analyse and reach to optimal solution for informed investment decisions. Conceptual understanding of classes of solver agents that represent varying problem formulations and, mathematically oriented deterministic solvers along with stochastic-search driven evolutionary and swarm-intelligence based techniques for optimal weights are discussed. Algorithmic implementation is presented by an enhanced neighbourhood generation mechanism in Simulated Annealing algorithm. A framework for inclusion of heuristic knowledge and human expertise from financial literature related to investment decision making process is reflected via introduction of controlled perturbation strategies using a decision matrix for neighbourhood generation.

constraint, optimization problem, upstream oil & gas, (16 more...)

arXiv.org Machine Learning

1301.4194

Country:

Europe > Germany (0.45)
Asia > India (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

A Spectral Algorithm for Latent Dirichlet Allocation

Anandkumar, Animashree, Foster, Dean P., Hsu, Daniel, Kakade, Sham M., Liu, Yi-Kai

arXiv.org Machine LearningJan-17-2013

The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on $k\times k$ matrices, where $k$ is the number of latent factors (e.g. the number of topics), rather than in the $d$-dimensional observed space (typically $d \gg k$).

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1204.6703

Country:

Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.84)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation

Liu, Song, Yamada, Makoto, Collier, Nigel, Sugiyama, Masashi

arXiv.org Machine LearningJan-16-2013

The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm based on non-parametric divergence estimation between time-series samples from two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, and it is accurately and efficiently estimated by a method of direct density-ratio estimation. Through experiments on artificial and real-world datasets including human-activity sensing, speech, and Twitter messages, we demonstrate the usefulness of the proposed method.

change-point detection, survey article, upstream oil & gas, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neunet.2013.01.012

1203.0453

Country:

North America > United States (1.00)
Asia > Japan > Honshū (0.28)
Europe > Netherlands (0.28)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Monte Carlo Inference via Greedy Importance Sampling

Schuurmans, Dale, Southey, Finnegan

arXiv.org Machine LearningJan-16-2013

We present a new method for conducting Monte Carlo inference in graphical models which combines explicit search with generalized importance sampling. The idea is to reduce the variance of importance sampling by searching for significant points in the target distribution. We prove that it is possible to introduce search and still maintain unbiasedness. We then demonstrate our procedure on a few simple inference tasks and show that it can improve the inference quality of standard MCMC methods, including Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo. This paper extends previous work which showed how greedy importance sampling could be correctly realized in the one-dimensional case.

greedy importance, inference, procedure, (17 more...)

arXiv.org Machine Learning

1301.389

Country:

North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Being Bayesian about Network Structure

Friedman, Nir, Koller, Daphne

arXiv.org Machine LearningJan-16-2013

In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior `landscape'. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1301.3856

Country:

North America > United States (0.46)
Asia > Middle East (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Gaussian Process Networks

Friedman, Nir, Nachman, Iftach

arXiv.org Machine LearningJan-16-2013

In this paper we address the problem of learning the structure of a Bayesian network in domains with continuous variables. This task requires a procedure for comparing different candidate structures. In the Bayesian framework, this is done by evaluating the {em marginal likelihood/} of the data given a candidate structure. This term can be computed in closed-form for standard parametric families (e.g., Gaussians), and can be approximated, at some computational cost, for some semi-parametric families (e.g., mixtures of Gaussians). We present a new family of continuous variable probabilistic networks that are based on {em Gaussian Process/} priors. These priors are semi-parametric in nature and can learn almost arbitrary noisy functional relations. Using these priors, we can directly compute marginal likelihoods for structure learning. The resulting method can discover a wide range of functional dependencies in multivariate data. We develop the Bayesian score of Gaussian Process Networks and describe how to learn them from data. We present empirical results on artificial data as well as on real-life domains with non-linear dependencies.

artificial intelligence, dependency, machine learning, (18 more...)

arXiv.org Machine Learning

1301.3857

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Dynamic Bayesian Multinets

Bilmes, Jeff A.

arXiv.org Machine LearningJan-16-2013

In this work, dynamic Bayesian multinets are introduced where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters.

artificial intelligence, dependency, machine learning, (16 more...)

arXiv.org Machine Learning

1301.3837

Country:

North America > United States (0.28)
Asia > Japan (0.28)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback