AITopics

Exact inference in large, richly connected noisy-OR networks is intractable, and most approximate inference algorithms tend to concentrate on a small number of most probable configurations of the hidden variables under the posterior. We present an "inclusive" variational method for bipartite noisy-OR networks that favors including all probable configurations, at the cost of including some improbable configurations. The method fits a tree to the posterior distribution sequentially, i.e., one observation at a time. Results on an ensemble of QMR-DT-type networks show that the method performs better than local probability propagation and a variational upper bound for ranking the most probable diseases.
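The sketch below writes out the standard noisy-OR likelihood with a leak term. It then computes the exact posterior over diseases by brute-force enumeration. This is not the paper's inclusive-tree method. It only shows why exact inference is intractable: the loop visits all 2^n disease configurations. The function names and the parameterization (per-link probabilities q, a per-symptom leak) are illustrative assumptions.

```python
import itertools


def noisy_or_prob(d, q, leak):
    """P(symptom = 1 | disease states d) under a noisy-OR link.

    d: tuple of 0/1 disease states; q[i]: probability that disease i alone
    turns the symptom on; leak: probability the symptom is on with no disease.
    """
    p_off = 1.0 - leak
    for di, qi in zip(d, q):
        if di:
            p_off *= 1.0 - qi  # each active disease independently fails to fire
    return 1.0 - p_off


def exact_posterior(prior, Q, leaks, symptoms):
    """Posterior over all 2**n disease configurations by brute force.

    prior[i]: P(disease i = 1); Q[j][i]: link from disease i to symptom j;
    leaks[j]: leak of symptom j; symptoms[j]: observed 0/1 value of symptom j.
    """
    weights = {}
    for d in itertools.product((0, 1), repeat=len(prior)):  # exponential in n
        w = 1.0
        for di, pi in zip(d, prior):
            w *= pi if di else 1.0 - pi
        for qj, lj, sj in zip(Q, leaks, symptoms):
            p_on = noisy_or_prob(d, qj, lj)
            w *= p_on if sj else 1.0 - p_on
        weights[d] = w
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}


# Three diseases, two symptoms: symptom 0 observed present, symptom 1 absent.
post = exact_posterior([0.1, 0.05, 0.2],
                       [[0.8, 0.0, 0.5], [0.0, 0.9, 0.4]],
                       [0.01, 0.01], [1, 0])
p_disease_1 = sum(p for d, p in post.items() if d[1] == 1)
```

Here disease 1 (zero-indexed) links strongly to the absent symptom. Its posterior probability therefore drops well below its prior of 0.05.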

algorithm, inference, symptom

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.09)

Industry: Health & Medicine > Diagnostic Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)

Frey, Brendan J., Kannan, Anitha

Accumulator Networks: Suitors of Local Probability Propagation

One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation). The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions, including the sigmoid function, direct application of the sum-product algorithm is not possible. We introduce "accumulator networks" that have low local complexity (but exponential global complexity) so the sum-product algorithm can be directly applied. In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or, more generally, a tree. After giving expressions for inference and learning in accumulator networks, we give results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image.

1 Introduction

Graphical probability models with hidden variables are capable of representing complex dependencies between variables, filling in missing data and making Bayes-optimal decisions using probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992).
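A noisy-OR child is a case where accumulating parental inputs in a Markov chain is exact, which makes it a convenient illustration. The sketch below passes the probability that a binary accumulator is on along the chain a_0, ..., a_K. This is an illustrative choice: the abstract singles out the sigmoid, which this sketch does not handle. Each step touches only one parent and two accumulator states. That is the low local complexity that lets the sum-product algorithm run on such chains. The closed-form noisy-OR is included to check the result. All function names are hypothetical.

```python
def chain_noisy_or(x, q, leak):
    """P(child = 1 | parents x) by passing P(a_k = 1) along a chain of
    binary accumulator states a_0, ..., a_K."""
    p_on = leak  # a_0 is on with probability `leak`
    for xk, qk in zip(x, q):
        # Local transition P(a_k | a_{k-1}, x_k): once on, the accumulator
        # stays on; otherwise an active parent switches it on with prob. qk.
        fire = qk if xk else 0.0
        p_on = p_on + (1.0 - p_on) * fire
    return p_on


def closed_form_noisy_or(x, q, leak):
    """Reference value 1 - (1 - leak) * prod over active parents of (1 - q_k)."""
    p_off = 1.0 - leak
    for xk, qk in zip(x, q):
        if xk:
            p_off *= 1.0 - qk
    return 1.0 - p_off


q = [0.3, 0.6, 0.9, 0.2]
print(chain_noisy_or((1, 0, 1, 0), q, 0.05), closed_form_noisy_or((1, 0, 1, 0), q, 0.05))
```

Both functions compute 1 - P(child off). The chain version just builds that product one parent at a time.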

accumulator network, inference, probability

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)

Elidan, Gal, Lotner, Noam, Friedman, Nir, Koller, Daphne

Discovering Hidden Variables: A Structure-Based Approach

A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables.
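A toy simulation shows how a hidden variable induces seemingly complex dependencies among the observed variables. In this sketch, a hidden binary cause H drives four observed children. The setup is an illustrative assumption, not the paper's detection procedure. Marginally, every pair of children is correlated, with expected correlation (1 - 2 * flip)^2 = 0.64. Once H is fixed, those correlations vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_children, flip = 5000, 4, 0.1

# Hidden binary cause H ~ Bernoulli(0.5); a learner would never observe it.
h = rng.integers(0, 2, size=n_samples)
# Each observed child copies H but is flipped independently with probability `flip`.
noise = (rng.random((n_samples, n_children)) < flip).astype(int)
x = np.bitwise_xor(h[:, None], noise)

corr = np.corrcoef(x, rowvar=False)                  # every pair is correlated
corr_given_h = np.corrcoef(x[h == 1], rowvar=False)  # fixing H removes the dependence
print(np.round(corr, 2))
print(np.round(corr_given_h, 2))
```

A learner that sees only x must either model all of these pairwise dependencies directly or posit a variable like H that explains them.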

algorithm, bayesian network, procedure

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)

Csató, Lehel, Opper, Manfred

Sparse Representation for Gaussian Process Models

We develop a sparse representation for Gaussian Process (GP) models to overcome the limitations that large data sets impose on GPs. The method combines a Bayesian online algorithm with the sequential construction of a relevant subsample of the data, which fully specifies the model's predictions. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
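The sequential construction of a subsample can be sketched with a novelty test. A new input joins the subsample only if the span of the inputs kept so far does not already approximate its kernel feature vector well. This simplified sketch rests on several assumptions: a squared-exponential kernel, a fixed threshold `tol`, and hypothetical function names. It shows only the selection step. It omits the Bayesian online posterior update that the method combines it with.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)


def select_subsample(X, tol=1e-3, length_scale=1.0, jitter=1e-10):
    """Scan X once, keeping only inputs that are novel w.r.t. those kept.

    Novelty gamma = k(x, x) - k_B(x)^T K_B^{-1} k_B(x) is the squared distance
    from phi(x) to the span of the kept points in feature space.
    """
    kept = [0]
    for i in range(1, len(X)):
        B = X[kept]
        K_B = rbf(B, B, length_scale) + jitter * np.eye(len(kept))
        k_x = rbf(B, X[i:i + 1], length_scale)[:, 0]
        gamma = 1.0 - k_x @ np.linalg.solve(K_B, k_x)  # k(x, x) = 1 for this kernel
        if gamma > tol:
            kept.append(i)
    return kept


X = np.array([[0.0], [0.0], [5.0], [5.0], [10.0]])
print(select_subsample(X))  # duplicates add no new direction, so they are skipped
```

On densely sampled inputs the kept set stays much smaller than the data set, because nearby points are almost fully explained by their neighbours.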

approximation, gaussian process, vector

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Modeling & Simulation (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Bhattacharyya, Chiranjib, Keerthi, S. Sathiya

A Variational Mean-Field Theory for Sigmoidal Belief Networks

In this paper we discuss a variational mean-field theory and its application to belief networks (BNs), sigmoidal BNs in particular. We present a variational derivation of the mean-field theory proposed by Plefka [2].

approximation, mean-field theory, plefka

Country:

Asia > Middle East > Jordan (0.06)
Asia > Singapore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Rasmussen, Carl Edward, Ghahramani, Zoubin

Occam's Razor

The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring the complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear-in-the-parameters models that are equivalent to Gaussian Processes, and always find Occam's Razor at work.

1 Introduction

Occam's Razor is a well-known principle of "parsimony of explanations" which is influential in scientific thinking in general and in problems of statistical inference in particular. In this paper we review its consequences for Bayesian statistical models, where its behaviour can be easily demonstrated and quantified.
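The Occam factor shows up directly in the marginal likelihood (evidence) of a linear-in-the-parameters model. With a Gaussian prior on the weights, integrating them out gives a Gaussian process with covariance sigma_w^2 Phi Phi^T + sigma_n^2 I. The sketch below assumes polynomial features and the same prior variance for every coefficient. Both are assumptions chosen for illustration, not the priors studied in the paper. It computes the log evidence for degrees 0 to 8 on data generated by a straight line.

```python
import numpy as np


def log_evidence(x, y, degree, sigma_w=1.0, sigma_n=0.1):
    """Log marginal likelihood of y under y = Phi w + noise,
    w ~ N(0, sigma_w^2 I), noise ~ N(0, sigma_n^2 I).

    Integrating out w gives a Gaussian process with covariance
    K = sigma_w^2 Phi Phi^T + sigma_n^2 I.
    """
    phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    K = sigma_w ** 2 * phi @ phi.T + sigma_n ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # y^T K^{-1} y = alpha^T alpha
    return (-0.5 * alpha @ alpha               # data-fit term
            - np.sum(np.log(np.diag(L)))        # -0.5 log det K: the complexity penalty
            - 0.5 * len(x) * np.log(2 * np.pi))


x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x  # data generated by a straight line
evidence = [log_evidence(x, y, d) for d in range(9)]
print(np.round(evidence, 1))
```

The evidence should favour degree 1 over both the constant model, which cannot fit the slope, and the degree-8 model. The degree-8 model fits just as well but spreads its prior mass over many functions the data do not need.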

complexity, occam, razor

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England (0.04)

Genre: Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Nemenman, Ilya, Bialek, William

Learning Continuous Distributions: Simulations With Field Theoretic Priors

Learning a smooth but nonparametric probability density can be regularized using methods of quantum field theory. We implement a field theoretic prior numerically, test its efficacy, and show that the free parameter of the theory (the 'smoothness scale') can be determined self-consistently by the data; this forms an infinite-dimensional generalization of the MDL principle. Finally, we study the implications of one's choice of the prior and the parameterization and conclude that determining the smoothness scale makes density estimation very weakly sensitive to the choice of the prior, and that even wrong choices can be advantageous for small data sets.

One of the central problems in learning is to balance 'goodness of fit' criteria against the complexity of models. An important development in the Bayesian approach was thus the realization that there does not need to be any extra penalty for model complexity: if we compute the total probability that data are generated by a model, there is a factor from the volume in parameter space, the 'Occam factor', that discriminates against models with more parameters [1, 2].
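A field theoretic prior becomes concrete once the density is written as Q(x) = exp(-phi(x)) / Z and the field phi is discretized on a grid. The sketch below finds the maximum a posteriori field for a set of samples. It rests on illustrative assumptions: a first-derivative smoothness penalty (ell / 2) * integral of phi'(x)^2 dx, the interval [0, 1], a fixed ell, and plain gradient descent. It does not implement the paper's self-consistent choice of the smoothness scale.

```python
import numpy as np


def map_density(samples, ell=0.1, n_grid=100, lr=0.01, n_steps=5000):
    """MAP density on [0, 1] with Q = exp(-phi) / Z and a smoothness prior
    proportional to exp(-(ell / 2) * integral of phi'(x)**2 dx).

    Returns the density on the grid and the action (negative log posterior,
    up to a constant) at the flat start and at the final field.
    """
    edges = np.linspace(0.0, 1.0, n_grid + 1)
    dx = edges[1] - edges[0]
    counts, _ = np.histogram(samples, bins=edges)  # samples per grid cell
    n = counts.sum()

    def action(phi):
        smooth = 0.5 * ell * np.sum(np.diff(phi) ** 2) / dx
        log_z = np.log(np.sum(np.exp(-phi)) * dx)
        return smooth + counts @ phi + n * log_z

    phi = np.zeros(n_grid)
    start = action(phi)
    for _ in range(n_steps):
        q = np.exp(-phi) / np.sum(np.exp(-phi))  # probability mass per cell
        d = np.diff(phi)
        grad = counts - n * q       # data term plus normalization term
        grad[:-1] -= ell * d / dx   # smoothness term, left end of each difference
        grad[1:] += ell * d / dx    # smoothness term, right end of each difference
        phi -= lr * grad            # the action is convex in phi; small steps descend
    density = np.exp(-phi) / (np.sum(np.exp(-phi)) * dx)
    return density, start, action(phi)


rng = np.random.default_rng(1)
samples = rng.beta(2.0, 5.0, size=500)
density, s_start, s_end = map_density(samples)
```

Larger `ell` gives a smoother estimate, and smaller `ell` follows the histogram more closely. Choosing this scale from the data is the point of the abstract.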

best fit, occam factor, smoothness scale

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.05)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Kappen, Hilbert J., Wiegerinck, Wim

Second Order Approximations for Probability Models

In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
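The TAP approximation that this paper generalizes adds a second-order (Onsager reaction) term to the first-order mean field equations of a Boltzmann machine. The sketch below covers only that Boltzmann machine case, with ±1 units and p(s) proportional to exp(s^T J s / 2 + h^T s). It is not the paper's method for directed models. Exact magnetizations come from enumeration, so both approximations can be compared on a small, weakly coupled network. The random couplings are an illustrative choice.

```python
import itertools

import numpy as np


def exact_magnetizations(J, h):
    """<s_i> under p(s) proportional to exp(s^T J s / 2 + h^T s), s_i in {-1, +1}."""
    states = np.array(list(itertools.product((-1.0, 1.0), repeat=len(h))))
    log_w = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return (w[:, None] * states).sum(axis=0) / w.sum()


def mean_field(J, h, tap=False, n_iter=500, damping=0.5):
    """Damped iteration of the naive (first-order) or TAP (second-order) equations."""
    m = np.zeros(len(h))
    for _ in range(n_iter):
        field = h + J @ m
        if tap:
            # Onsager reaction term, second order in the couplings.
            field = field - m * ((J ** 2) @ (1.0 - m ** 2))
        m = damping * m + (1.0 - damping) * np.tanh(field)
    return m


rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.uniform(-0.1, 0.1, size=(n, n)), k=1)
J = upper + upper.T  # symmetric couplings, zero diagonal
h = rng.uniform(-0.5, 0.5, size=n)
m_exact = exact_magnetizations(J, h)
m_naive = mean_field(J, h)
m_tap = mean_field(J, h, tap=True)
print(np.abs(m_naive - m_exact).max(), np.abs(m_tap - m_exact).max())
```

For weak couplings both estimates land close to the exact values, and the TAP estimate is typically the closer of the two. Enumeration costs 2^n, so this comparison only works for small networks.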

approximation, graphical model, sigmoid belief network

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Jebara, Tony, Pentland, Alex

On Reversing Jensen's Inequality

Jensen's inequality is a powerful mathematical tool and one of the workhorses in statistical learning. Its applications therein include the EM algorithm, Bayesian estimation and Bayesian inference. Jensen's inequality yields simple lower bounds on otherwise intractable quantities such as products of sums and latent log-likelihoods. This simplification then permits operations like integration and maximization. Quite often (e.g., in discriminative learning) upper bounds are needed as well. We derive and prove an efficient analytic inequality that provides such variational upper bounds. This inequality holds for latent variable mixtures of exponential family distributions and thus spans a wide range of contemporary statistical models. We also discuss applications of the upper bounds, including maximum conditional likelihood, large margin discriminative models and conditional Bayesian inference. Convergence, efficiency and prediction results are shown.
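The Jensen lower bound on a latent log-likelihood, which this paper complements with upper bounds, is easy to check numerically. For any distribution q over the latent values m, sum_m q(m) log(p(x, m) / q(m)) is at most log p(x). Equality holds when q is the posterior p(m | x). The sketch below uses a three-valued latent variable with made-up joint probabilities. It does not implement the paper's reverse (upper) bound.

```python
import numpy as np


def log_likelihood(log_joint):
    """log p(x) = log sum_m p(x, m), via a stable log-sum-exp."""
    a = log_joint.max()
    return a + np.log(np.sum(np.exp(log_joint - a)))


def jensen_bound(log_joint, q):
    """sum_m q(m) * (log p(x, m) - log q(m)), a lower bound on log p(x)."""
    return np.sum(q * (log_joint - np.log(q)))


log_joint = np.log(np.array([0.1, 0.3, 0.05]))  # made-up p(x, m) for m = 0, 1, 2
ll = log_likelihood(log_joint)                   # log 0.45
posterior = np.exp(log_joint - ll)               # p(m | x) makes the bound tight
uniform = np.full(3, 1.0 / 3.0)
gap = ll - jensen_bound(log_joint, uniform)      # non-negative for every q
```

The gap equals the KL divergence from q to the posterior. This is why EM-style algorithms can tighten the bound by moving q toward p(m | x).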

inequality, jensen, likelihood

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Herbrich, Ralf, Graepel, Thore

A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work

We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement over the tightest margin bound known so far, due to Shawe-Taylor et al. [8], and scales logarithmically in the inverse margin. Even with fewer training examples than input dimensions, sufficiently large margins lead to nontrivial bound values and, for maximum margins, to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses.
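The two quantities this abstract contrasts, the margin and the volume of consistent hypotheses, are easy to compute for a toy data set. The sketch below computes the normalized margin min_i y_i <w, x_i> / (||w|| ||x_i||). It also gives a Monte Carlo estimate of the fraction of unit-length weight vectors that classify every training point correctly. The data and function names are illustrative assumptions. The bound itself is not reproduced.

```python
import numpy as np


def normalized_margin(w, X, y):
    """min_i y_i <w, x_i> / (||w|| ||x_i||); positive iff w separates the data."""
    return np.min(y * (X @ w) / (np.linalg.norm(w) * np.linalg.norm(X, axis=1)))


def consistent_volume_fraction(X, y, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the fraction of directions on the unit sphere
    whose linear classifier labels every training point correctly."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_draws, X.shape[1]))  # isotropic, so directions are uniform
    consistent = np.all((X @ W.T) * y[:, None] > 0, axis=0)
    return consistent.mean()


X = np.array([[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.9, -0.3]])
y = np.array([1.0, 1.0, -1.0, -1.0])
margin = normalized_margin(np.array([1.0, 0.0]), X, y)
fraction = consistent_volume_fraction(X, y)
print(margin, fraction)
```

For these four points the consistent directions form an arc of about 154 degrees, so the fraction should come out near 0.43. In higher dimensions only the Monte Carlo route remains practical, and it becomes very noisy when the consistent region is small.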

classifier, generalisation error, linear classifier

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)