AITopics

One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce "accumulator networks" that have low local complexity (but exponential global complexity) so the sum-product algorithm can be directly applied. In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or more generally a tree. After giving expressions for inference and learning in accumulator networks, we give results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image. 1 Introduction Graphical probability models with hidden variables are capable of representing complex dependencies between variables, filling in missing data and making Bayesoptimal decisions using probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992).

accumulator network, inference, probability, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)

Graepel, Thore, Herbrich, Ralf

The Kernel Gibbs Sampler

We present an algorithm that samples the hypothesis space of kernel classifiers. Given a uniform prior over normalised weight vectors and a likelihood based on a model of label noise leads to a piecewise constant posterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the line chosen. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated with label noise. For a simple toy example we demonstrate experimentally how a Bayes point machine based on the KGS outperforms an SVM that is incapable of taking into account label noise. 1 Introduction Two great ideas have dominated recent developments in machine learning: the application of kernel methods and the popularisation of Bayesian inference.

classifier, label noise, posterior distribution, (9 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Berlin (0.05)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Ghahramani, Zoubin, Beal, Matthew J.

Propagation Algorithms for Variational Bayesian Learning

Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1 Introduction Bayesian approaches to machine learning have several desirable properties.

algorithm, state-space model, variational bayesian, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Frey, Brendan J., Patrascu, Relu, Jaakkola, Tommi, Moran, Jodi

Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Exact inference in large, richly connected noisy-OR networks is intractable, and most approximate inference algorithms tend to concentrate on a small number of most probable configurations of the hidden variables under the posterior. We presented an "inclusive" variational method for bipartite noisy-OR networks that favors including all probable configurations, at the cost of including some improbable configurations. The method fits a tree to the posterior distribution sequentially, i.e., one observation at a time. Results on an ensemble of QMR-DT type networks show that the method performs better than local probability propagation and a variational upper bound for ranking most probable diseases.

algorithm, inference, symptom, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.09)
(3 more...)

Industry: Health & Medicine > Diagnostic Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)

High-temperature Expansions for Learning Models of Nonnegative Data

Downs, Oliver B.

Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a secondorder mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

approximation, expansion, model distribution, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)

Chen, Scott Saobing, Gopinath, Ramesh A.

Gaussianization

High dimensional data modeling is difficult mainly because the so-called "curse of dimensionality". We propose a technique called "Gaussianization" for high dimensional density estimation, which alleviates the curse of dimensionality by exploiting the independence structures in the data. Gaussianization is motivated from recent developments in the statistics literature: projection pursuit, independent component analysis and Gaussian mixture models with semi-tied covariances. We propose an iterative Gaussianization procedure which converges weakly: at each iteration, the data is first transformed to the least dependent coordinates and then each coordinate is marginally Gaussianized by univariate techniques. Gaussianization offers density estimation sharper than traditional kernel methods and radial basis function methods.

algorithm, gaussianization, iteration, (13 more...)

Country: North America > United States > New York > Suffolk County > East Setauket (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)

Cadez, Igor V., Smyth, Padhraic

Model Complexity, Goodness of Fit and Diminishing Returns

Such learning tasks can typically be characterized by the existence of a model and a loss function. A fitted model of complexity k is a function of the data points D and depends on a specific set of fitted parameters B. The loss function (goodnessof-fit) is a functional of the model and maps each specific model to a scalar used to evaluate the model, e.g., likelihood for density estimation or sum-of-squares for regression. Figure 1 illustrates a typical empirical curve for loss function versus complexity, for mixtures of Markov models fitted to a large data set of 900,000 sequences. The complexity k is the number of Markov models being used in the mixture (see Cadez et al. (2000) for further details on the model and the data set). The empirical curve has a distinctly concave appearance, with large relative gains in fit for low complexity models and much more modest relative gains for high complexity models.

linear regression, loss function, regression, (12 more...)

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Washington > King County > Redmond (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Bhattacharyya, Chiranjib, Keerthi, S. Sathiya

A Variational Mean-Field Theory for Sigmoidal Belief Networks

In this paper we will discuss a variational mean-field theory and its application to BNs, sigmoidal BNs in particular. We present a variational derivation of the mean-field theory, proposed by Plefka[2].

approximation, mean-field theory, plefka, (15 more...)

Country:

Asia > Middle East > Jordan (0.06)
Asia > Singapore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Algebraic Information Geometry for Learning Machines with Singularities

Watanabe, Sumio

Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and gaussian mixtures, the asymptotic normality does not hold, since Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities and two different problems are studied.

jeffrey, stochastic complexity, true distribution, (11 more...)

Country:

Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Rasmussen, Carl Edward, Ghahramani, Zoubin

Occam's Razor

The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work. 1 Introduction Occam's Razor is a well known principle of "parsimony of explanations" which is influential in scientific thinking in general and in problems of statistical inference in particular. In this paper we review its consequences for Bayesian statistical models, where its behaviour can be easily demonstrated and quantified.

complexity, occam, razor, (15 more...)

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England (0.04)

Genre: Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)