AITopics

Many real world patterns have a hierarchical underlying structure in which simple features have a higher order structure among themselves. Because these relationships are often statistical in nature, it is natural to view the process of discovering such structures as a statistical inference problem in which a hierarchical model is fit to data. Hierarchical statistical structure can be conveniently represented with Bayesian belief networks (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Neal, 1992). These 530 M.S. Lewicki and T. 1. Sejnowski models are powerful, because they can capture complex statistical relationships among the data variables, and also mathematically convenient, because they allow efficient computation of the joint probability for any given set of model parameters. The joint probability density of a network of binary states is given by a product of conditional probabilities (1) where W is the weight matrix that parameterizes the model. Note that the probability ofan individual state Si depends only on its parents.

artificial intelligence, machine learning, representation, (15 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ordered Classes and Incomplete Examples in Classification

Mathieson, Mark

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinalmodel for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incompleteexamples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.

artificial intelligence, imputation, machine learning, (16 more...)

Country: Europe > United Kingdom > England (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Dunmur, A. P., Titterington, D. M.

On a Modification to the Mean Field EM Algorithm in Factorial Learning

A modification is described to the use of mean field approximations inthe E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. Themodification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models. 1 Introduction Ghahramani (1995) advocated the use of mean field methods as a means to avoid the heavy computation involved in the E step of the EM algorithm used for estimating parameters within a certain latent structure model, and Ghahramani & Jordan (1995) used the same ideas in a more complex situation. Dunmur & Titterington (1996a) identified Ghahramani's model as a so-called latent profile model, they observed that Zhang (1992,1993) had used mean field methods for a similar purpose, and they showed, in a simulation study based on very simple examples, that the mean field version of the EM algorithm often performed very respectably. By this it is meant that, when data were generated from the model under analysis, the estimators of the underlying parameters were efficient, judging by empirical results, especially in comparison with estimators obtained by employing the'correct' EM algorithm: the examples therefore had to be simple enough that the correct EM algorithm is numerically feasible, although any success reported for the mean field 432 A. P. Dunmur and D. M. Titterington version is, one hopes, an indication that the method will also be adequate in more complex situations in which the correct EM algorithm is not implementable because of computational complexity. In spite of the above positive remarks, there were circumstances in which there was a perceptible, if not dramatic, lack of efficiency in the simple (naive) mean field estimators, and the objective of this contribution is to propose and investigate ways of refining the method so as to improve performance without detracting from the appealing, and frequently essential, simplicity of the approach. The procedure used here is based on a second order correction to the naive mean field well known in statistical physics and sometimes called the cavity or TAP method (Mezard, Parisi & Virasoro, 1987). It has been applied recently in cluster analysis (Hofmann & Buhmann, 1996). In Section 2 we introduce the structure of our model, Section 3 explains the refined mean field approach, Section 4 provides numerical results, and Section 5 contains a statement of our conclusions.

artificial intelligence, latent variable, machine learning, (16 more...)

Country:

North America > United States (0.28)
Asia > Middle East > Jordan (0.25)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Littlestone, Nick, Mesterharm, Chris

An Apobayesian Relative of Winnow

We study a mistake-driven variant of an online Bayesian learning algorithm(similar to one studied by Cesa-Bianchi, Helmbold, and Panizza [CHP96]). This variant only updates its state (learns) on trials in which it makes a mistake. The algorithm makes binary classifications using a linear-threshold classifier and runs in time linear inthe number of attributes seen by the learner. We have been able to show, theoretically and in simulations, that this algorithm performs well under assumptions quite different from those embodied inthe prior of the original Bayesian algorithm. It can handle situations that we do not know how to handle in linear time with Bayesian algorithms. We expect our techniques to be useful in deriving and analyzing other apobayesian algorithms. 1 Introduction We consider two styles of online learning.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country: North America > United States (0.15)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)

Interpreting Images by Propagating Bayesian Beliefs

Weiss, Yair

A central theme of computational vision research has been the realization thatreliable estimation of local scene properties requires propagating measurements across the image. Many authors have therefore suggested solving vision problems using architectures of locally connected units updating their activity in parallel. Unfortunately, theconvergence of traditional relaxation methods on such architectures has proven to be excruciatingly slow and in general they do not guarantee that the stable point will be a global minimum. In this paper we show that an architecture in which Bayesian Beliefs aboutimage properties are propagated between neighboring units yields convergence times which are several orders of magnitude fasterthan traditional methods and avoids local minima. In particular our architecture is non-iterative in the sense of Marr [5]: at every time step, the local estimates at a given location are optimal giventhe information which has already been propagated to that location. We illustrate the algorithm's performance on real images and compare it to several existing methods. 1 Theory The essence of our approach is shown in figure 1. Figure 1a shows the prototypical ill-posed problem: interpolation of a function from sparse data.

artificial intelligence, iteration, machine learning, (19 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Bienenstock, Elie, Geman, Stuart, Potter, Daniel

Compositionality, MDL Priors, and Object Recognition

Images are ambiguous at each of many levels of a contextual hierarchy. Nevertheless,the high-level interpretation of most scenes is unambiguous, as evidenced by the superior performance of humans. Thisobservation argues for global vision models, such as deformable templates.Unfortunately, such models are computationally intractable for unconstrained problems. We propose a compositional modelin which primitives are recursively composed, subject to syntactic restrictions, to form tree-structured objects and object groupings. Ambiguity is propagated up the hierarchy in the form of multiple interpretations, which are later resolved by a Bayesian, equivalently minimum-description-Iength, cost functional.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Country:

North America > United States (0.47)
Europe (0.28)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Pearlmutter, Barak A., Parra, Lucas C.

Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA

artificial intelligence, machine learning, maximum likelihood blind source separation, (12 more...)

We cast the problem as one of maximum likelihood density estimation, andin that framework introduce an algorithm that searches for independent components using both temporal and spatial cues. We call the resulting algorithm "Contextual ICA," after the (Bell and Sejnowski 1995) Infomax algorithm, which we show to be a special case of cICA. Because cICA can make use of the temporal structure of its input, it is able separate in a number of situations where standard methods cannot, including sources with low kurtosis, coloredGaussian sources, and sources which have Gaussian histograms. 1 The Blind Source Separation Problem Consider a set of n indepent sources

Country:

North America > United States > New Mexico (0.14)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.63)

Monti, Stefano, Cooper, Gregory F.

Learning Bayesian Belief Networks with Neural Network Estimators

The essential property of BBNs is summarized by the Markov condition, which asserts that each variable is independent of its non-descendants given its parents.

artificial intelligence, machine learning, network structure, (18 more...)

Country:

North America > United States > California (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Continuous Sigmoidal Belief Networks Trained using Slice Sampling

Frey, Brendan J.

These include Boltzmann machines (Hinton and Sejnowski 1986),binary sigmoidal belief networks (Neal 1992) and Helmholtz machines (Hinton et al. 1995; Dayan et al. 1995). However, some hidden variables, such as translation or scaling in images of shapes, are best represented using continuous values.Continuous-valued Boltzmann machines have been developed (Movellan and McClelland 1993), but these suffer from long simulation settling times and the requirement of a "negative phase" during learning. Tibshirani (1992) and Bishop et al. (1996) consider learning mappings from a continuous latent variable space to a higher-dimensional input space. MacKay (1995) has developed "density networks" that can model both continuous and categorical latent spaces using stochasticity at the topmost network layer. In this paper I consider a new hierarchical top-down connectionist model that has stochastic hidden variables at all layers; moreover, these variables can adapt to be continuous or categorical. The proposed top-down model can be viewed as a continuous-valued belief network, whichcan be simulated by performing a quick top-down pass (Pearl 1988).

artificial intelligence, belief network, machine learning, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Bishop, Christopher M., Quazaz, Cazhaow S.

Regression with Input-Dependent Noise: A Bayesian Treatment

In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constantvariance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming thebias of maximum likelihood. 1 Introduction In regression problems it is important not only to predict the output variables but also to have some estimate of the error bars associated with those predictions.

artificial intelligence, machine learning, variance, (17 more...)

Country: Europe > United Kingdom > England (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)