AITopics

We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m.

approximation, log posterior, matrix, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Graepel, Thore, Herbrich, Ralf

The Kernel Gibbs Sampler

We present an algorithm that samples the hypothesis space of kernel classifiers.Given a uniform prior over normalised weight vectors and a likelihood based on a model of label noise leads to a piecewise constantposterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the line chosen. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated withlabel noise. For a simple toy example we demonstrate experimentally how a Bayes point machine based on the KGS outperforms anSVM that is incapable of taking into account label noise. 1 Introduction Two great ideas have dominated recent developments in machine learning: the application ofkernel methods and the popularisation of Bayesian inference. Focusing on the task of classification, various connections between the two areas exist: kernels havelong been a part of Bayesian inference in the disguise of covariance nmctions thatcharacterise priors over functions [9].

classifier, label noise, posterior distribution, (10 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Berlin (0.05)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Csató, Lehel, Opper, Manfred

Sparse Representation for Gaussian Process Models

We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm togetherwith a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental resultson toy examples and large real-world datasets indicate the efficiency of the approach.

approximation, gaussian process, vector, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Modeling & Simulation (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Herbrich, Ralf, Graepel, Thore

A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work

We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to nontrivial bound values and - for maximum margins - to a vanishing complexity term.Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length.

classifier, generalisation error, linear classifier, (14 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Hochreiter, Sepp, Mozer, Michael C.

Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models

Two well known classes of unsupervised procedures that can be cast in this manner are generative and recoding models. In a generative unsupervised framework, the environment generates training exampleswhich we will refer to as observations-by sampling from one distribution; the other distribution is embodied in the model. Examples of generative frameworks are mixtures of Gaussians (MoG) [2], factor analysis [4], and Boltzmann machines [8]. In the recoding unsupervised framework, the model transforms points from an obser- vation space to an output space, and the output distribution is compared either to a reference distribution or to a distribution derived from the output distribution.

artificial intelligence, machine learning, particle, (17 more...)

Country:

Europe (0.70)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.41)

Frey, Brendan J., Kannan, Anitha

Accumulator Networks: Suitors of Local Probability Propagation

The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probabilityfunctions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce "accumulator networks" that have low local complexity (but exponential global complexity) so the sum-product algorithm can be directly applied. In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or more generally a tree. After giving expressions for inference and learning in accumulator networks, wegive results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image. 1 Introduction Graphical probability models with hidden variables are capable of representing complex dependenciesbetween variables, filling in missing data and making Bayesoptimal decisionsusing probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992). Large, richly-connected networks with many cycles can potentially beused to model complex sources of data, such as audio signals, images and video. However, when the number of cycles in the network is large (more precisely, when the cut set size is exponential), exact inference becomes intractable. Also, to learn a probability model with hidden variables, we need to fill in the missing data using probabilistic inference, so learning also becomes intractable. To cope with the intractability of exact inference, a variety of approximate inference methods have been invented, including Monte Carlo (Hinton and Sejnowski 1986; Neal 1992), Helmholz machines (Dayan et al. 1995; Hinton et al. 1995), and variational techniques (Jordan et al. 1998).

artificial intelligence, bayesian inference, machine learning, (18 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)

Ghahramani, Zoubin, Beal, Matthew J.

Propagation Algorithms for Variational Bayesian Learning

Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical resultsfor the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results tothe Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation,while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality ofthe state-space model in a variety of synthetic problems and one real high-dimensional data set. 1 Introduction Bayesian approaches to machine learning have several desirable properties. Bayesian integration does not suffer overfitting (since nothing is fit to the data). Prior knowledge canbe incorporated naturally and all uncertainty is manipulated in a consistent manner. Moreover it is possible to learn model structures and readily compare between model classes. Unfortunately, for most models of interest a full Bayesian analysis is computationally intractable.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Automatic Choice of Dimensionality for PCA

Minka, Thomas P.

A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to estimate thetrue dimensionality of the data. The resulting estimate is simple to compute yet guaranteed to pick the correct dimensionality, given enough data. The estimate involves an integral over the Steifel manifold of k-frames, which is difficult to compute exactly. But after choosing an appropriate parameterization and applying Laplace's method, an accurate andpractical estimator is obtained. In simulations, it is convincingly better than cross-validation and other proposed algorithms, plus it runs much faster.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Vasconcelos, Nuno, Lippman, Andrew

Bayesian Video Shot Segmentation

Prior knowledge about video structure can be used both as a means to improve the peiformance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulations is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.

artificial intelligence, machine learning, threshold, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California (0.14)

Industry:

Leisure & Entertainment (0.94)
Media > Film (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Pedersen, Liam, Apostolopoulos, Dimitrios, Whittaker, William

Bayes Networks on Ice: Robotic Search for Antarctic Meteorites

A Bayes network based classifier for distinguishing terrestrial rocks from meteorites is implemented onboard the Nomad robot. Equipped with a camera, spectrometer and eddy current sensor, this robot searched the ice sheets of Antarctica and autonomously made the first robotic identification of a meteorite, in January 2000 at the Elephant Moraine. This paper discusses rock classification from a robotic platform, and describes the system onboard Nomad. 1 Introduction Figure 1: Human meteorite search with snowmobiles on the Antarctic ice sheets, and on foot in the moraines. Antarctica contains the most fertile meteorite hunting grounds on Earth. The pristine, dry and cold environment ensures that meteorites deposited there are preserved for long periods.

artificial intelligence, machine learning, meteorite, (18 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Industry: Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)