Bayesian Inference
Mixture Density Estimation
Li, Jonathan Q., Barron, Andrew R.
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound. 1 Introduction In density estimation, Gaussian mixtures provide flexible-basis representations for densities that can be used to model heterogeneous data in high dimensions. Consider a parametric family $G = \{ p_\theta(x),\ x \in X \subset \mathbb{R}^{d'} : \theta \in \ldots \}$ ... The main theme of the paper is to give bounds on the approximation and estimation of arbitrary densities by finite mixture densities.
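The idea of introducing mixture components one at a time can be illustrated with a minimal sketch. It assumes a greedy update of the form f_k = (1 - a) f_{k-1} + a p_theta, where each step picks the new component and its weight a to maximize the log-likelihood. The fixed component width `sigma`, the grid of candidate means, the grid of weights, and all function names are assumptions made for illustration; this is not presented as the authors' exact procedure.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Density of a 1D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(densities):
    """Sum of log densities; the floor guards against log(0) from underflow."""
    return sum(math.log(max(d, 1e-300)) for d in densities)


def greedy_mixture(data, k, sigma=1.0, candidates=None):
    """Greedily build a k-component mixture, adding one component per step.

    Returns (weights, means, history), where history[i] is the
    log-likelihood of the mixture with i + 1 components.
    """
    if candidates is None:
        # Assumed candidate means: a coarse grid from -6 to 6.
        candidates = [-6.0 + 0.5 * i for i in range(25)]
    # Assumed weight grid; a = 0 keeps the previous mixture unchanged,
    # so the log-likelihood can never decrease from one step to the next.
    alphas = [i / 20 for i in range(20)]

    # Step 1: the single component with the highest log-likelihood.
    first = max(
        candidates,
        key=lambda m: log_likelihood([gauss_pdf(x, m, sigma) for x in data]),
    )
    dens = [gauss_pdf(x, first, sigma) for x in data]
    weights, means = [1.0], [first]
    history = [log_likelihood(dens)]

    # Steps 2..k: f_new = (1 - a) * f_old + a * new_component.
    for _ in range(1, k):
        best = (history[-1], 0.0, means[0], dens)
        for m in candidates:
            comp = [gauss_pdf(x, m, sigma) for x in data]
            for a in alphas:
                new = [(1 - a) * d + a * c for d, c in zip(dens, comp)]
                ll = log_likelihood(new)
                if ll > best[0]:
                    best = (ll, a, m, new)
        ll, a, m, dens = best
        weights = [w * (1 - a) for w in weights] + [a]
        means.append(m)
        history.append(ll)
    return weights, means, history


# Usage: bimodal toy data, three greedy steps.
random.seed(0)
data = [random.gauss(-3, 1) for _ in range(100)]
data += [random.gauss(3, 1) for _ in range(100)]
weights, means, history = greedy_mixture(data, 3)
print(means, weights, history)
```

Because each step only searches over one new component and one mixing weight, the cost per step stays fixed as k grows, in contrast to refitting all k components jointly.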
Bayesian Averaging is Well-Temperated
Often a learning problem has a natural quantitative measure of generalization. If a loss function is defined, the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the literature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines have been advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.
Efficient Approaches to Gaussian Process Classification
Csató, Lehel, Fokoué, Ernest, Opper, Manfred, Schottky, Bernhard, Winther, Ole
The first two methods are related to mean field ideas known in Statistical Physics. The third approach is based on a Bayesian online approach that was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing (1) that the mean field Bayesian evidence may be used for hyperparameter tuning and (2) that the online approach may achieve a low training error fast. 1 Introduction Gaussian processes provide promising nonparametric Bayesian approaches to regression and classification [2, 1].
A Variational Bayesian Framework for Graphical Models
This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models. Our approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner. These posteriors fall out of a free-form optimization procedure, which naturally incorporates conjugate priors. Unlike in large-sample approximations, the posteriors are generally non-Gaussian and no Hessian needs to be computed. Predictive quantities are obtained analytically. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. We demonstrate that this approach can be applied to a large class of models in several domains, including mixture models and source separation.
Rules and Similarity in Concept Learning
This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule- or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
Bayesian Map Learning in Dynamic Environments
We consider the problem of learning a grid-based map using a robot with noisy sensors and actuators. We compare two approaches: online EM, where the map is treated as a fixed parameter, and Bayesian inference, where the map is a (matrix-valued) random variable. We show that even on a very simple example, online EM can get stuck in local minima, which causes the robot to get "lost" and the resulting map to be useless. By contrast, the Bayesian approach, by maintaining multiple hypotheses, is much more robust. We then introduce a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering. We show that this approximation, when coupled with an active learning strategy, is fast but accurate.
Learning from User Feedback in Image Retrieval Systems
Vasconcelos, Nuno, Lippman, Andrew
We formulate the problem of retrieving images from visual databases as a problem of Bayesian inference. This leads to natural and effective solutions for two of the most challenging issues in the design of a retrieval system: providing support for region-based queries without requiring prior image segmentation, and accounting for user feedback during a retrieval session. We present a new learning algorithm that relies on belief propagation to account for both positive and negative examples of the user's interests.
Generalized Model Selection for Unsupervised Learning in High Dimensions
Vaithyanathan, Shivakumar, Dom, Byron
We describe a Bayesian approach to model selection in unsupervised learning that determines both the feature set and the number of clusters. We then evaluate this scheme (based on marginal likelihood) and one based on cross-validated likelihood. For the Bayesian scheme we derive a closed-form solution of the marginal likelihood by assuming appropriate forms of the likelihood function and prior. Extensive experiments compare these approaches and all results are verified by comparison against ground truth. In these experiments the Bayesian scheme using our objective function gave better results than cross-validation. 1 Introduction Recent efforts define the model selection problem as one of estimating the number of clusters [10, 17].
Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization
The project pursued in this paper is to develop from first information-geometric principles a general method for learning the similarity between text documents. Each individual document is modeled as a memoryless information source. Based on a latent class decomposition of the term-document matrix, a low-dimensional (curved) multinomial subfamily is learned. From this model a canonical similarity function - known as the Fisher kernel - is derived. Our approach can be applied to unsupervised and supervised learning problems alike.
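For orientation, the Fisher kernel mentioned above is usually defined in the following general form. The notation (documents x and y, model parameters \theta, Fisher score U_x, Fisher information matrix I(\theta)) is assumed here and is not taken from the abstract, which does not give the specific form derived for the latent class model.

```latex
U_x = \nabla_\theta \log p(x \mid \theta), \qquad
I(\theta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\!\left[ U_x U_x^{\top} \right], \qquad
K(x, y) = U_x^{\top} \, I(\theta)^{-1} \, U_y
```

Under this definition, two documents are similar when their log-likelihood gradients point in similar directions, measured in the geometry induced by the Fisher information.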
Bayesian Reconstruction of 3D Human Motion from Single-Camera Video
Howe, Nicholas R., Leventon, Michael E., Freeman, William T.
The three-dimensional motion of humans is underdetermined when the observation is limited to a single camera, due to the inherent 3D ambiguity of 2D video. We present a system that reconstructs the 3D motion of human subjects from single-camera video, relying on prior knowledge about human motion, learned from training data, to resolve those ambiguities. After initialization in 2D, the tracking and 3D reconstruction are automatic; we show results for several video sequences. The results show the power of treating 3D body tracking as an inference problem.