 Statistical Learning


Regression with Input-dependent Noise: A Gaussian Process Treatment

Neural Information Processing Systems

The prior can be obtained by placing prior distributions on the weights in a neural network, although we would argue that it is perhaps more natural to place priors directly over functions. One tractable way of doing this is to create a Gaussian process prior. This has the advantage that predictions can be made from the posterior using only matrix multiplication for fixed hyperparameters and a global noise level. In contrast, for neural networks (with fixed hyperparameters and a global noise level) it is necessary to use approximations or Markov chain Monte Carlo (MCMC) methods. Rasmussen (1996) has demonstrated that predictions obtained with Gaussian processes are as good as or better than other state-of-the-art predictors. In much of the work on regression problems in the statistical and neural networks literatures, it is assumed that there is a global noise level, independent of the input vector x. The book by Bishop (1995) and the papers by Bishop (1994), MacKay (1995) and Bishop and Qazaz (1997) have examined the case of input-dependent noise for parametric models such as neural networks.
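As an illustration of prediction under a Gaussian process prior with fixed hyperparameters and a global noise level, the following is a minimal sketch, not the authors' implementation. The squared-exponential kernel, the scalar inputs, the default hyperparameter values, and the names `rbf_kernel`, `solve` and `gp_predict` are all assumptions made for the example. A small linear solve stands in for an explicit matrix inverse.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(x_train, y_train, x_star, noise_var=1e-2):
    """Posterior mean and latent-function variance at x_star (zero-mean prior)."""
    n = len(x_train)
    # Training covariance with the global noise level added on the diagonal.
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x, x_star) for x in x_train]
    alpha = solve(K, y_train)   # (K + noise_var * I)^{-1} y
    v = solve(K, k_star)        # (K + noise_var * I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

The vector `alpha` depends only on the training data, so it can be computed once and reused; each new predictive mean is then a single inner product with `k_star`.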



Experiences with Bayesian Learning in a Real World Application

Neural Information Processing Systems

Sleep staging is usually based on rules defined by Rechtschaffen and Kales (see [8]). The Rechtschaffen and Kales rules define four sleep stages, stage one to four, as well as rapid eye movement (REM) and wakefulness. In [1] J. Bentrup and S. Ray report that every year nearly one million US citizens consulted their physicians concerning their sleep. Since sleep staging is a tedious task (one all-night recording takes about 3 hours on average to score manually), much effort was spent in designing automatic sleep stagers. Sleep staging is a classification problem which was solved using classical statistical techniques or techniques that emerged from the field of artificial intelligence (AI). Among classical techniques, the k-nearest-neighbor technique in particular was used. In [1] J. Bentrup and S. Ray report that the classical technique outperformed their AI approaches. Among techniques from the field of AI, researchers used inductive learning to build tree-based classifiers (e.g.


Recovering Perspective Pose with a Dual Step EM Algorithm

Neural Information Processing Systems

This paper describes a new approach to extracting 3D perspective structure from 2D point-sets. The novel feature is to unify the tasks of estimating transformation geometry and identifying point-correspondence matches. Unification is realised by constructing a mixture model over the bipartite graph representing the correspondence match and by effecting optimisation using the EM algorithm. According to our EM framework the probabilities of structural correspondence gate contributions to the expected likelihood function used to estimate maximum likelihood perspective pose parameters. This provides a means of rejecting structural outliers.


A Non-Parametric Multi-Scale Statistical Model for Natural Images

Neural Information Processing Systems

The observed distribution of natural images is far from uniform. On the contrary, real images have complex and important structure that can be exploited for image processing, recognition and analysis. There have been many proposed approaches to the principled statistical modeling of images, but each has been limited in either the complexity of the models or the complexity of the images. We present a nonparametric multi-scale statistical model for images that can be used for recognition, image de-noising, and in a "generative mode" to synthesize high quality textures.


Modeling Acoustic Correlations by Factor Analysis

Neural Information Processing Systems

Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters are estimated by an Expectation-Maximization (EM) algorithm that can be embedded in the training procedures for HMMs.


Mapping a Manifold of Perceptual Observations

Neural Information Processing Systems

Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.
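To make the notion of distances "measured along geodesic paths" concrete, here is a minimal sketch that approximates them by shortest paths through a nearest-neighbour graph. This construction is an assumption made for illustration; the abstract does not say how the geodesic distances are computed, and the function name `geodesic_distances` and parameter `k` are hypothetical. The embedding step that would follow is omitted.

```python
import math

def geodesic_distances(points, k=2):
    """Approximate geodesic distances by shortest paths in a k-nearest-neighbour graph."""
    n = len(points)

    def euclid(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    INF = float("inf")
    G = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i in range(n):
        # Connect each point to its k nearest neighbours in the observation space.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: euclid(points[i], points[j]))[:k]
        for j in nearest:
            d = euclid(points[i], points[j])
            G[i][j] = G[j][i] = d   # keep the graph symmetric
    # Floyd-Warshall all-pairs shortest paths over the neighbourhood graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G
```

For points sampled along a semicircle, the graph distance between the two endpoints comes out close to the arc length rather than the straight-line chord, which is the property an isometric embedding would aim to preserve.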


Training Methods for Adaptive Boosting of Neural Networks

Neural Information Processing Systems

"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performance of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a database of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters and 8.1% error on the UCI satellite data set.
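The two training variants mentioned above, sampling the training set and weighting the cost function, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the binary form of the AdaBoost weight update with labels in {-1, +1}, which may differ from the multi-class variant used in the paper, and the function names `adaboost_reweight`, `resample` and `weighted_cost` are hypothetical.

```python
import math
import random

def adaboost_reweight(weights, predictions, labels):
    """One binary AdaBoost round; labels and predictions are in {-1, +1}.

    Returns (alpha, new_weights), where alpha is the vote of the current
    classifier and new_weights is the updated, normalised distribution.
    """
    eps = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    eps = min(max(eps, 1e-12), 1 - 1e-12)   # guard against log(0)
    alpha = 0.5 * math.log((1 - eps) / eps)
    raw = [w * math.exp(-alpha * y * p)
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(raw)
    return alpha, [r / total for r in raw]

def resample(examples, weights, rng):
    """Variant 1: draw a new training set with probability proportional to the weights."""
    return rng.choices(examples, weights=weights, k=len(examples))

def weighted_cost(errors, weights):
    """Variant 2: keep the training set fixed and weight each example's loss."""
    return sum(w * e for w, e in zip(weights, errors))
```

After one round, the misclassified examples carry half of the total weight, so the next classifier is pushed toward the examples the previous one got wrong, whichever of the two variants is used to present the weights to the learner.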


Prior Knowledge in Support Vector Kernels

Neural Information Processing Systems

We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.


EM Algorithms for PCA and SPCA

Neural Information Processing Systems

I present an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data. It is computationally very efficient in space and time.
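A minimal sketch of how an EM iteration for PCA might look, restricted to a single component for brevity. The abstract does not spell out the update equations, so the E-step and M-step below are assumptions based on the usual least-squares form, and the function name `em_pca_first_component` is hypothetical.

```python
import math

def em_pca_first_component(data, iterations=50):
    """Estimate the leading principal direction of `data` (a list of
    equal-length vectors) by EM, without forming the covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    Y = [[row[j] - mean[j] for j in range(d)] for row in data]   # centred data
    w = [1.0 / (j + 1) for j in range(d)]   # arbitrary nonzero starting direction
    for _ in range(iterations):
        # E-step: latent coordinate of each point, x_i = (w . y_i) / (w . w)
        ww = sum(c * c for c in w)
        x = [sum(c * yj for c, yj in zip(w, y)) / ww for y in Y]
        # M-step: direction that best reconstructs the data,
        # w = sum_i x_i y_i / sum_i x_i^2
        xx = sum(xi * xi for xi in x)
        w = [sum(xi * y[j] for xi, y in zip(x, Y)) / xx for j in range(d)]
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]
```

Each iteration touches the data only through inner products with the current direction, so the d-by-d covariance matrix is never built. This is one way to read the claim of efficiency in space and time when only a few components are needed.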