Goto

Collaborating Authors

 Statistical Learning


Radial Basis Functions: A Bayesian Treatment

Neural Information Processing Systems

Bayesian methods have been successfully applied to regression and classification problems in multi-layer perceptrons. We present a novel application of Bayesian techniques to Radial Basis Function networks by developing a Gaussian approximation to the posterior distribution which, for fixed basis function widths, is analytic in the parameters. The setting of regularization constants by crossvalidation is wasteful as only a single optimal parameter estimate is retained. We treat this issue by assigning prior distributions to these constants, which are then adapted in light of the data under a simple re-estimation formula. 1 Introduction Radial Basis Function networks are popular regression and classification tools[lO]. For fixed basis function centers, RBFs are linear in their parameters and can therefore be trained with simple one shot linear algebra techniques[lO]. The use of unsupervised techniques to fix the basis function centers is, however, not generally optimal since setting the basis function centers using density estimation on the input data alone takes no account of the target values associated with that data. Ideally, therefore, we should include the target values in the training procedure[7, 3, 9]. Unfortunately, allowing centers to adapt to the training targets leads to the RBF being a nonlinear function of its parameters, and training becomes more problematic. Most methods that perform supervised training of RBF parameters minimize the ยทPresent address: SNN, University of Nijmegen, Geert Grooteplein 21, Nijmegen, The Netherlands.


S-Map: A Network with a Simple Self-Organization Algorithm for Generative Topographic Mappings

Neural Information Processing Systems

The S-Map is a network with a simple learning algorithm that combines the self-organization capability of the Self-Organizing Map (SOM) and the probabilistic interpretability of the Generative Topographic Mapping (GTM). The simulations suggest that the S Map algorithm has a stronger tendency to self-organize from random initial configuration than the GTM. The S-Map algorithm can be further simplified to employ pure Hebbian learning, without changing the qualitative behaviour of the network. 1 Introduction The self-organizing map (SOM; for a review, see [1]) forms a topographic mapping from the data space onto a (usually two-dimensional) output space. The SOM has been succesfully used in a large number of applications [2]; nevertheless, there are some open theoretical questions, as discussed in [1, 3]. Most of these questions arise because of the following two facts: the SOM is not a generative model, i.e. it does not generate a density in the data space, and it does not have a well-defined objective function that the training process would strictly minimize.


On Efficient Heuristic Ranking of Hypotheses

Neural Information Processing Systems

Voice: (818) 306-6144 FAX: (818) 306-6912 Content Areas: Applications (Stochastic Optimization),Model Selection Algorithms Abstract This paper considers the problem of learning the ranking of a set of alternatives based upon incomplete information (e.g., a limited number of observations). We describe two algorithms for hypothesis ranking and their application for probably approximately correct (PAC) and expected loss (EL) learning criteria. Empirical results are provided to demonstrate the effectiveness of these ranking procedures on both synthetic datasets and real-world data from a spacecraft design optimization problem. 1 INTRODUCTION In many learning applications, the cost of information can be quite high, imposing a requirement that the learning algorithms glean as much usable information as possible with a minimum of data. For example: - In speedup learning, the expense of processing each training example can be significant [Tadepalli921. This paper provides a statistical decision-theoretic framework for the ranking of parametric distributions.


Regression with Input-dependent Noise: A Gaussian Process Treatment

Neural Information Processing Systems

Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.


Recovering Perspective Pose with a Dual Step EM Algorithm

Neural Information Processing Systems

This paper describes a new approach to extracting 3D perspective structure from 2D point-sets. The novel feature is to unify the tasks of estimating transformation geometry and identifying pointcorrespondence matches. Unification is realised by constructing a mixture model over the bipartite graph representing the correspondence match and by effecting optimisation using the EM algorithm. According to our EM framework the probabilities of structural correspondence gate contributions to the expected likelihood function used to estimate maximum likelihood perspective pose parameters. This provides a means of rejecting structural outliers.


A Non-Parametric Multi-Scale Statistical Model for Natural Images

Neural Information Processing Systems

The observed distribution of natural images is far from uniform. On the contrary, real images have complex and important structure that can be exploited for image processing, recognition and analysis. There have been many proposed approaches to the principled statistical modeling of images, but each has been limited in either the complexity of the models or the complexity of the images. We present a nonparametric multi-scale statistical model for images that can be used for recognition, image de-noising, and in a "generative mode" to synthesize high quality textures.


Hybrid NN/HMM-Based Speech Recognition with a Discriminant Neural Feature Extraction

Neural Information Processing Systems

In this paper, we present a novel hybrid architecture for continuous speech recognition systems. It consists of a continuous HMM system extended by an arbitrary neural network that is used as a preprocessor that takes several frames of the feature vector as input to produce more discriminative feature vectors with respect to the underlying HMM system. This hybrid system is an extension of a state-of-the-art continuous HMM system, and in fact, it is the first hybrid system that really is capable of outperforming these standard systems with respect to the recognition accuracy. Experimental results show an relative error reduction of about 10% that we achieved on a remarkably good recognition system based on continuous HMMs for the Resource Management 1 OOO-word continuous speech recognition task.


Modeling Acoustic Correlations by Factor Analysis

Neural Information Processing Systems

Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the shorttime properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters are estimated by an Expectation-Maximization (EM) algorithm that can be embedded in the training procedures for HMMs.


Mapping a Manifold of Perceptual Observations

Neural Information Processing Systems

Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.


Training Methods for Adaptive Boosting of Neural Networks

Neural Information Processing Systems

"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performances of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a data base of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters and 8.1 % error on the UCI satellite data set.