 Statistical Learning


Some Solutions to the Missing Feature Problem in Vision

Neural Information Processing Systems

In visual processing the ability to deal with missing and noisy information is crucial. Occlusions and unreliable feature detectors often lead to situations where little or no direct information about features is available. However the available information is usually sufficient to highly constrain the outputs. We discuss Bayesian techniques for extracting class probabilities given partial data. The optimal solution involves integrating over the missing dimensions weighted by the local probability densities. We show how to obtain closed-form approximations to the Bayesian solution using Gaussian basis function networks. The framework extends naturally to the case of noisy features.
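As a rough illustration of integrating over missing dimensions, the sketch below assumes Gaussian basis functions with diagonal covariances. Under that assumption, marginalizing a missing feature amounts to dropping that dimension from each basis function. The function name, the NaN encoding of missing values, and the `class_weights` matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def class_posterior(x, means, variances, priors, class_weights):
    """Class probabilities for an input with missing features (hypothetical sketch).

    x:             (d,) input, with np.nan marking missing features (assumed encoding)
    means:         (k, d) centers of k Gaussian basis functions
    variances:     (k, d) per-dimension variances (diagonal covariance assumed)
    priors:        (k,) mixing weights of the basis functions
    class_weights: (k, c) assumed P(class | basis function)
    """
    observed = ~np.isnan(x)
    # For diagonal Gaussians, integrating out a missing dimension
    # simply removes it from the likelihood.
    diff = x[observed] - means[:, observed]
    var = variances[:, observed]
    log_lik = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=1)
    log_resp = np.log(priors) + log_lik
    resp = np.exp(log_resp - log_resp.max())  # subtract max for numerical stability
    resp /= resp.sum()
    return resp @ class_weights
```

When every feature is missing, the responsibilities reduce to the priors, so the output falls back to the prior class probabilities.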


Extended Regularization Methods for Nonconvergent Model Selection

Neural Information Processing Systems

Many techniques for model selection in the field of neural networks correspond to well-established statistical methods. The method of 'stopped training', on the other hand, in which an oversized network is trained until the error on a separate validation set of examples deteriorates and training is then stopped, is a true innovation, since model selection does not require convergence of the training process. In this paper we show that the performance of this method can be significantly enhanced by extending the 'nonconvergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process.

1 INTRODUCTION

One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical sides of this have been intensively investigated, and a vast array of methods has been suggested to perform this task. A widely used class of techniques starts by choosing an 'oversized' network architecture and then either removes redundant elements based on some measure of saliency (pruning), adds a further term to the cost function penalizing complexity (penalty terms), or observes the error on a separate validation set of examples and stops training as soon as this performance begins to deteriorate (stopped training).
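The plain stopped-training idea can be sketched as follows. This is a minimal illustration that uses a linear model trained by gradient descent as a stand-in for an oversized network; the `patience` parameter and all names are assumptions added for the example, and the pruning and adaptive penalty extensions discussed in the paper are not shown.

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              lr=0.01, max_epochs=5000, patience=20):
    """Gradient descent on squared error, stopped when validation error deteriorates.

    `patience` (an assumption, not from the paper) is the number of epochs
    without validation improvement that are tolerated before stopping.
    """
    w = np.zeros(X_tr.shape[1])
    best_w, best_val, since_best = w.copy(), np.inf, 0
    epochs_run = 0
    for epoch in range(max_epochs):
        epochs_run = epoch + 1
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w = w - lr * grad
        val = np.mean((X_val @ w - y_val) ** 2)
        if val < best_val:
            # Keep the weights with the lowest validation error seen so far.
            best_w, best_val, since_best = w.copy(), val, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # training has not converged, but selection is done
    return best_w, best_val, epochs_run
```

The returned weights are those from the best validation epoch, not the last one, so the selected model does not depend on the training process having converged.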


Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Neural Information Processing Systems

The bootstrap method offers a computation-intensive alternative for estimating the predictive distribution of a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It may even be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks, especially nonlinear models, have to be investigated more comprehensively.
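A minimal sketch of the resampling idea, assuming a pairs bootstrap (resampling input/target pairs with replacement) and generic `fit` and `predict` callables supplied by the user. These names and the choice of pairs resampling are illustrative assumptions; the paper may use a different bootstrap variant.

```python
import numpy as np

def bootstrap_predictions(x, y, x_new, fit, predict, n_boot=200, seed=0):
    """Refit a model on resampled data and collect its predictions at x_new.

    fit(x, y) -> model and predict(model, x_new) -> array are hypothetical
    callables; any regression model, including a neural network, could be used.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty((n_boot, len(x_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample n pairs with replacement
        model = fit(x[idx], y[idx])
        preds[b] = predict(model, x_new)
    return preds  # rows approximate the sampling distribution of the prediction
```

Percentiles of the returned array along the first axis, for example `np.percentile(preds, [2.5, 97.5], axis=0)`, give approximate interval estimates without relying on normal asymptotics.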


A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

Neural Information Processing Systems

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH-RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.


Metamorphosis Networks: An Alternative to Constructive Models

Neural Information Processing Systems

Given a set of training examples, determining the appropriate number of free parameters is a challenging problem. Constructive learning algorithms attempt to solve this problem automatically by adding hidden units, and therefore free parameters, during learning. We explore an alternative class of algorithms, called metamorphosis algorithms, in which the number of units is fixed, but the number of free parameters gradually increases during learning. The architecture we investigate is composed of RBF units on a lattice, which imposes flexible constraints on the parameters of the network. Virtues of this approach include variable subset selection, robust parameter selection, multiresolution processing, and interpolation of sparse training data.


Efficient Pattern Recognition Using a New Transformation Distance

Neural Information Processing Systems

Memory-based classification algorithms such as radial basis functions or K-nearest neighbors typically rely on simple distances (Euclidean, dot product...), which are not particularly meaningful on pattern vectors. More complex, better suited distance measures are often expensive and rather ad-hoc (elastic matching, deformable templates). We propose a new distance measure which (a) can be made locally invariant to any set of transformations of the input and (b) can be computed efficiently. We tested the method on large handwritten character databases provided by the Post Office and the NIST. Using invariances with respect to translation, rotation, scaling, shearing and line thickness, the method consistently outperformed all other systems tested on the same databases.
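One way to make a distance locally invariant is to linearize each transformation around a pattern and measure the distance to the resulting tangent plane rather than to the pattern itself. The sketch below is a one-sided version of that idea under stated assumptions: the tangent vectors are supplied by the caller, and the function name and least-squares formulation are illustrative rather than the authors' exact procedure.

```python
import numpy as np

def one_sided_tangent_distance(x, y, tangents):
    """Distance from y to the plane {x + tangents @ a} spanned at x.

    tangents: (d, m) matrix whose columns are derivatives of x with respect
    to each transformation parameter (assumed to be provided by the caller).
    """
    # Least-squares coefficients a that bring x + tangents @ a closest to y.
    coeffs, *_ = np.linalg.lstsq(tangents, y - x, rcond=None)
    residual = (y - x) - tangents @ coeffs
    return np.linalg.norm(residual)
```

For a one-dimensional signal sampled on a grid `t`, a translation tangent can be approximated as `-np.gradient(x, t)`. A slightly shifted copy of `x` then has a much smaller tangent distance than Euclidean distance, and the cost is a single small least-squares solve per comparison.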


Holographic Recurrent Networks

Neural Information Processing Systems

Holographic Recurrent Networks (HRNs) are recurrent networks which incorporate associative memory techniques for storing sequential structure. HRNs can be easily and quickly trained using gradient descent techniques to generate sequences of discrete outputs and trajectories through continuous space. The performance of HRNs is found to be superior to that of ordinary recurrent networks on these sequence generation tasks.


A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

Neural Information Processing Systems

Typical methods for gradient descent in neural network learning involve calculation of derivatives based on a detailed knowledge of the network model. This requires extensive, time-consuming calculations for each pattern presentation and a level of precision that makes these methods difficult to implement in VLSI. We present here a perturbation technique that measures, rather than calculates, the gradient. Since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent. The method is parallel in nature and easy to implement in VLSI. We describe the theory of such an algorithm, an analysis of its domain of applicability, some simulations using it and an outline of a hardware implementation.
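The idea of measuring the gradient can be sketched by treating the network as a black box `error_fn` and perturbing all weights at once. The random-sign perturbation, the step sizes, and the names below are assumptions chosen for illustration; the paper's exact perturbation scheme may differ.

```python
import numpy as np

def measured_gradient_step(error_fn, w, rng, delta=1e-3, lr=0.05):
    """One descent step using a gradient measured by parallel perturbation.

    error_fn is a hypothetical black box standing in for the physical network:
    it is only evaluated, never differentiated.
    """
    signs = rng.choice([-1.0, 1.0], size=w.shape)  # perturb every weight at once
    perturbation = delta * signs
    delta_e = error_fn(w + perturbation) - error_fn(w)  # measured change in error
    # Each weight attributes the measured change to its own perturbation;
    # contributions from other weights average out over many steps.
    grad_estimate = delta_e / perturbation
    return w - lr * grad_estimate
```

Only two error measurements are needed per step regardless of the number of weights, which is what makes the update parallel. The estimate is noisy on any single step but unbiased to first order in `delta`.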


Analog VLSI Implementation of Multi-dimensional Gradient Descent

Neural Information Processing Systems

The implementation uses noise injection and multiplicative correlation to estimate derivatives, as in [Anderson, Kerns 92]. One intended application of this technique is setting circuit parameters on-chip automatically, rather than manually [Kirk 91]. Gradient descent optimization may be used to adjust synapse weights for a backpropagation or other on-chip learning implementation. The approach combines the features of continuous multidimensional gradient descent and the potential for an annealing style of optimization. We present data measured from our analog VLSI implementation.

1 Introduction

This work is similar to [Anderson, Kerns 92], but represents two advances. First, we describe the extension of the technique to multiple dimensions. Second, we demonstrate an implementation of the multidimensional technique in analog VLSI, and provide results measured from the chip. Unlike previous work using noise sources in adaptive systems, we use the noise as a means of estimating the gradient of a function f(y), rather than performing an annealing process [Alspector 88]. We also estimate gradients continuously in position and time, in contrast to [Umminger 89] and [Jabri 91], which utilize discrete position gradient estimates.
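A discrete-time software analogue of noise injection with multiplicative correlation is sketched below. It assumes independent Gaussian noise on each dimension and replaces the chip's continuous-time filtering with a simple sample average; the names, noise distribution, and sample count are illustrative assumptions, not a description of the circuit.

```python
import numpy as np

def correlation_gradient(f, y, rng, sigma=0.01, n_samples=2000):
    """Estimate the gradient of f at y by correlating output changes with noise.

    For small sigma, f(y + n) - f(y) is approximately grad . n, so
    E[(f(y + n) - f(y)) * n_i] is approximately sigma**2 * grad_i.
    """
    noise = rng.normal(0.0, sigma, size=(n_samples, y.size))
    baseline = f(y)
    delta_f = np.array([f(y + n) - baseline for n in noise])
    # Multiply each output change by the injected noise and average.
    return (delta_f[:, None] * noise).mean(axis=0) / sigma**2
```

Because the noise on each dimension is uncorrelated with the others, a single scalar measurement of f yields estimates for all components of the gradient at once, which is the property the multidimensional extension relies on.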