Goto

Collaborating Authors

 Statistical Learning


Self-Organizing Rules for Robust Principal Component Analysis

Neural Information Processing Systems

Principal Component Analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition and image processing. In the neural network literature, a lot of studies have been made on learning rules for implementing PCA or on networks closely related to PCA (see Xu & Yuille, 1993 for a detailed reference list which contains more than 30 papers related to these issues).


Remote Sensing Image Analysis via a Texture Classification Neural Network

Neural Information Processing Systems

In this work we apply a texture classification network to remote sensing image analysis. The goal is to extract the characteristics of the area depicted in the input image, thus achieving a segmented map of the region. We have recently proposed a combined neural network and rule-based framework for texture recognition. The framework uses unsupervised and supervised learning, and provides probability estimates for the output classes. We describe the texture classification network and extend it to demonstrate its application to the Landsat and Aerial image analysis domain. 1 INTRODUCTION In this work we apply a texture classification network to remote sensing image analysis. The goal is to segment the input image into homogeneous textured regions and identify each region as one of a prelearned library of textures, e.g.


Some Solutions to the Missing Feature Problem in Vision

Neural Information Processing Systems

In visual processing the ability to deal with missing and noisy information is crucial. Occlusions and unreliable feature detectors often lead to situations where little or no direct information about features is available. However the available information is usually sufficient to highly constrain the outputs. We discuss Bayesian techniques for extracting class probabilities given partial data. The optimal solution involves integrating over the missing dimensions weighted by the local probability densities. We show how to obtain closed-form approximations to the Bayesian solution using Gaussian basis function networks. The framework extends naturally to the case of noisy features.


Extended Regularization Methods for Nonconvergent Model Selection

Neural Information Processing Systems

Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the'non convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).


Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Neural Information Processing Systems

The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.


Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

Neural Information Processing Systems

We propose a very simple, and well principled way of computing the optimal step size in gradient descent algorithms. The online version is very efficient computationally, and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second derivative matrix (Hessian), which does not require to even calculate the Hessian. Several other applications of this technique are proposed for speeding up learning, or for eliminating useless parameters. 1 INTRODUCTION Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation, is simultaneously one of the most crucial and expertintensive part of neural-network learning. We propose a method for computing the best step size which is both well-principled, simple, very cheap computationally, and, most of all, applicable to online training with large networks and data sets.


Holographic Recurrent Networks

Neural Information Processing Systems

Holographic Recurrent Networks (HRNs) are recurrent networks which incorporate associative memory techniques for storing sequential structure. HRNs can be easily and quickly trained using gradient descent techniques to generate sequences of discrete outputs and trajectories through continuous spaee. The performance of HRNs is found to be superior to that of ordinary recurrent networks on these sequence generation tasks.


Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times

Neural Information Processing Systems

Genevieve B. Orr and Todd K. Leen Department of Computer Science and Engineering Oregon Graduate Institute of Science & Technology 19600 N.W. von Neumann Drive Beaverton, OR 97006-1999 Abstract In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. We summarize the theory of the time evolution of P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning. Despite the recent application of convergence theorems from stochastic approximation theoryto neural network learning (Oja 1982, White 1989) there remain outstanding questionsabout the search dynamics in stochastic learning.


Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Neural Information Processing Systems

The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. Theavailable asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential tomodel the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.


An Analog VLSI Chip for Radial Basis Functions

Neural Information Processing Systems

We have designed, fabricated, and tested an analog VLSI chip which computes radial basis functions in parallel. We have developed asynapse circuit that approximates a quadratic function. We aggregate these circuits to form radial basis functions. These radial basis functions are then averaged together using a follower aggregator. 1 INTRODUCTION Radial basis functions (RBFs) are a mel hod for approximating a function from scattered training points [Powell, H)87]. RBFs have been used to solve recognition and prediction problems with a fair amonnt of success [Lee, 1991] [Moody, 1989] [Platt, 1991]. The first layer of an RBF network computes t.he distance of the input to the network to a set of stored memories. Each basis function is a nonlinear function of a corresponding distance. Tht basis functions are then added together with second-layer weights to produce the output of the network.