Model Matching and SFMD Computation

Neural Information Processing Systems

In systems that process sensory data there is frequently a model matching stage where class hypotheses are combined to recognize a complex entity. We introduce a new model of parallelism, the Single Function Multiple Data (SFMD) model, appropriate to this stage. SFMD functionality can be added with small hardware expense to certain existing SIMD architectures, and as an incremental addition to the programming model. Adding SFMD to an SIMD machine will not only allow faster model matching, but also increase its flexibility as a general purpose machine and its scope in performing the initial stages of sensory processing.

1 INTRODUCTION

In systems that process sensory data there is frequently a post-classification stage where several independent class hypotheses are combined into the recognition of a more complex entity. Examples include matching word models with a string of observation probabilities, and matching visual object models with collections of edges or other features. Current parallel computer architectures for processing sensory data focus on the classification and pre-classification stages (Hammerstrom 1990). This is reasonable, as those stages likely have the largest potential for speedup through parallel execution. Nonetheless, the model-matching stage is also suitable for parallelism, as each model may be matched independently of the others. We introduce a new style of parallelism, Single Function Multiple Data (SFMD), that is suitable for the model-matching stage.



Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing

Neural Information Processing Systems

Search for the largest (or the smallest) among a number of input data, i.e., the winner-take-all (WTA) action, is an essential part of intelligent data processing such as data retrieval in associative memories [3], vector quantization circuits [4], Kohonen's self-organizing maps [5], etc. In addition to the maximum or minimum search, data sorting also plays an essential role in a number of signal processing tasks, such as median filtering in image processing, evolutionary algorithms in optimization problems [6], and so forth.
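
The two operations named in this abstract, winner search and sorting-based median filtering, can be illustrated in software. The following is only a minimal NumPy sketch of what the operations compute, not a model of the neuron-MOS hardware; the function names `winner_take_all` and `median_filter_1d` and the edge-padding choice are assumptions made for illustration.

```python
import numpy as np

def winner_take_all(x):
    """Return the index of the largest input and a one-hot winner vector."""
    x = np.asarray(x)
    winner = int(np.argmax(x))          # ties resolve to the first maximum
    one_hot = np.zeros(len(x), dtype=int)
    one_hot[winner] = 1
    return winner, one_hot

def median_filter_1d(signal, width=3):
    """Median-filter a 1-D signal by sorting each sliding window."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    # Assumption: repeat the edge samples so the output keeps the input length.
    padded = np.pad(np.asarray(signal), half, mode="edge")
    return np.array([np.sort(padded[i:i + width])[half]
                     for i in range(len(signal))])

winner, one_hot = winner_take_all([0.2, 0.9, 0.5])
filtered = median_filter_1d([1, 9, 2, 3, 8, 4], width=3)
```

Searching for the minimum instead only requires replacing `np.argmax` with `np.argmin`.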


The Gamma MLP for Speech Phoneme Recognition

Neural Information Processing Systems

Department of Electrical and Computer Engineering, University of Queensland, St. Lucia, Qld 4072, Australia

Abstract

We define a Gamma multi-layer perceptron (MLP) as an MLP with the usual synaptic weights replaced by gamma filters (as proposed by de Vries and Principe (1992)) and associated gain terms throughout all layers. We derive gradient descent update equations and apply the model to the recognition of speech phonemes. We find that both the inclusion of gamma filters in all layers and the inclusion of synaptic gains improve the performance of the Gamma MLP. We compare the Gamma MLP with TDNN, Back-Tsoi FIR MLP, and Back-Tsoi IIR MLP architectures, and a local approximation scheme. We find that the Gamma MLP results in a substantial reduction in error rates.

1 INTRODUCTION

1.1 THE GAMMA FILTER

Infinite Impulse Response (IIR) filters have a significant advantage over Finite Impulse Response (FIR) filters in signal processing: the length of the impulse response is uncoupled from the number of filter parameters.


Using Pairs of Data-Points to Define Splits for Decision Trees

Neural Information Processing Systems

CART either split the data using axis-aligned hyperplanes or they perform a computationally expensive search in the continuous space of hyperplanes with unrestricted orientations. We show that the limitations of the former can be overcome without resorting to the latter. For every pair of training data-points, there is one hyperplane that is orthogonal to the line joining the data-points and bisects this line. Such hyperplanes are plausible candidates for splits. In a comparison on a suite of 12 datasets we found that this method of generating candidate splits outperformed the standard methods, particularly when the training sets were small.

1 Introduction

Binary decision trees come in many flavours, but they all rely on splitting the set of k-dimensional data-points at each internal node into two disjoint sets.
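
The candidate splits described in the abstract above can be sketched as follows. For a pair of points a and b, the bisecting hyperplane has normal w = b - a and passes through the midpoint (a + b)/2, so a point x falls on b's side when w·x > w·(a + b)/2. This is an illustrative sketch only: the helper names and the use of weighted Gini impurity to rank candidates are assumptions, not the scoring rule used in the paper.

```python
import numpy as np
from itertools import combinations

def pair_hyperplanes(X):
    """Yield (w, c) for every pair of rows; the split test is X @ w > c."""
    for i, j in combinations(range(len(X)), 2):
        w = X[j] - X[i]                      # normal: the line joining the pair
        c = w @ (X[i] + X[j]) / 2.0          # offset: plane passes through the midpoint
        yield w, c

def gini(y):
    """Gini impurity of a label array (0.0 for an empty or pure set)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_pair_split(X, y):
    """Return (score, w, c) for the candidate with lowest weighted Gini impurity."""
    best = None
    for w, c in pair_hyperplanes(X):
        right = X @ w > c
        n_right = int(right.sum())
        n_left = len(y) - n_right
        if n_right == 0 or n_left == 0:      # skip degenerate splits
            continue
        score = (n_left * gini(y[~right]) + n_right * gini(y[right])) / len(y)
        if best is None or score < best[0]:
            best = (score, w, c)
    return best

# Toy data: no single axis-aligned split isolates the point (1, 1).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])
score, w, c = best_pair_split(X, y)
```

On this toy set the pair (0, 0) and (1, 1) yields an oblique hyperplane that separates the classes exactly, which no axis-aligned split can do. The number of candidates grows quadratically with the number of points at a node.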


Forward-backward retraining of recurrent neural networks

Neural Information Processing Systems

This paper describes the training of a recurrent neural network as the letter posterior probability estimator for a hidden Markov model, off-line handwriting recognition system. The network estimates posterior distributions for each of a series of frames representing sections of a handwritten word. The supervised training algorithm, backpropagation through time, requires target outputs to be provided for each frame. Three methods for deriving these targets are presented. A novel method based upon the forward-backward algorithm is found to result in the recognizer with the lowest error rate.

1 Introduction

In the field of off-line handwriting recognition, the goal is to read a handwritten document and produce a machine transcription.


Human Face Detection in Visual Scenes

Neural Information Processing Systems

We present a neural network-based face detection system. A retinally connected neural network examines small windows of an image, and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We use a bootstrap algorithm for training, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting non-face training examples, which must be chosen to span the entire space of non-face images.


Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging

Neural Information Processing Systems

We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method uses a Bayesian prior on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which recently has been found to produce impressive results for classification networks.


Information through a Spiking Neuron

Neural Information Processing Systems

While it is generally agreed that neurons transmit information about their synaptic inputs through spike trains, the code by which this information is transmitted is not well understood. An upper bound on the information encoded is obtained by hypothesizing that the precise timing of each spike conveys information. Here we develop a general approach to quantifying the information carried by spike trains under this hypothesis, and apply it to the leaky integrate-and-fire (IF) model of neuronal dynamics. We formulate the problem in terms of the probability distribution p(T) of interspike intervals (ISIs), assuming that spikes are detected with arbitrary but finite temporal resolution. In the absence of added noise, all the variability in the ISIs could encode information, and the information rate is simply the entropy of the ISI distribution, H(T) = ⟨-p(T) log2 p(T)⟩, times the spike rate.
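
The noiseless-case relation in the abstract above (information rate = ISI entropy × spike rate) can be made concrete with a small sketch. It reads H(T) as the sum over ISI bins of -p log2 p, with ISIs binned at the detection resolution. The function name `isi_information_rate`, the floor-based binning, and the plug-in (histogram) entropy estimate are assumptions for illustration; the plug-in estimate is biased for small samples.

```python
import numpy as np

def isi_information_rate(spike_times, resolution):
    """Entropy of the binned ISI distribution (bits/spike) times the spike rate.

    spike_times and resolution share one time unit; the result is in bits
    per that unit. Covers only the case without added noise.
    """
    isis = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    # Assumption: an ISI is identified only up to its bin at the given resolution.
    bins = np.floor(isis / resolution).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    entropy_bits = -np.sum(p * np.log2(p))   # H(T), bits per spike
    spike_rate = 1.0 / isis.mean()           # spikes per unit time
    return entropy_bits * spike_rate

# ISIs alternate between 1 and 3: H(T) = 1 bit, mean ISI = 2, so 0.5 bits per unit time.
rate = isi_information_rate([0.0, 1.0, 4.0, 5.0, 8.0], resolution=1.0)
```

A perfectly regular spike train has a single ISI bin, so its entropy and therefore its information rate are zero under this measure.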


Context-Dependent Classes in a Hybrid Recurrent Network-HMM Speech Recognition System

Neural Information Processing Systems

A method for incorporating context-dependent phone classes in a connectionist-HMM hybrid speech recognition system is introduced. A modular approach is adopted, where single-layer networks discriminate between different context classes given the phone class and the acoustic data. The context networks are combined with a context-independent (CI) network to generate context-dependent (CD) phone probability estimates. Experiments show an average reduction in word error rate of 16% and 13% from the CI system on ARPA 5,000 word and SQALE 20,000 word tasks respectively. Due to improved modelling, the decoding speed of the CD system is more than twice that of the CI system.