An Input Output HMM Architecture

Neural Information Processing Systems

We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports recurrent networks processing style and allows to exploit the supervised learning paradigm while using maximum likelihood estimation. 1 INTRODUCTION Learning problems involving sequentially structured data cannot be effectively dealt with static models such as feedforward networks. Recurrent networks allow to model complex dynamical systems and can store and retrieve contextual information in a flexible way. Up until the present time, research efforts of supervised learning for recurrent networks have almost exclusively focused on error minimization by gradient descent methods. Although effective for learning short term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).


Coarse-to-Fine Image Search Using Neural Networks

Neural Information Processing Systems

The efficiency of image search can be greatly improved by using a coarse-to-fine search strategy with a multi-resolution image representation. However, if the resolution is so low that the objects have few distinguishing features, search becomes difficult. We show that the performance of search at such low resolutions can be improved by using context information, i.e., objects visible at low-resolution which are not the objects of interest but are associated with them. The networks can be given explicit context information as inputs, or they can learn to detect the context objects, in which case the user does not have to be aware of their existence. We also use Integrated Feature Pyramids, which represent high-frequency information at low resolutions. The use of multiresolution search techniques allows us to combine information about the appearance of the objects on many scales in an efficient way. A natural fOlm of exemplar selection also arises from these techniques. We illustrate these ideas by training hierarchical systems of neural networks to find clusters of buildings in aerial photographs of farmland.


Computational Structure of coordinate transformations: A generalization study

Neural Information Processing Systems

One of the fundamental properties that both neural networks and the central nervous system share is the ability to learn and generalize from examples. While this property has been studied extensively in the neural network literature it has not been thoroughly explored in human perceptual and motor learning. We have chosen a coordinate transformation system-the visuomotor map which transforms visual coordinates into motor coordinates-to study the generalization effects of learning new input-output pairs. Using a paradigm of computer controlled altered visual feedback, we have studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings. A local remapping of one or two input-output pairs induced a significant global, yet decaying, change in the visuomotor map, suggesting a representation for the map composed of units with large functional receptive fields. Our study of context-dependent remappings indicated that a single point in visual space can be mapped to two different finger locations depending on a context variable-the starting point of the movement. Furthermore, as the context is varied there is a gradual shift between the two remappings, consistent with two visuomotor modules being learned and gated smoothly with the context. 1 Introduction The human central nervous system (CNS) receives sensory inputs from a multitude of modalities, each tuned to extract different forms of information from the 1126 Zoubin Ghahramani, Daniel M. Wolpert, Michael 1. Jordan


Transformation Invariant Autoassociation with Application to Handwritten Character Recognition

Neural Information Processing Systems

When training neural networks by the classical backpropagation algorithm the whole problem to learn must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper case handwritten letters and compared to other approaches to the invariance problem. 1 INCORPORATION OF EXPLICIT KNOWLEDGE The aim of supervised learning is to learn a mapping between the input and the output space from a set of example pairs (input, desired output). The classical implementation in the domain of neural networks is the backpropagation algorithm. If this learning set is sufficiently representative of the underlying data distributions, one hopes that after learning, the system is able to generalize correctly to other inputs of the same distribution.


Multidimensional Scaling and Data Clustering

Neural Information Processing Systems

Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).


Implementation of Neural Hardware with the Neural VLSI of URAN in Applications with Reduced Representations

Neural Information Processing Systems

This paper describes a way of neural hardware implementation with the analog-digital mixed mode neural chip. The full custom neural VLSI of Universally Reconstructible Artificial Neural network (URAN) is used to implement Korean speech recognition system. A multi-layer perceptron with linear neurons is trained successfully under the limited accuracy in computations. The network with a large frame input layer is tested to recognize spoken korean words at a forward retrieval. Multichip hardware module is suggested with eight chips or more for the extended performance and capacity.


Comparing the prediction accuracy of artificial neural networks and other statistical models for breast cancer survival

Neural Information Processing Systems

The TNM staging system has been used since the early 1960's to predict breast cancer patient outcome. In an attempt to increase prognostic accuracy, many putative prognostic factors have been identified. Because the TNM stage model can not accommodate these new factors, the proliferation of factors in breast cancer has lead to clinical confusion. What is required is a new computerized prognostic system that can test putative prognostic factors and integrate the predictive factors with the TNM variables in order to increase prognostic accuracy. Using the area under the curve of the receiver operating characteristic, we compare the accuracy of the following predictive models in terms of five year breast cancer-specific survival: pTNM staging system, principal component analysis, classification and regression trees, logistic regression, cascade correlation neural network, conjugate gradient descent neural, probabilistic neural network, and backpropagation neural network. Several statistical models are significantly more ac- 1064 Harry B. Burke, David B. Rosen, Philip H. Goodman


Bias, Variance and the Combination of Least Squares Estimators

Neural Information Processing Systems

We consider the effect of combining several least squares estimators on the expected performance of a regression problem. Computing the exact bias and variance curves as a function of the sample size we are able to quantitatively compare the effect of the combination on the bias and variance separately, and thus on the expected error which is the sum of the two. Our exact calculations, demonstrate that the combination of estimators is particularly useful in the case where the data set is small and noisy and the function to be learned is unrealizable. For large data sets the single estimator produces superior results. Finally, we show that by splitting the data set into several independent parts and training each estimator on a different subset, the performance can in some cases be significantly improved.


Model of a Biological Neuron as a Temporal Neural Network

Neural Information Processing Systems

A biological neuron can be viewed as a device that maps a multidimensional temporal event signal (dendritic postsynaptic activations) into a unidimensional temporal event signal (action potentials). We have designed a network, the Spatio-Temporal Event Mapping (STEM) architecture, which can learn to perform this mapping for arbitrary biophysical models of neurons. Such a network appropriately trained, called a STEM cell, can be used in place of a conventional compartmental model in simulations where only the transfer function is important, such as network simulations. The STEM cell offers advantages over compartmental models in terms of computational efficiency, analytical tractabili1ty, and as a framework for VLSI implementations of biological neurons.


Grammar Learning by a Self-Organizing Network

Neural Information Processing Systems

Michiro Negishi Dept. of Cognitive and Neural Systems, Boston University 111 Cummington Street Boston, MA 02215 email: negishi@cns.bu.edu Abstract This paper presents the design and simulation results of a selforganizing neural network which induces a grammar from example sentences. Input sentences are generated from a simple phrase structure grammar including number agreement, verb transitivity, and recursive noun phrase construction rules. The network induces a grammar explicitly in the form of symbol categorization rules and phrase structure rules. 1 Purpose and related works The purpose of this research is to show that a self-organizing network with a certain structure can acquire syntactic knowledge from only positive (i.e. There has been research on supervised neural network models of language acquisition tasks [Elman, 1991, Miikkulainen and Dyer, 1988, John and McClelland, 1988]. Unlike these supervised models, the current model self-organizes word and phrasal categories and phrase construction rules through mere exposure to input sentences, without any artificially defined task goals.