Finding Structure in Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, such as those typically studied in AI, one must ultimately be able to discover the structure in the world, in order to abstract away the myriad of details and to operate in more tractable problem spaces. This paper presents the SKILLS algorithm. SKILLS discovers skills, which are partially defined action policies that arise in the context of multiple, related tasks.
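
To make the notion of a partially defined policy concrete, here is a minimal, illustrative sketch, not the SKILLS algorithm itself: a skill covers only part of the state space, and each task falls back on its own policy elsewhere. All state and action names below are hypothetical.

# Illustrative sketch: a skill as a partially defined policy shared by tasks.
class Skill:
    """A partial policy: maps only some states to actions."""
    def __init__(self, partial_policy):
        self.partial_policy = partial_policy  # dict: state -> action

    def action(self, state):
        return self.partial_policy.get(state)  # None outside the skill's domain

def select_action(state, skill, task_policy):
    """Use the shared skill where it is defined, else the task's own policy."""
    a = skill.action(state)
    return a if a is not None else task_policy[state]

# Two related tasks share one "corridor-following" skill but differ elsewhere.
corridor_skill = Skill({"hall_1": "forward", "hall_2": "forward"})
task_a_policy = {"room_a": "pick_up", "hall_1": "left", "hall_2": "left"}
task_b_policy = {"room_b": "drop", "hall_1": "right", "hall_2": "right"}

print(select_action("hall_1", corridor_skill, task_a_policy))  # forward (shared skill)
print(select_action("room_a", corridor_skill, task_a_policy))  # pick_up (task-specific)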


Generalisation in Feedforward Networks

Neural Information Processing Systems

They provide, in particular, theoretical bounds on the sample complexity, i.e., the minimal number of training samples that assures the desired accuracy with the desired confidence. However, there are a few obvious deficiencies in these results: (i) the sample complexity bounds are unrealistically high (cf. Section 4), and (ii) for some networks they do not hold at all, since the VC-dimension is infinite, e.g.
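
For context, the kind of sample-complexity bound at issue is the classical PAC result for a hypothesis class of VC-dimension d: to reach error at most \epsilon with probability at least 1 - \delta, it suffices (up to constants) that

m(\epsilon, \delta) = O\!\left( \frac{1}{\epsilon} \left( d \log\frac{1}{\epsilon} + \log\frac{1}{\delta} \right) \right)

and it is the constants and the dependence on d in bounds of this form that make them unrealistically high in practice.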


An Input Output HMM Architecture

Neural Information Processing Systems

We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation. 1 INTRODUCTION Learning problems involving sequentially structured data cannot be effectively dealt with by static models such as feedforward networks. Recurrent networks make it possible to model complex dynamical systems and can store and retrieve contextual information in a flexible way. Up until the present time, research efforts in supervised learning for recurrent networks have almost exclusively focused on error minimization by gradient descent methods. Although effective for learning short-term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).
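
The defining feature of an input-output HMM is that transition and emission probabilities are functions of the current input. The following sketch shows a forward (likelihood) pass under that assumption; the softmax-of-linear-maps parameterization is an illustrative choice, not the paper's exact model.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def iohmm_loglik(inputs, outputs, W_trans, W_emit, init):
    """log p(outputs | inputs) for discrete outputs, via the scaled forward recursion."""
    alpha = init.copy()                      # p(x_0): prior over hidden states
    loglik = 0.0
    for u, y in zip(inputs, outputs):
        A = softmax(W_trans @ u, axis=-1)    # (S, S): input-conditional transitions
        B = softmax(W_emit @ u, axis=-1)     # (S, O): input-conditional emissions
        alpha = (alpha @ A) * B[:, y]        # predict, then weight by emission of y
        norm = alpha.sum()
        loglik += np.log(norm)
        alpha /= norm                        # rescale to avoid numerical underflow
    return loglik

rng = np.random.default_rng(0)
S, D, O = 3, 4, 2                            # states, input dim, output symbols
W_t, W_e = rng.normal(size=(S, S, D)), rng.normal(size=(S, O, D))
init = np.ones(S) / S
us = [rng.normal(size=D) for _ in range(5)]
ys = [int(rng.integers(O)) for _ in range(5)]
print(iohmm_loglik(us, ys, W_t, W_e, init))

In an EM training procedure, this forward pass would be paired with a backward pass to compute expected state occupancies for the M-step.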


Coarse-to-Fine Image Search Using Neural Networks

Neural Information Processing Systems

The efficiency of image search can be greatly improved by using a coarse-to-fine search strategy with a multi-resolution image representation. However, if the resolution is so low that the objects have few distinguishing features, search becomes difficult. We show that the performance of search at such low resolutions can be improved by using context information, i.e., objects visible at low resolution which are not the objects of interest but are associated with them. The networks can be given explicit context information as inputs, or they can learn to detect the context objects, in which case the user does not have to be aware of their existence. We also use Integrated Feature Pyramids, which represent high-frequency information at low resolutions. The use of multiresolution search techniques allows us to combine information about the appearance of the objects on many scales in an efficient way. A natural form of exemplar selection also arises from these techniques. We illustrate these ideas by training hierarchical systems of neural networks to find clusters of buildings in aerial photographs of farmland.
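
A hedged sketch of the coarse-to-fine control flow: score densely at the coarsest pyramid level, then re-score only the surviving candidates at each finer level. The template-correlation scorer below stands in for the paper's neural networks; all parameters are illustrative.

import numpy as np

def downsample(img):
    """2x2 average pooling (assumes even image dimensions)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def score(img, r, c, template):
    """Correlation of the template with the patch at (r, c); -inf off-image."""
    h, w = template.shape
    patch = img[r:r + h, c:c + w]
    if patch.shape != template.shape:
        return -np.inf
    return float((patch * template).sum())

def coarse_to_fine(img, templates, keep=5):
    """templates: one per pyramid level, finest first, coarsest last."""
    pyramid = [img]
    for _ in range(len(templates) - 1):
        pyramid.append(downsample(pyramid[-1]))
    # Dense scan only at the coarsest (cheapest) level.
    lvl = len(pyramid) - 1
    t = templates[lvl]
    cands = [(score(pyramid[lvl], r, c, t), r, c)
             for r in range(pyramid[lvl].shape[0] - t.shape[0] + 1)
             for c in range(pyramid[lvl].shape[1] - t.shape[1] + 1)]
    cands = sorted(cands, reverse=True)[:keep]
    # Refine survivors level by level; each coarse cell maps to a 2x2 fine block.
    for lvl in range(len(pyramid) - 2, -1, -1):
        t = templates[lvl]
        refined = [(score(pyramid[lvl], 2 * r + dr, 2 * c + dc, t),
                    2 * r + dr, 2 * c + dc)
                   for _, r, c in cands for dr in range(2) for dc in range(2)]
        cands = sorted(refined, reverse=True)[:keep]
    return cands  # (score, row, col) at full resolution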


Computational Structure of Coordinate Transformations: A Generalization Study

Neural Information Processing Systems

One of the fundamental properties that both neural networks and the central nervous system share is the ability to learn and generalize from examples. While this property has been studied extensively in the neural network literature, it has not been thoroughly explored in human perceptual and motor learning. We have chosen a coordinate transformation system, the visuomotor map, which transforms visual coordinates into motor coordinates, to study the generalization effects of learning new input-output pairs. Using a paradigm of computer-controlled altered visual feedback, we have studied the generalization of the visuomotor map subsequent to both local and context-dependent remappings. A local remapping of one or two input-output pairs induced a significant global, yet decaying, change in the visuomotor map, suggesting a representation for the map composed of units with large functional receptive fields. Our study of context-dependent remappings indicated that a single point in visual space can be mapped to two different finger locations depending on a context variable, the starting point of the movement. Furthermore, as the context is varied there is a gradual shift between the two remappings, consistent with two visuomotor modules being learned and gated smoothly with the context. 1 Introduction The human central nervous system (CNS) receives sensory inputs from a multitude of modalities, each tuned to extract different forms of information from the environment.
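
The smooth gating the abstract describes can be summarized by a simple mixture form; the equation below is an illustrative reading of that claim, not the authors' stated model:

\hat{y}(x, c) = g(c)\, f_1(x) + \bigl(1 - g(c)\bigr)\, f_2(x), \qquad g(c) = \frac{1}{1 + e^{-(w c + b)}}

where f_1 and f_2 are the two visuomotor modules and the gate g shifts smoothly between them as the context c (the starting point of the movement) varies.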


Transformation Invariant Autoassociation with Application to Handwritten Character Recognition

Neural Information Processing Systems

When training neural networks by the classical backpropagation algorithm, the whole learning problem must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper-case handwritten letters and compared to other approaches to the invariance problem. 1 INCORPORATION OF EXPLICIT KNOWLEDGE The aim of supervised learning is to learn a mapping between the input and the output space from a set of example pairs (input, desired output). The classical implementation in the domain of neural networks is the backpropagation algorithm. If this learning set is sufficiently representative of the underlying data distributions, one hopes that after learning, the system is able to generalize correctly to other inputs of the same distribution.
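
The classification-by-autoassociation idea can be sketched briefly: train one autoassociator per class and assign a test pattern to the class whose model reconstructs it best. A linear PCA reconstruction stands in here for the paper's autoassociative multilayer perceptrons; the data are synthetic.

import numpy as np

class PCAAutoassociator:
    """Linear autoassociator: reconstruct through a low-dimensional subspace."""
    def __init__(self, n_components):
        self.k = n_components

    def fit(self, X):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.V = Vt[:self.k].T              # principal subspace (d, k)
        return self

    def reconstruction_error(self, x):
        z = (x - self.mean) @ self.V        # encode
        x_hat = self.mean + z @ self.V.T    # decode
        return float(np.sum((x - x_hat) ** 2))

def classify(x, models):
    """Assign x to the class whose autoassociator reconstructs it best."""
    return min(models, key=lambda label: models[label].reconstruction_error(x))

rng = np.random.default_rng(1)
models = {
    "A": PCAAutoassociator(2).fit(rng.normal(size=(50, 10))),
    "B": PCAAutoassociator(2).fit(rng.normal(loc=3.0, size=(50, 10))),
}
print(classify(rng.normal(loc=3.0, size=10), models))  # expected: "B"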


Multidimensional Scaling and Data Clustering

Neural Information Processing Systems

Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding a dissimilarity data set in a Euclidean space, for clustering these data, and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently when the data are partially unknown. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidean space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with a minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991), which guarantees robust statistics (Tikochinsky et al., 1984).
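
As a point of reference for the embedding problem, here is a minimal metric-MDS sketch: gradient descent on the raw stress, the squared difference between given dissimilarities and embedded distances. The paper's maximum-entropy derivation is more involved; this only illustrates the objective being optimized.

import numpy as np

def mds(D, dim=2, steps=2000, lr=0.01, seed=0):
    """D: symmetric (n, n) dissimilarity matrix; returns (n, dim) coordinates."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(D.shape[0], dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]          # pairwise coordinate differences
        dist = np.sqrt((diff ** 2).sum(-1)) + 1e-9    # embedded distances (safe diagonal)
        coef = (dist - D) / dist                      # stress residual per pair
        np.fill_diagonal(coef, 0.0)
        # gradient of sum_{i<j} (dist_ij - D_ij)^2; constant factors folded into lr
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)
        X = X - lr * grad
    return X

D = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], float)
X = mds(D)
print(np.round(np.linalg.norm(X[0] - X[1]), 2))  # approximately 1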


Implementation of Neural Hardware with the Neural VLSI of URAN in Applications with Reduced Representations

Neural Information Processing Systems

This paper describes an approach to neural hardware implementation based on an analog-digital mixed-mode neural chip. The full-custom neural VLSI of the Universally Reconstructible Artificial Neural network (URAN) is used to implement a Korean speech recognition system. A multi-layer perceptron with linear neurons is trained successfully under the limited accuracy of the chip's computations. A network with a large frame input layer is tested on recognizing spoken Korean words in forward retrieval. A multichip hardware module with eight or more chips is suggested for extended performance and capacity.
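
To illustrate what "limited accuracy in computations" means for such a network, here is a sketch that quantizes weights and activations to a fixed-point grid before each use, simulating a mixed-mode chip's finite precision. The bit width and clipping range are arbitrary illustrative choices, not URAN's specification.

import numpy as np

def quantize(x, bits=8, clip=1.0):
    """Uniform fixed-point quantization to `bits` bits on [-clip, clip]."""
    levels = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / clip * levels), -levels, levels) * clip / levels

def forward(x, W1, W2, bits=8):
    """Two-layer perceptron with linear hidden neurons, quantized throughout."""
    h = quantize(quantize(W1, bits) @ quantize(x, bits), bits)  # hidden layer
    return quantize(W2, bits) @ h                               # output layer

rng = np.random.default_rng(0)
x, W1, W2 = rng.normal(size=16) * 0.1, rng.normal(size=(8, 16)) * 0.1, rng.normal(size=(4, 8)) * 0.1
print(forward(x, W1, W2, bits=8) - W2 @ (W1 @ x))  # small quantization error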


Comparing the prediction accuracy of artificial neural networks and other statistical models for breast cancer survival

Neural Information Processing Systems

The TNM staging system has been used since the early 1960s to predict breast cancer patient outcome. In an attempt to increase prognostic accuracy, many putative prognostic factors have been identified. Because the TNM stage model cannot accommodate these new factors, the proliferation of factors in breast cancer has led to clinical confusion. What is required is a new computerized prognostic system that can test putative prognostic factors and integrate the predictive factors with the TNM variables in order to increase prognostic accuracy. Using the area under the curve of the receiver operating characteristic, we compare the accuracy of the following predictive models in terms of five-year breast cancer-specific survival: the pTNM staging system, principal component analysis, classification and regression trees, logistic regression, a cascade correlation neural network, a conjugate gradient descent neural network, a probabilistic neural network, and a backpropagation neural network. Several statistical models are significantly more accurate.
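
The comparison metric, area under the ROC curve, can be computed directly from model scores via its rank-statistic (Mann-Whitney) equivalence: the probability that a randomly chosen positive/negative pair is ranked correctly. A minimal sketch (ties between scores are ignored here):

import numpy as np

def roc_auc(scores, labels):
    """scores: model outputs; labels: 1 = event observed, 0 = not."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)  # ascending 1-based ranks
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Mann-Whitney U statistic, normalized to [0, 1]
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: perfect ranking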


Bias, Variance and the Combination of Least Squares Estimators

Neural Information Processing Systems

We consider the effect of combining several least squares estimators on the expected performance of a regression problem. Computing the exact bias and variance curves as a function of the sample size, we are able to quantitatively compare the effect of the combination on the bias and variance separately, and thus on the expected error, which is the sum of the two. Our exact calculations demonstrate that the combination of estimators is particularly useful when the data set is small and noisy and the function to be learned is unrealizable. For large data sets the single estimator produces superior results. Finally, we show that by splitting the data set into several independent parts and training each estimator on a different subset, the performance can in some cases be significantly improved.
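
The arithmetic behind the result can be sketched with the standard decomposition, stated here under the simplifying assumption of K estimators trained on independent, identically distributed data sets: averaging leaves the bias unchanged while dividing the variance by K,

\mathrm{E}\!\left[\bigl(\bar{f}(x) - g(x)\bigr)^2\right] = \underbrace{\bigl(\mathrm{E}[f(x)] - g(x)\bigr)^2}_{\text{bias}^2} + \underbrace{\frac{1}{K}\,\mathrm{Var}\bigl[f(x)\bigr]}_{\text{variance}}

where \bar{f} = \frac{1}{K} \sum_{k=1}^{K} f_k and g is the target function. This is why combination helps most when the variance term dominates (small, noisy data sets, unrealizable targets) and matters little when the single-estimator variance is already small (large data sets). Note that splitting a fixed data set also shrinks each estimator's training set, which changes the per-estimator bias and variance; the equation above ignores that effect.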