
Early Brain Damage

Neural Information Processing Systems

Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in the cost function that results from pruning weights; this estimate is a valid approximation only if the learning algorithm has converged to a local minimum. On the other hand, it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in the cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended so that it can be used in connection with early stopping. We call this new approach Early Brain Damage, EBD. EBD also makes it possible to revive weights that have already been pruned. We demonstrate the improvements achieved by EBD on three publicly available data sets.
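As a rough illustration of the difference the abstract describes, the sketch below computes the standard OBD saliency (which drops the gradient term) next to a saliency that keeps the first-order term, as one would need when training is stopped before a local minimum. The function names, the diagonal-Hessian assumption, and the toy numbers are illustrative and not taken from the paper.

```python
import numpy as np

def obd_saliency(w, h_diag):
    """Optimal Brain Damage: estimated cost increase from pruning each weight,
    assuming the network sits in a local minimum (gradient ~ 0)."""
    return 0.5 * h_diag * w ** 2

def early_saliency(w, grad, h_diag):
    """Saliency with the first-order term kept, for networks stopped before a
    minimum (illustrative; the paper's exact EBD estimator may differ).
    Pruning weight i means the step dw_i = -w_i, so the Taylor expansion gives
    dE_i ~ -g_i * w_i + 0.5 * H_ii * w_i^2."""
    return -grad * w + 0.5 * h_diag * w ** 2

# toy example: three weights, their gradients, and diagonal Hessian entries
w = np.array([0.8, -0.1, 0.5])
g = np.array([0.2, 0.0, -0.3])        # non-zero gradient: training stopped early
h = np.array([1.0, 4.0, 0.5])

print(obd_saliency(w, h))             # OBD ranking, ignores the gradient
print(early_saliency(w, g, h))        # corrected ranking for early stopping
```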


Why did TD-Gammon Work?

Neural Information Processing Systems

Although TD-Gammon is one of the major successes in machine learning, it has not led to similarly impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of its success without using temporal difference learning; instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon (Tesauro, 1992), letting the program play itself in the hopes of learning to play. After all, the dream of computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tic-tac-toe learner (Michie, 1961). However, such self-conditioning systems, with weak or nonexistent internal representations, had generally been abandoned by the field of AI because of problems of scale.
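The hill-climbing scheme mentioned in the abstract can be sketched roughly as follows: mutate the current champion's weights, let champion and challenger play each other, and move toward the challenger only if it wins. The match-playing stub, the acceptance threshold, and the blending step below are assumptions made for the sake of a runnable example, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_match(champion, challenger, n_games=4):
    """Placeholder for self-play: returns the challenger's number of wins.
    In the real setting this would play full backgammon games between the two
    evaluation networks; here it is stubbed out with a random result."""
    return int(rng.integers(0, n_games + 1))

def hill_climb(n_weights=100, generations=1000, noise=0.05, step=0.05):
    champion = np.zeros(n_weights)
    for _ in range(generations):
        challenger = champion + noise * rng.standard_normal(n_weights)
        wins = play_match(champion, challenger)
        if wins > 2:  # challenger beats the champion in the relative-fitness test
            # move the champion slightly toward the challenger rather than
            # replacing it outright (one common variant of this scheme)
            champion = (1 - step) * champion + step * challenger
    return champion

weights = hill_climb()
```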


Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs

Neural Information Processing Systems

In supervised learning there is usually a clear distinction between inputs and outputs - inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this simple. Some features are more useful as extra outputs than as inputs. By using a feature as an output we get more than just the case values: we can learn a mapping from the other inputs to that feature. For many features this mapping may be more useful than the feature value itself. We present two regression problems and one classification problem where performance improves if features that could have been used as inputs are used as extra outputs instead.
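A minimal sketch of the feature-as-output idea, assuming a small two-layer network trained by gradient descent on a synthetic task: the candidate feature z is attached as a second output head during training and discarded at test time. All names and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic regression task: y depends on x and on a noisy feature z;
# instead of feeding z in as an input, we ask the network to predict it
n, d, h = 200, 5, 10
X = rng.standard_normal((n, d))
z = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)   # the candidate feature
y = np.sin(X[:, 0]) + 0.3 * z

targets = np.stack([y, z], axis=1)        # main target plus the feature-as-output

W1 = 0.1 * rng.standard_normal((d, h))
W2 = 0.1 * rng.standard_normal((h, 2))    # two output units: y-head and z-head
lr = 0.01

for _ in range(2000):
    H = np.tanh(X @ W1)
    out = H @ W2
    err = out - targets                   # squared-error gradient on both heads
    W2 -= lr * H.T @ err / n
    W1 -= lr * X.T @ ((err @ W2.T) * (1 - H ** 2)) / n

# at test time only the y-head (column 0) is used; the z-head merely shaped
# the hidden representation during training
pred_y = np.tanh(X @ W1) @ W2[:, 0]
print(np.mean((pred_y - y) ** 2))
```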


Analog VLSI Circuits for Attention-Based, Visual Tracking

Neural Information Processing Systems

A one-dimensional visual tracking chip has been implemented using neuromorphic, analog VLSI techniques to model selective visual attention in the control of saccadic and smooth pursuit eye movements. The chip incorporates focal-plane processing to compute image saliency and a winner-take-all circuit to select a feature for tracking. The target position and direction of motion are reported as the target moves across the array. We demonstrate its functionality in a closed-loop system which performs saccadic and smooth pursuit tracking movements using a one-dimensional mechanical eye. 1 Introduction Tracking a moving object on a cluttered background is a difficult task. When more than one target is in the field of view, a decision must be made to determine which target to track and what its movement characteristics are.
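A purely software analogue of the loop the chip implements in analog hardware might look like the following, with motion-based saliency and an argmax standing in for the focal-plane saliency computation and the winner-take-all circuit; the details are assumptions for illustration, not a description of the circuit itself.

```python
import numpy as np

def track(frames):
    """Software analogue of the chip's loop: per-pixel saliency from temporal
    change, a winner-take-all pick of the most salient location, and the
    direction of motion from successive winner positions."""
    prev_frame = frames[0]
    prev_pos = None
    for frame in frames[1:]:
        saliency = np.abs(frame - prev_frame)      # simple motion-based saliency
        pos = int(np.argmax(saliency))             # winner-take-all selection
        direction = 0 if prev_pos is None else int(np.sign(pos - prev_pos))
        yield pos, direction
        prev_frame, prev_pos = frame, pos

# toy 1-D "scene": a bright target drifting right across a 64-pixel array
frames = [np.roll(np.exp(-0.5 * (np.arange(64) - 20) ** 2 / 4.0), t) for t in range(10)]
for pos, direction in track(frames):
    print(pos, direction)
```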


Text-Based Information Retrieval Using Exponentiated Gradient Descent

Neural Information Processing Systems

The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist learning models to improve the retrieval of relevant documents from a large collection of text. Previous work in the area employs various techniques for improving retrieval [6, 7, 14].
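For reference, the two single-neuron updates being compared are the additive LMS step and Kivinen and Warmuth's multiplicative EG step. The sketch below shows both on a toy sparse-target problem meant to echo the query-weighting setting; the data, dimensions, and learning rate are invented for illustration.

```python
import numpy as np

def lms_update(w, x, y, lr=0.1):
    """Least Mean Squares (Widrow-Hoff): additive gradient step."""
    return w - lr * (w @ x - y) * x

def eg_update(w, x, y, lr=0.1):
    """Exponentiated Gradient (Kivinen & Warmuth): multiplicative update on a
    positive, normalized weight vector."""
    w = w * np.exp(-2 * lr * (w @ x - y) * x)
    return w / w.sum()

# toy relevance-weighting problem: documents as term vectors x, target score y;
# the weight vector plays the role of the learned query weights
rng = np.random.default_rng(0)
d = 20
w_lms = np.zeros(d)
w_eg = np.ones(d) / d                                 # EG starts from the uniform distribution
true_w = np.zeros(d); true_w[:3] = [0.5, 0.3, 0.2]    # sparse "ideal" term weighting

for _ in range(500):
    x = rng.random(d)
    y = true_w @ x
    w_lms = lms_update(w_lms, x, y)
    w_eg = eg_update(w_eg, x, y)

print(np.round(w_eg[:5], 3))   # EG concentrates weight on the few relevant terms
```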


Ordered Classes and Incomplete Examples in Classification

Neural Information Processing Systems

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinal model for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incomplete examples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.
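One common way to write a nonlinear ordinal model, sketched below, is the cumulative-link form P(Y <= k | x) = sigma(theta_k - f(x)) with ordered cutpoints theta_k and a nonlinear latent score f. This generic proportional-odds formulation and the toy network are assumptions and need not match the paper's exact model; the simulation-based handling of incomplete examples is not shown.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def ordinal_class_probs(score, cutpoints):
    """Cumulative-link ordinal model: P(Y <= k | x) = sigmoid(theta_k - f(x)),
    with ordered cutpoints theta_1 < ... < theta_{K-1}. Class probabilities are
    differences of consecutive cumulative probabilities."""
    cum = sigmoid(np.asarray(cutpoints) - score)    # P(Y <= k) for k = 1..K-1
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)                             # P(Y = k) for k = 1..K

def f(x, W1, w2):
    """Nonlinear latent score: here a tiny fixed two-layer network."""
    return np.tanh(x @ W1) @ w2

rng = np.random.default_rng(0)
W1, w2 = rng.standard_normal((4, 8)), rng.standard_normal(8)
cutpoints = [-1.0, 0.0, 1.5]                        # four ordered outcome classes
x = rng.standard_normal(4)
print(ordinal_class_probs(f(x, W1, w2), cutpoints)) # probabilities sum to 1
```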



Learning with Noise and Regularizers in Multilayer Neural Networks

Neural Information Processing Systems

We study the effect of noise and regularization in an online gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units; the examples are corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise and that of weight-decay regularization on the dynamical evolution of the order parameters and the generalization error in various phases of the learning process. 1 Introduction One of the most powerful and commonly used methods for training large layered neural networks is that of online learning, whereby the internal network parameters {J} are modified after the presentation of each training example so as to minimize the corresponding error.
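A minimal student-teacher sketch of this setting is given below, assuming tanh hidden units, additive Gaussian output noise, and a plain weight-decay term in the online update; the particular activation, scalings, and constants are illustrative rather than the paper's precise formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.tanh                                    # hidden-unit activation (stand-in)

N, K, M = 100, 3, 3                            # input dim, student units, teacher units
B = rng.standard_normal((M, N)) / np.sqrt(N)   # teacher weights (fixed)
J = 0.01 * rng.standard_normal((K, N))         # student weights

eta, lam, sigma = 0.1, 1e-3, 0.2               # learning rate, weight decay, output noise

for _ in range(10000):
    x = rng.standard_normal(N)
    y = g(B @ x).sum() + sigma * rng.standard_normal()   # noisy teacher label
    h = J @ x
    err = g(h).sum() - y
    # online gradient step on the squared error, plus weight-decay regularization
    J -= eta / N * (err * (1 - g(h) ** 2)[:, None] * x[None, :]) + eta * lam * J
```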


Spatial Decorrelation in Orientation Tuned Cortical Cells

Neural Information Processing Systems

In this paper we propose a model for the lateral connectivity of orientation-selective cells in the visual cortex based on information-theoretic considerations. We study the properties of the input signal to the visual cortex and find new statistical structures which have not been processed in the retino-geniculate pathway. Applying the idea that the system optimizes the representation of incoming signals, we derive the lateral connectivity that will achieve this for a set of local orientation-selective patches, as well as the complete spatial structure of a layer of such patches. We compare the results with various physiological measurements.
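The decorrelation idea can be illustrated with a generic linear sketch: if recurrent inhibition gives steady-state responses y = (I + W)^(-1) f, then choosing I + W as a whitening transform of the feed-forward responses f removes their pairwise correlations. The simulated correlated responses and the symmetric-whitening choice below are assumptions for illustration, not the connectivity derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# responses of a set of orientation-tuned units: simulated here as correlated
# Gaussian activity (in the paper these statistics come from oriented filtering
# of natural input)
n_units, n_samples = 8, 5000
mix = rng.standard_normal((n_units, n_units))
F = rng.standard_normal((n_samples, n_units)) @ mix.T      # correlated responses

# decorrelating lateral connectivity: with recurrent inhibition y = f - W y,
# the steady state is y = (I + W)^(-1) f; choosing I + W as a whitening
# transform removes the pairwise correlations between units
C = np.cov(F, rowvar=False)
eigval, eigvec = np.linalg.eigh(C)
whiten = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T       # symmetric whitening
W = np.linalg.inv(whiten) - np.eye(n_units)                # implied lateral weights

Y = F @ whiten.T
print(np.round(np.cov(Y, rowvar=False), 2))                # ~ identity: decorrelated
```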


The Learning Dynamics of a Universal Approximator

Neural Information Processing Systems

The learning properties of a universal approximator, a normalized committee machine with adjustable biases, are studied for online back-propagation learning. Within a statistical mechanics framework, numerical studies show that this model has features which do not exist in previously studied two-layer network models without adjustable biases, e.g., attractive suboptimal symmetric phases even for realizable cases and noiseless data. 1 INTRODUCTION Recently there has been much interest in the theoretical breakthrough in the understanding of the online learning dynamics of multi-layer feedforward perceptrons (MLPs) using a statistical mechanics framework. In the seminal paper (Saad & Solla, 1995), a two-layer network with an arbitrary number of hidden units was studied, allowing insight into the learning behaviour of neural network models whose complexity is of the same order as those used in real world applications.
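A sketch of online back-propagation for a normalized committee machine with adjustable biases is given below, with tanh standing in for the error-function activation usually used in these analyses; the teacher setup and learning-rate scalings are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.tanh                                  # stand-in for the usual erf activation

N, K = 50, 4                                 # input dimension, committee size
J = 0.01 * rng.standard_normal((K, N))       # student weights
theta = np.zeros(K)                          # adjustable biases
eta = 0.5

# teacher: same architecture, fixed random weights and biases (realizable case)
B = rng.standard_normal((K, N)) / np.sqrt(N)
phi = rng.standard_normal(K)

def committee(W, b, x):
    """Normalized committee machine: average of the K hidden-unit activations."""
    return g(W @ x + b).mean()

for _ in range(20000):
    x = rng.standard_normal(N)
    y = committee(B, phi, x)
    h = J @ x + theta
    err = g(h).mean() - y
    grad_h = err * (1 - g(h) ** 2) / K       # back-propagated error per hidden unit
    J -= eta / N * grad_h[:, None] * x[None, :]
    theta -= eta * grad_h                    # the biases are learned as well
```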