
Early Brain Damage

Neural Information Processing Systems

Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in the cost function that results from pruning weights; this estimate is a valid approximation only if the learning algorithm has converged to a local minimum. On the other hand, it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in the cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended so that it can be used in connection with early stopping. We call this new approach Early Brain Damage, EBD. EBD also makes it possible to revive weights that have already been pruned. We demonstrate the improvements achieved by EBD on three publicly available data sets.
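As a rough illustration of the difference the abstract describes, the sketch below computes the standard OBD saliency (which drops the gradient term) next to a saliency that keeps the first-order term, as one would need when training is stopped before a local minimum. The function names, the diagonal-Hessian assumption, and the toy numbers are illustrative and not taken from the paper.

```python
import numpy as np

def obd_saliency(w, h_diag):
    """Optimal Brain Damage: estimated cost increase from pruning each weight,
    assuming the network sits in a local minimum (gradient ~ 0)."""
    return 0.5 * h_diag * w ** 2

def early_saliency(w, grad, h_diag):
    """Saliency with the first-order term kept, for networks stopped before a
    minimum (illustrative; the paper's exact EBD estimator may differ).
    Pruning weight i means the step dw_i = -w_i, so the Taylor expansion gives
    dE_i ~ -g_i * w_i + 0.5 * H_ii * w_i^2."""
    return -grad * w + 0.5 * h_diag * w ** 2

# toy example: three weights, their gradients, and diagonal Hessian entries
w = np.array([0.8, -0.1, 0.5])
g = np.array([0.2, 0.0, -0.3])        # non-zero gradient: training stopped early
h = np.array([1.0, 4.0, 0.5])

print(obd_saliency(w, h))             # OBD ranking, ignores the gradient
print(early_saliency(w, g, h))        # corrected ranking for early stopping
```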


Why did TD-Gammon Work?

Neural Information Processing Systems

Although TD-Gammon is one of the major successes in machine learning, it has not led to similarly impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of its success without using temporal difference learning; instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon (Tesauro, 1992), letting the program play itself in the hopes of learning to play. After all, the dream of computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tic-tac-toe learner (Michie, 1961). However, such self-conditioning systems, with weak or nonexistent internal representations, had generally been abandoned by the field of AI because of problems of scale.
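The hill-climbing scheme mentioned in the abstract can be sketched roughly as follows: mutate the current champion's weights, let champion and challenger play each other, and move toward the challenger only if it wins. The match-playing stub, the acceptance threshold, and the blending step below are assumptions made for the sake of a runnable example, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_match(champion, challenger, n_games=4):
    """Placeholder for self-play: returns the challenger's number of wins.
    In the real setting this would play full backgammon games between the two
    evaluation networks; here it is stubbed out with a random result."""
    return int(rng.integers(0, n_games + 1))

def hill_climb(n_weights=100, generations=1000, noise=0.05, step=0.05):
    champion = np.zeros(n_weights)
    for _ in range(generations):
        challenger = champion + noise * rng.standard_normal(n_weights)
        wins = play_match(champion, challenger)
        if wins > 2:  # challenger beats the champion in the relative-fitness test
            # move the champion slightly toward the challenger rather than
            # replacing it outright (one common variant of this scheme)
            champion = (1 - step) * champion + step * challenger
    return champion

weights = hill_climb()
```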


Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs

Neural Information Processing Systems

In supervised learning there is usually a clear distinction between inputs and outputs - inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this simple. Some features are more useful as extra outputs than as inputs. By using a feature as an output we get more than just the case values: we can learn a mapping from the other inputs to that feature. For many features this mapping may be more useful than the feature value itself. We present two regression problems and one classification problem where performance improves if features that could have been used as inputs are used as extra outputs instead.
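A minimal sketch of the feature-as-output idea, assuming a small two-layer network trained by gradient descent on a synthetic task: the candidate feature z is attached as a second output head during training and discarded at test time. All names and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic regression task: y depends on x and on a noisy feature z;
# instead of feeding z in as an input, we ask the network to predict it
n, d, h = 200, 5, 10
X = rng.standard_normal((n, d))
z = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)   # the candidate feature
y = np.sin(X[:, 0]) + 0.3 * z

targets = np.stack([y, z], axis=1)        # main target plus the feature-as-output

W1 = 0.1 * rng.standard_normal((d, h))
W2 = 0.1 * rng.standard_normal((h, 2))    # two output units: y-head and z-head
lr = 0.01

for _ in range(2000):
    H = np.tanh(X @ W1)
    out = H @ W2
    err = out - targets                   # squared-error gradient on both heads
    W2 -= lr * H.T @ err / n
    W1 -= lr * X.T @ ((err @ W2.T) * (1 - H ** 2)) / n

# at test time only the y-head (column 0) is used; the z-head merely shaped
# the hidden representation during training
pred_y = np.tanh(X @ W1) @ W2[:, 0]
print(np.mean((pred_y - y) ** 2))
```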


Analog VLSI Circuits for Attention-Based, Visual Tracking

Neural Information Processing Systems

A one-dimensional visual tracking chip has been implemented using neuromorphic, analog VLSI techniques to model selective visual attention in the control of saccadic and smooth pursuit eye movements. The chip incorporates focal-plane processing to compute image saliency and a winner-take-all circuit to select a feature for tracking. The target position and direction of motion are reported as the target moves across the array. We demonstrate its functionality in a closed-loop system which performs saccadic and smooth pursuit tracking movements using a one-dimensional mechanical eye. 1 Introduction Tracking a moving object on a cluttered background is a difficult task. When more than one target is in the field of view, a decision must be made to determine which target to track and what its movement characteristics are.
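A purely software analogue of the loop the chip implements in analog hardware might look like the following, with motion-based saliency and an argmax standing in for the focal-plane saliency computation and the winner-take-all circuit; the details are assumptions for illustration, not a description of the circuit itself.

```python
import numpy as np

def track(frames):
    """Software analogue of the chip's loop: per-pixel saliency from temporal
    change, a winner-take-all pick of the most salient location, and the
    direction of motion from successive winner positions."""
    prev_frame = frames[0]
    prev_pos = None
    for frame in frames[1:]:
        saliency = np.abs(frame - prev_frame)      # simple motion-based saliency
        pos = int(np.argmax(saliency))             # winner-take-all selection
        direction = 0 if prev_pos is None else int(np.sign(pos - prev_pos))
        yield pos, direction
        prev_frame, prev_pos = frame, pos

# toy 1-D "scene": a bright target drifting right across a 64-pixel array
frames = [np.roll(np.exp(-0.5 * (np.arange(64) - 20) ** 2 / 4.0), t) for t in range(10)]
for pos, direction in track(frames):
    print(pos, direction)
```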


Text-Based Information Retrieval Using Exponentiated Gradient Descent

Neural Information Processing Systems

The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist learning models to improve the retrieval of relevant documents from a large collection of text. Previous work in the area employs various techniques for improving retrieval [6, 7, 14].
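For reference, the two single-neuron updates being compared are the additive LMS step and Kivinen and Warmuth's multiplicative EG step. The sketch below shows both on a toy sparse-target problem meant to echo the query-weighting setting; the data, dimensions, and learning rate are invented for illustration.

```python
import numpy as np

def lms_update(w, x, y, lr=0.1):
    """Least Mean Squares (Widrow-Hoff): additive gradient step."""
    return w - lr * (w @ x - y) * x

def eg_update(w, x, y, lr=0.1):
    """Exponentiated Gradient (Kivinen & Warmuth): multiplicative update on a
    positive, normalized weight vector."""
    w = w * np.exp(-2 * lr * (w @ x - y) * x)
    return w / w.sum()

# toy relevance-weighting problem: documents as term vectors x, target score y;
# the weight vector plays the role of the learned query weights
rng = np.random.default_rng(0)
d = 20
w_lms = np.zeros(d)
w_eg = np.ones(d) / d                                 # EG starts from the uniform distribution
true_w = np.zeros(d); true_w[:3] = [0.5, 0.3, 0.2]    # sparse "ideal" term weighting

for _ in range(500):
    x = rng.random(d)
    y = true_w @ x
    w_lms = lms_update(w_lms, x, y)
    w_eg = eg_update(w_eg, x, y)

print(np.round(w_eg[:5], 3))   # EG concentrates weight on the few relevant terms
```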


Ordered Classes and Incomplete Examples in Classification

Neural Information Processing Systems

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinal model for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incomplete examples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.
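One common way to write a nonlinear ordinal model, sketched below, is the cumulative-link form P(Y <= k | x) = sigma(theta_k - f(x)) with ordered cutpoints theta_k and a nonlinear latent score f. This generic proportional-odds formulation and the toy network are assumptions and need not match the paper's exact model; the simulation-based handling of incomplete examples is not shown.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def ordinal_class_probs(score, cutpoints):
    """Cumulative-link ordinal model: P(Y <= k | x) = sigmoid(theta_k - f(x)),
    with ordered cutpoints theta_1 < ... < theta_{K-1}. Class probabilities are
    differences of consecutive cumulative probabilities."""
    cum = sigmoid(np.asarray(cutpoints) - score)    # P(Y <= k) for k = 1..K-1
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)                             # P(Y = k) for k = 1..K

def f(x, W1, w2):
    """Nonlinear latent score: here a tiny fixed two-layer network."""
    return np.tanh(x @ W1) @ w2

rng = np.random.default_rng(0)
W1, w2 = rng.standard_normal((4, 8)), rng.standard_normal(8)
cutpoints = [-1.0, 0.0, 1.5]                        # four ordered outcome classes
x = rng.standard_normal(4)
print(ordinal_class_probs(f(x, W1, w2), cutpoints)) # probabilities sum to 1
```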



Learning with Noise and Regularizers in Multilayer Neural Networks

Neural Information Processing Systems

We study the effect of noise and regularization in an online gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units; the examples are corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise and that of weight-decay regularization on the dynamical evolution of the order parameters and the generalization error in various phases of the learning process. 1 Introduction One of the most powerful and commonly used methods for training large layered neural networks is that of online learning, whereby the internal network parameters {J} are modified after the presentation of each training example so as to minimize the corresponding error.
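A minimal student-teacher sketch of this setting is given below, assuming tanh hidden units, additive Gaussian output noise, and a plain weight-decay term in the online update; the particular activation, scalings, and constants are illustrative rather than the paper's precise formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.tanh                                    # hidden-unit activation (stand-in)

N, K, M = 100, 3, 3                            # input dim, student units, teacher units
B = rng.standard_normal((M, N)) / np.sqrt(N)   # teacher weights (fixed)
J = 0.01 * rng.standard_normal((K, N))         # student weights

eta, lam, sigma = 0.1, 1e-3, 0.2               # learning rate, weight decay, output noise

for _ in range(10000):
    x = rng.standard_normal(N)
    y = g(B @ x).sum() + sigma * rng.standard_normal()   # noisy teacher label
    h = J @ x
    err = g(h).sum() - y
    # online gradient step on the squared error, plus weight-decay regularization
    J -= eta / N * (err * (1 - g(h) ** 2)[:, None] * x[None, :]) + eta * lam * J
```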


Spatial Decorrelation in Orientation Tuned Cortical Cells

Neural Information Processing Systems

In this paper we propose a model for the lateral connectivity of orientation-selective cells in the visual cortex based on information-theoretic considerations. We study the properties of the input signal to the visual cortex and find new statistical structures which have not been processed in the retino-geniculate pathway. Applying the idea that the system optimizes the representation of incoming signals, we derive the lateral connectivity that will achieve this for a set of local orientation-selective patches, as well as the complete spatial structure of a layer of such patches. We compare the results with various physiological measurements.
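The decorrelation idea can be illustrated with a generic linear sketch: if recurrent inhibition gives steady-state responses y = (I + W)^(-1) f, then choosing I + W as a whitening transform of the feed-forward responses f removes their pairwise correlations. The simulated correlated responses and the symmetric-whitening choice below are assumptions for illustration, not the connectivity derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# responses of a set of orientation-tuned units: simulated here as correlated
# Gaussian activity (in the paper these statistics come from oriented filtering
# of natural input)
n_units, n_samples = 8, 5000
mix = rng.standard_normal((n_units, n_units))
F = rng.standard_normal((n_samples, n_units)) @ mix.T      # correlated responses

# decorrelating lateral connectivity: with recurrent inhibition y = f - W y,
# the steady state is y = (I + W)^(-1) f; choosing I + W as a whitening
# transform removes the pairwise correlations between units
C = np.cov(F, rowvar=False)
eigval, eigvec = np.linalg.eigh(C)
whiten = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T       # symmetric whitening
W = np.linalg.inv(whiten) - np.eye(n_units)                # implied lateral weights

Y = F @ whiten.T
print(np.round(np.cov(Y, rowvar=False), 2))                # ~ identity: decorrelated
```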


The Learning Dynamics of a Universal Approximator

Neural Information Processing Systems

The learning properties of a universal approximator, a normalized committee machine with adjustable biases, are studied for online back-propagation learning. Within a statistical mechanics framework, numerical studies show that this model has features which do not exist in previously studied two-layer network models without adjustable biases, e.g., attractive suboptimal symmetric phases even for realizable cases and noiseless data. 1 INTRODUCTION Recently there has been much interest in the theoretical breakthrough in the understanding of the online learning dynamics of multi-layer feedforward perceptrons (MLPs) using a statistical mechanics framework. In the seminal paper (Saad & Solla, 1995), a two-layer network with an arbitrary number of hidden units was studied, allowing insight into the learning behaviour of neural network models whose complexity is of the same order as those used in real world applications.
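A sketch of online back-propagation for a normalized committee machine with adjustable biases is given below, with tanh standing in for the error-function activation usually used in these analyses; the teacher setup and learning-rate scalings are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.tanh                                  # stand-in for the usual erf activation

N, K = 50, 4                                 # input dimension, committee size
J = 0.01 * rng.standard_normal((K, N))       # student weights
theta = np.zeros(K)                          # adjustable biases
eta = 0.5

# teacher: same architecture, fixed random weights and biases (realizable case)
B = rng.standard_normal((K, N)) / np.sqrt(N)
phi = rng.standard_normal(K)

def committee(W, b, x):
    """Normalized committee machine: average of the K hidden-unit activations."""
    return g(W @ x + b).mean()

for _ in range(20000):
    x = rng.standard_normal(N)
    y = committee(B, phi, x)
    h = J @ x + theta
    err = g(h).mean() - y
    grad_h = err * (1 - g(h) ** 2) / K       # back-propagated error per hidden unit
    J -= eta / N * grad_h[:, None] * x[None, :]
    theta -= eta * grad_h                    # the biases are learned as well
```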