AITopics

Since a neural network predictor inherently has an excessive number of parameters, reducing the prediction error is usually done by reducing variance. Methods for reducing neural network complexity can be viewed as a regularization technique to reduce this variance. Examples of such methods are Optimal Brain Damage (Le Cun et.

health & medicine, neural network, training signal, (19 more...)

Country: North America > United States (0.15)

Industry: Health & Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Duff, Michael O., Barto, Andrew G.

Local Bandit Approximation for Optimal Learning Problems

A Bayesian formulation of the problem leads to a clear concept of a solution whose computation, however, appears to entail an examination of an intractably-large number of hyperstates. This paper has suggested extending the Gittins index approach (which applies with great power and elegance to the special class of multi-armed bandit processes) to general adaptive MDP's. The hope has been that if certain salient features of the value of information could be captured, even approximately, then one could be led to a reasonable method for avoiding certain defects of certainty-equivalence approaches (problems with identifiability, "metastability"). Obviously, positive evidence, in the form of empirical results from simulation experiments, would lend support to these ideas-work along these lines is underway. Local bandit approximation is but one approximate computational approach for problems of optimal learning and dual control. Most prominent in the literature of control theory is the "wide-sense" approach of [Bar-Shalom & Tse, 1976], which utilizes local quadratic approximations about nominal state/control trajectories. For certain problems, this method has demonstrated superior performance compared to a certainty-equivalence approach, but it is computationally very intensive and unwieldy, particularly for problems with controller dimension greater than one. One could revert to the view of the bandit problem, or general adaptive MDP, as simply a very large MDP defined over hyperstates, and then consider a some- Local Bandit Approximationfor Optimal Learning Problems 1025 what direct approach in which one performs approximate dynamic programming with function approximation over this domain-details of function-approximation, feature-selection, and "training" all become important design issues.

artificial intelligence, big data, transition probability, (17 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Genre: Research Report (0.48)

Industry: Education > Focused Education > Special Education (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.86)
Information Technology > Data Science > Data Mining > Big Data (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.70)
(2 more...)

Tresp, Volker, Neuneier, Ralph, Zimmermann, Hans-Georg

Early Brain Damage

Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in cost function if weights are pruned and is a valid approximation if the learning algorithm has converged into a local minimum. On the other hand it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended such that it can be used in connection with early stopping. We call this new approach Early Brain Damage, EBD. EBD also allows to revive already pruned weights. We demonstrate the improvements achieved by EBD using three publicly available data sets.

cost function, health & medicine, neurology, (18 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Pollack, Jordan B., Blair, Alan D.

Why did TD-Gammon Work?

Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference We werelearning for other applications or even other games. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest of Tesauro's program had more to do with thethat the surprising success of the learning task and the dynamics of theco-evolutionary structure backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal of Backgammon (Tesauro, 1992). After all, the dream ofprogram play itself in the hopes computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tictac-toe learner (Michie, 1961). However such self-conditioning or nonexistent internal representations, had generally beensystems, with weak of scale and abandoned by the field of AI.

artificial intelligence, backgammon, tesauro, (16 more...)

Industry: Leisure & Entertainment > Games > Backgammon (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games > Backgammon (1.00)

Caruana, Rich, Sa, Virginia R. de

Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs

In supervised learning there is usually a clear distinction between inputs and outputs - inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this Some features are more useful as extra outputs than assimple. By using a feature as an output we get more than just the case values but can. For many features this mapping may be more useful than the feature value itself. We present two regression problems and one classification problem where performance improves if features that could have been used as inputs are used as extra outputs instead.

health & medicine, neural network, std, (17 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.28)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

Horiuchi, Timothy K., Morris, Tonia G., Koch, Christof, DeWeerth, Stephen P.

Analog VLSI Circuits for Attention-Based, Visual Tracking

A one-dimensional visual tracking chip has been implemented using neuromorphic,analog VLSI techniques to model selective visual attention in the control of saccadic and smooth pursuit eye movements. Thechip incorporates focal-plane processing to compute image saliency and a winner-take-all circuit to select a feature for tracking. The target position and direction of motion are reported as the target moves across the array. We demonstrate its functionality ina closed-loop system which performs saccadic and smooth pursuit tracking movements using a one-dimensional mechanical eye. 1 Introduction Tracking a moving object on a cluttered background is a difficult task. When more than one target is in the field of view, a decision must be made to determine which target to track and what its movement characteristics are.

analog vlsi circuit, artificial intelligence, attention-based, (12 more...)

Country: North America > United States > California (0.16)

Industry: Semiconductors & Electronics (0.66)

Technology: Information Technology > Artificial Intelligence (0.47)

Papka, Ron, Callan, James P., Barto, Andrew G.

Text-Based Information Retrieval Using Exponentiated Gradient Descent

The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist to improve the retrieval of relevant documents from a largelearning models collection of text. Previous the area employs various techniques for improving retrieval [6, 7, 14].

artificial intelligence, natural language, query, (18 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Genre: Research Report (0.67)

Industry: Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)

Ordered Classes and Incomplete Examples in Classification

Mathieson, Mark

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinalmodel for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incompleteexamples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.

bayesian inference, imputation, neurology, (17 more...)

Country: Europe > United Kingdom > England (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Kowalczyk, Adam, Ferrá, Herman L.

MLP Can Provably Generalize Much Better than VC-bounds Indicate

It is also shown that bounds following the true learning curve can be derived from a formalism based on the density of error patterns.

artificial intelligence, neural network, thermodynamic limit, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.33)

Saad, David, Solla, Sara A.

Learning with Noise and Regularizers in Multilayer Neural Networks

We study the effect of noise and regularization in an online gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples arerandomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units; the examples arecorrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise and that of weight-decay regularization on the dynamical evolution ofthe order parameters and the generalization error in various phases of the learning process. 1 Introduction One of the most powerful and commonly used methods for training large layered neural networks is that of online learning, whereby the internal network parameters {J} are modified after the presentation of each training example so as to minimize the corresponding error.

computer based training, educational technology, generalization error, (19 more...)

Country: Europe > United Kingdom (0.14)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)