AITopics

One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-Iearning managerial hierarchy in which high level managers learn how to set tasks to their submanagers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task.. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat, Q-Iearning and builds a more comprehensive map. 1 INTRODUCTION Straightforward reinforcement learning has been quite successful at some relatively complex tasks like playing backgammon (Tesauro, 1992).

agent, level manager, reinforcement, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Moore, Andrew W., Atkeson, Christopher G.

Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-Iearning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of statespace. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real time problems with which other methods have difficulty.

learning, prioritized sweeping, reinforcement learning, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Baird, Bill, Troyer, Todd, Eeckman, Frank

Synchronization and Grammatical Inference in an Oscillating Elman Net

We have designed an architecture to span the gap between biophysics and cognitive science to address and explore issues of how a discrete symbol processing system can arise from the continuum, and how complex dynamics like oscillation and synchronization can then be employed in its operation and affect its learning. We show how a discrete-time recurrent "Elman" network architecture can be constructed from recurrently connected oscillatory associative memory modules described by continuous nonlinear ordinary differential equations. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked "machine cycle" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop attractors. The architecture thus employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise. We have specifically constructed a system which functions as a finite state automaton that recognizes or generates the infinite set of six symbol strings that are defined by a Reber grammar. It is a symbol processing system, but with analog input and oscillatory subsymbolic representations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. This holds input and "context" modules clamped at their attractors while'hidden and output modules change state, then clamps hidden and output states while context modules are released to load those states as the new context for the next cycle of input. Superior noise immunity has been demonstrated for systems with dynamic attractors over systems with static attractors, and synchronization ("binding") between coupled oscillatory attractors in different modules has been shown to be important for effecting reliable transitions.

architecture, attractor, module, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
(3 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Finnoff, W., Hergert, F., Zimmermann, H. G.

Extended Regularization Methods for Nonconvergent Model Selection

Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the'non convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).

penalty term, test variable, training process, (14 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sa, Virginia R. de, Ballard, Dana H.

A Note on Learning Vector Quantization

Vector Quantization is useful for data compression. Competitive Learning which minimizes reconstruction error is an appropriate algorithm for vector quantization of unlabelled data. Vector quantization of labelled data for classification has a different objective, to minimize the number of misclassifications, and a different algorithm is appropriate. We show that a variant of Kohonen's LVQ2.1 algorithm can be seen as a multiclass extension of an algorithm which in a restricted 2 class case can be proven to converge to the Bayes optimal classification boundary. We compare the performance of the LVQ2.1 algorithm to that of a modified version having a decreasing window and normalized step size, on a ten class vowel classification problem.

algorithm, codebook vector, vector, (14 more...)

Country:

North America > United States > New York > Monroe County > Rochester (0.05)
North America > United States > Virginia (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Paass, Gerhard

The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.

bootstrap, neural network prediction, predictive distribution, (11 more...)

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Automatic Capacity Tuning of Very Large VC-Dimension Classifiers

Guyon, I., Boser, B., Vapnik, V.

He is now at the University of California, Berkeley.

algorithm, classifier, polynomial classifier, (14 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.34)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.85)

Chang, Eric I., Lippmann, Richard P.

A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.

boundary, classifier, rbf classifier, (10 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison

Fritzke, Bernd

A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.

cell structure, neuron, reference vector, (15 more...)

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Sequential Tasks by Incrementally Adding Higher Orders

Ring, Mark

An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higherorder connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.

learning sequential task, new unit, recurrent network, (11 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)