Plotting

 Technology


Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Journal of Artificial Intelligence Research

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative, that indeed only approximates TD(lambda), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.


A Domain-Independent Algorithm for Plan Adaptation

Journal of Artificial Intelligence Research

The paradigms of transformational planning, case-based planning, and plan debugging all involve a process known as plan adaptation - modifying or repairing an old plan so it solves a new problem. In this paper we provide a domain-independent algorithm for plan adaptation, demonstrate that it is sound, complete, and systematic, and compare it to other adaptation algorithms in the literature. Our approach is based on a view of planning as searching a graph of partial plans. Generative planning starts at the graph's root and moves from node to node using plan-refinement operators. In planning by adaptation, a library plan - an arbitrary node in the plan graph - is the starting point for the search, and the plan-adaptation algorithm can apply both the same refinement operators available to a generative planner and can also retract constraints and steps from the plan. Our algorithm's completeness ensures that the adaptation algorithm will eventually search the entire graph and its systematicity ensures that it will do so without redundantly searching any parts of the graph.


Solving Multiclass Learning Problems via Error-Correcting Output Codes

Journal of Artificial Intelligence Research

Multiclass learning problems involve finding a definitionfor an unknown function f(x) whose range is a discrete setcontaining k > 2 values (i.e., k ``classes''). Thedefinition is acquired by studying collections of training examples ofthe form [x_i, f (x_i)]. Existing approaches tomulticlass learning problems include direct application of multiclassalgorithms such as the decision-tree algorithms C4.5 and CART,application of binary concept learning algorithms to learn individualbinary functions for each of the k classes, and application ofbinary concept learning algorithms with distributed outputrepresentations. This paper compares these three approaches to a newtechnique in which error-correcting codes are employed as adistributed output representation. We show that these outputrepresentations improve the generalization performance of both C4.5and backpropagation on a wide range of multiclass learning tasks. Wealso demonstrate that this approach is robust with respect to changesin the size of the training sample, the assignment of distributedrepresentations to particular classes, and the application ofoverfitting avoidance techniques such as decision-tree pruning.Finally, we show that---like the other methods---the error-correctingcode technique can provide reliable class probability estimates.Taken together, these results demonstrate that error-correcting outputcodes provide a general-purpose method for improving the performanceof inductive learning programs on multiclass problems.


Probabilistic Anomaly Detection in Dynamic Systems

Neural Information Processing Systems

Padhraic Smyth Jet Propulsion Laboratory 238-420 California Institute of Technology 4800 Oak Grove Drive Pasadena, CA 91109 Abstract This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acutewhen prior knowledge and training data only allow one to construct an incomplete classification model. Allowance must be made in model design so that the classifier will be robust to data generated by classes not included in the training phase. For diagnosis applications one practical approach is to construct both an input density model and a discriminative class model. Using Bayes' rule and prior estimates of the relative likelihood of data of known and unknown origin the resulting classification equations are straightforward.


WATTLE: A Trainable Gain Analogue VLSI Neural Network

Neural Information Processing Systems

This paper describes a low power analogue VLSI neural network called Wattle. Wattle is a 10:6:4 three layer perceptron with multiplying DACsynapses and on chip switched capacitor neurons fabricated in 1.2um CMOS.


Temporal Difference Learning of Position Evaluation in the Game of Go

Neural Information Processing Systems

Computational Neurobiology Laboratory The Salk Institute for Biological Studies San Diego, CA 92186-5800 Abstract The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactionsthat make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference(TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by selfplay alone.A network with less than 500 weights learned within 3,000 games of 9x9 Go a position evaluation function that enables a primitive one-ply search to defeat a commercial Go program at a low playing level. 1 INTRODUCTION Go was developed three to four millenia ago in China; it is the oldest and one of the most popular board games in the world.


The "Softmax" Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element

Neural Information Processing Systems

Reciprocal circuit elements facilitate such an implementation since they 882 The "Softmax" Nonlinearity 883 can be combined with other reciprocal elements to form an analog network having Lyapunov-like functions: the network content or co-content. In this paper, we show a reciprocal implementation of the "softmax" nonlinearity that is usually used to enforce local competition between neurons [Peterson, 1989]. We show that the circuit ispassive and incrementally passive, and we explicitly compute its content and co-content functions. This circuit adds a new element to the library of the analog circuit designer that can be combined with reciprocal constraint boxes [Harris, 1988] and nonlinear resistive fuses [Harris, 1989] to form fast, analog VLSI optimization networks.


VLSI Phase Locking Architectures for Feature Linking in Multiple Target Tracking Systems

Neural Information Processing Systems

Department of Electrical Engineering The University of Maryland College Park, MD 20722 Abstract Recent physiological research has shown that synchronization of oscillatory responses in striate cortex may code for relationships between visual features of objects. A VLSI circuit has been designed toprovide rapid phase-locking synchronization of multiple oscillators to allow for further exploration of this neural mechanism. By exploiting the intrinsic random transistor mismatch of devices operated in subthreshold, large groups of phase-locked oscillators can be readily partitioned into smaller phase-locked groups. A mUltiple target tracker for binary images is described utilizing this phase-locking architecture. A VLSI chip has been fabricated and tested to verify the architecture.


Decoding Cursive Scripts

Neural Information Processing Systems

Online cursive handwriting recognition is currently one of the most intriguing challenges in pattern recognition. This study presents a novel approach to this problem which is composed of two complementary phases.The first is dynamic encoding of the writing trajectory into a compact sequence of discrete motor control symbols. In this compact representation we largely remove the redundancy of the script, while preserving most of its intelligible components. In the second phase these control sequences are used to train adaptive probabilistic acyclic automata (PAA) for the important ingredients of the writing trajectories, e.g.


Figure of Merit Training for Detection and Spotting

Neural Information Processing Systems

Spotting tasks require detection of target patterns from a background of richly varied non-target inputs. The performance measure of interest for these tasks, called the figure of merit (FOM), is the detection rate for target patterns when the false alarm rate is in an acceptable range. A new approach to training spotters is presented which computes the FOM gradient for each input pattern and then directly maximizes the FOM using b ackpropagati on. This eliminates the need for thresholds during training. It also uses network resources to model Bayesian a posteriori probability functions accurately only for patterns which have a significant effect on the detection accuracy over the false alarm rate of interest. FOM training increased detection accuracy by 5 percentage points for a hybrid radial basis function (RBF) - hidden Markov model (HMM) wordspotter on the credit-card speech corpus.