Goto

Collaborating Authors

 Europe


The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces

Neural Information Processing Systems

Andrew W. Moore School of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 Abstract Parti-game is a new algorithm for learning from delayed rewards in high dimensional real-valued state-spaces. In high dimensions it is essential that learning does not explore or plan over state space uniformly. Parti-game maintains a decision-tree partitioning of state-space and applies game-theory and computational geometry techniquesto efficiently and reactively concentrate high resolution onlyon critical areas. Many simulated problems have been tested, ranging from 2-dimensional to 9-dimensional state-spaces, including mazes, path planning, nonlinear dynamics, and uncurling snakerobots in restricted spaces. In all cases, a good solution is found in less than twenty trials and a few minutes. 1 REINFORCEMENT LEARNING Reinforcement learning [Samuel, 1959, Sutton, 1984, Watkins, 1989, Barto et al., 1991] is a promising method for control systems to program and improve themselves.


Development of Orientation and Ocular Dominance Columns in Infant Macaques

Neural Information Processing Systems

Maps of orientation preference and ocular dominance were recorded optically from the cortices of 5 infant macaque monkeys, ranging in age from 3.5 to 14 weeks. In agreement with previous observations, we found that basic features of orientation and ocular dominance maps, as well as correlations between them, are present and robust by 3.5 weeks of age. We did observe changes in the strength of ocular dominance signals, as well as in the spacing of ocular dominance bands,both of which increased steadily between 3.5 and 14 weeks of age. The latter finding suggests that the adult spacing of ocular dominance bands depends on cortical growth in neonatal animals. Since we found no corresponding increase in the spacing of orientation preferences, however, there is a possibility that the orientation preferences of some cells change as the cortical surface expands. Since correlations between the patterns of orientation selectivity and ocular dominance are present at an age, when the visual system is still immature, it seems more likely that their development maybe an innate process and may not require extensive visual experience.


Optimal Stopping and Effective Machine Complexity in Learning

Neural Information Processing Systems

We study tltt' problem of when to stop If'arning a class of feedforward networks - networks with linear outputs I1PUrOIl and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there a.re in general three distinct phases in the generalization performance in the learning process, and in particular, the network has hetter gt'neralization pPTformance when learning is stopped at a certain time before til(' global miniIl111lu of the empirical error is reachert. A notion of effective size of a machine is rtefil1e i and used to explain the tradeoff betwf'en the complexity of the marhine and the training error ill the learning process. The study leads nat.urally to a network size selection critt'rion, which turns Ol1t to be a generalization of Akaike's Information Criterioll for the It'arning process. It if; shown that stopping Iparning before tiJt' global minimum of the empirical error has the effect of network size splectioll. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [11 and many others [5, 1, 10,3,111 all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimf'nsion generalizes better. However, it is noted that these methods or criteria do not npcpssarily If'ad to optimal (or llearly optimal) generalization performance.


Fast Non-Linear Dimension Reduction

Neural Information Processing Systems

We propose a new distance measure which is optimal for the task of local PCA. Our results with speech and image data indicate that the nonlinear techniques provide more accurate encodings than PCA. Our local linear algorithm produces more accurate encodings (except for one simulation with image data), and trains much faster than five layer auto-associative networks. Acknowledgments This work was supported by grants from the Air Force Office of Scientific Research (F49620-93-1-0253) and Electric Power Research Institute (RP8015-2). The authors are grateful to Gary Cottrell and David DeMers for providing their image database and clarifying their experimental results. We also thank our colleagues in the Center for Spoken Language Understanding at OGI for providing speech data.


Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms

Neural Information Processing Systems

Reinforcement Learning methods based on approximating dynamic programming (DP) are receiving increased attention due to their utility in forming reactive control policies for systems embedded in dynamic environments. Environments are usually modeled as controlled Markov processes, but when the environment model is not known a priori, adaptive methods are necessary. Adaptive control methodsare often classified as being direct or indirect. Direct methods directly adapt the control policy from experience, whereas indirect methods adapt a model of the controlled process and compute controlpolicies based on the latest model. Our focus is on indirect adaptive DPbased methods in this paper. We present a convergence result for indirect adaptive asynchronous value iteration algorithmsfor the case in which a lookup table is used to store the value function. Our result implies convergence of several existing reinforcementlearning algorithms such as adaptive real-time dynamic programming (ARTDP) (Barto, Bradtke, & Singh, 1993) and prioritized sweeping (Moore & Atkeson, 1993). Although the emphasis of researchers studying DPbased reinforcement learning has been on direct adaptive methods such as Q-Learning (Watkins, 1989) and methods using TD algorithms (Sutton, 1988), it is not clear that these direct methods are preferable in practice to indirect methods such as those analyzed in this paper.


High Performance Neural Net Simulation on a Multiprocessor System with "Intelligent" Communication

Neural Information Processing Systems

Urs A. Miiller, Michael Kocheisen, and Anton Gunzinger Electronics Laboratory, Swiss Federal Institute of Technology CH-B092 Zurich, Switzerland Abstract The performance requirements in experimental research on artificial neuralnets often exceed the capability of workstations and PCs by a great amount. But speed is not the only requirement. Flexibility and implementation time for new algorithms are usually of equal importance. This paper describes the simulation of neural nets on the MUSIC parallel supercomputer, a system that shows a good balance between the three issues and therefore made many research projects possible that were unthinkable before. The system should be flexible, simple to program and the realization time should be short enough to not have an obsolete system by the time it is finished. Therefore, the fastest available standard components were used.


Illumination-Invariant Face Recognition with a Contrast Sensitive Silicon Retina

Neural Information Processing Systems

We report face recognition results under drastically changing lighting conditions for a computer vision system whichconcurrently uses a contrast sensitive silicon retina and a conventional, gaincontrolled CCO camera. For both input devices the face recognition system employs an elastic matching algorithm with wavelet based features to classify unknown faces. To assess the effect of analog on-chip preprocessing by the silicon retina the CCO images have been "digitally preprocessed" with a bandpass filter to adjust the power spectrum. Thesilicon retina with its ability to adjust sensitivity increases the recognition rate up to 50 percent. These comparative experiments demonstrate that preprocessing with an analog VLSI silicon retina generates imagedata enriched with object-constant features.


A Comparative Study of a Modified Bumptree Neural Network with Radial Basis Function Networks and the Standard Multi Layer Perceptron

Neural Information Processing Systems

Bumptrees are geometric data structures introduced by Omohundro (1991) to provide efficient access to a collection of functions on a Euclidean space of interest. We describe a modified bumptree structure that has been employed as a neural network classifier, and compare its performance on several classification tasks against that of radial basis function networks and the standard mutIi-Iayer perceptron. 1 INTRODUCTION A number of neural network studies have demonstrated the utility of the multi-layer perceptron (MLP) and shown it to be a highly effective paradigm. Studies have also shown, however, that the MLP is not without its problems, in particular it requires an extensive training time, is susceptible to local minima problems and its perfonnance is dependent upon its internal network architecture. In an attempt to improve upon the generalisation performance and computational efficiency a number of studies have been undertaken principally concerned with investigating the parametrisation of the MLP. It is well known, for example, that the generalisation performance of the MLP is affected by the number of hidden units in the network, which have to be determined empirically since theory provides no guidance.


Encoding Labeled Graphs by Labeling RAAM

Neural Information Processing Systems

Alessandro Sperduti* Department of Computer Science Pisa University Corso Italia 40, 56125 Pisa, Italy Abstract In this paper we propose an extension to the RAAM by Pollack. This extension, the Labeling RAAM (LRAAM), can encode labeled graphswith cycles by representing pointers explicitly. Data encoded in an LRAAM can be accessed by pointer as well as by content. Direct access by content can be achieved by transforming theencoder network of the LRAAM into an analog Hopfield network with hidden units. Different access procedures can be defined depending on the access key. Sufficient conditions on the asymptotical stability of the associated Hopfield network are briefly introduced. 1 INTRODUCTION In the last few years, several researchers have tried to demonstrate how symbolic structures such as lists, trees, and stacks can be represented and manipulated in a connectionist system, while still preserving all the computational characteristics of connectionism (and extending them to the symbolic representations) (Hinton, 1990; Plate, 1991; Pollack, 1990; Smolensky, 1990; Touretzky, 1990).