Goto

Collaborating Authors

 Technology


Development of Orientation and Ocular Dominance Columns in Infant Macaques

Neural Information Processing Systems

Maps of orientation preference and ocular dominance were recorded optically from the cortices of 5 infant macaque monkeys, ranging in age from 3.5 to 14 weeks. In agreement with previous observations, we found that basic features of orientation and ocular dominance maps, as well as correlations between them, are present and robust by 3.5 weeks of age. We did observe changes in the strength of ocular dominance signals, as well as in the spacing of ocular dominance bands,both of which increased steadily between 3.5 and 14 weeks of age. The latter finding suggests that the adult spacing of ocular dominance bands depends on cortical growth in neonatal animals. Since we found no corresponding increase in the spacing of orientation preferences, however, there is a possibility that the orientation preferences of some cells change as the cortical surface expands. Since correlations between the patterns of orientation selectivity and ocular dominance are present at an age, when the visual system is still immature, it seems more likely that their development maybe an innate process and may not require extensive visual experience.


Optimal Stopping and Effective Machine Complexity in Learning

Neural Information Processing Systems

We study tltt' problem of when to stop If'arning a class of feedforward networks - networks with linear outputs I1PUrOIl and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there a.re in general three distinct phases in the generalization performance in the learning process, and in particular, the network has hetter gt'neralization pPTformance when learning is stopped at a certain time before til(' global miniIl111lu of the empirical error is reachert. A notion of effective size of a machine is rtefil1e i and used to explain the tradeoff betwf'en the complexity of the marhine and the training error ill the learning process. The study leads nat.urally to a network size selection critt'rion, which turns Ol1t to be a generalization of Akaike's Information Criterioll for the It'arning process. It if; shown that stopping Iparning before tiJt' global minimum of the empirical error has the effect of network size splectioll. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [11 and many others [5, 1, 10,3,111 all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimf'nsion generalizes better. However, it is noted that these methods or criteria do not npcpssarily If'ad to optimal (or llearly optimal) generalization performance.


Fast Non-Linear Dimension Reduction

Neural Information Processing Systems

We propose a new distance measure which is optimal for the task of local PCA. Our results with speech and image data indicate that the nonlinear techniques provide more accurate encodings than PCA. Our local linear algorithm produces more accurate encodings (except for one simulation with image data), and trains much faster than five layer auto-associative networks. Acknowledgments This work was supported by grants from the Air Force Office of Scientific Research (F49620-93-1-0253) and Electric Power Research Institute (RP8015-2). The authors are grateful to Gary Cottrell and David DeMers for providing their image database and clarifying their experimental results. We also thank our colleagues in the Center for Spoken Language Understanding at OGI for providing speech data.


Learning Classification with Unlabeled Data

Neural Information Processing Systems

Department of Computer Science University of Rochester Rochester, NY 14627 Abstract One of the advantages of supervised learning is that the final error metric isavailable during training. For classifiers, the algorithm can directly reduce the number of misclassifications on the training set. Unfortunately, whenmodeling human learning or constructing classifiers for autonomous robots,supervisory labels are often not available or too expensive. In this paper we show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities.We show that minimizing the disagreement between the outputs of networks processing patterns from these different modalities is a sensible approximation to minimizing the number of misclassifications in each modality, and leads to similar results. Using the Peterson-Barney vowel dataset we show that the algorithm performs well in finding appropriate placementfor the codebook vectors particularly when the confuseable classes are different for the two modalities. 1 INTRODUCTION This paper addresses the question of how a human or autonomous robot can learn to classify new objects without experience with previous labeled examples.


A Local Algorithm to Learn Trajectories with Stochastic Neural Networks

Neural Information Processing Systems

This paper presents a simple algorithm to learn trajectories with a continuous time, continuous activation version of the Boltzmann machine. The algorithm takes advantage of intrinsic Brownian noise in the network to easily compute gradients using entirely local computations. The algorithm may be ideal for parallel hardware implementations. This paper presents a learning algorithm to train continuous stochastic networks to respond with desired trajectories in the output units to environmental input trajectories. This is a task, with potential applications to a variety of problems such as stochastic modeling of neural processes, artificial motor control, and continuous speech recognition.


Optimal Brain Surgeon: Extensions and performance comparisons

Neural Information Processing Systems

We extend Optimal Brain Surgeon (OBS) - a second-order method for pruning networks - to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, that can be interpreted as due to injecting noise backa result the system. A common technique is to stop training of a largeinto at the minimum validation error. We found that the testnetwork error could be reduced even further by means of OBS (but not OBD) pruning.


Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms

Neural Information Processing Systems

Reinforcement Learning methods based on approximating dynamic programming (DP) are receiving increased attention due to their utility in forming reactive control policies for systems embedded in dynamic environments. Environments are usually modeled as controlled Markov processes, but when the environment model is not known a priori, adaptive methods are necessary. Adaptive control methodsare often classified as being direct or indirect. Direct methods directly adapt the control policy from experience, whereas indirect methods adapt a model of the controlled process and compute controlpolicies based on the latest model. Our focus is on indirect adaptive DPbased methods in this paper. We present a convergence result for indirect adaptive asynchronous value iteration algorithmsfor the case in which a lookup table is used to store the value function. Our result implies convergence of several existing reinforcementlearning algorithms such as adaptive real-time dynamic programming (ARTDP) (Barto, Bradtke, & Singh, 1993) and prioritized sweeping (Moore & Atkeson, 1993). Although the emphasis of researchers studying DPbased reinforcement learning has been on direct adaptive methods such as Q-Learning (Watkins, 1989) and methods using TD algorithms (Sutton, 1988), it is not clear that these direct methods are preferable in practice to indirect methods such as those analyzed in this paper.


High Performance Neural Net Simulation on a Multiprocessor System with "Intelligent" Communication

Neural Information Processing Systems

Urs A. Miiller, Michael Kocheisen, and Anton Gunzinger Electronics Laboratory, Swiss Federal Institute of Technology CH-B092 Zurich, Switzerland Abstract The performance requirements in experimental research on artificial neuralnets often exceed the capability of workstations and PCs by a great amount. But speed is not the only requirement. Flexibility and implementation time for new algorithms are usually of equal importance. This paper describes the simulation of neural nets on the MUSIC parallel supercomputer, a system that shows a good balance between the three issues and therefore made many research projects possible that were unthinkable before. The system should be flexible, simple to program and the realization time should be short enough to not have an obsolete system by the time it is finished. Therefore, the fastest available standard components were used.


Estimating analogical similarity by dot-products of Holographic Reduced Representations

Neural Information Processing Systems

Gentner and Markman (1992) suggested that the ability to deal with analogy will be a "Watershed or Waterloo" for connectionist models. They identified "structural alignment" as the central aspect of analogy making. They noted the apparent ease with which people can perform structural alignment in a wide variety of tasks and were pessimistic about the of a distributed connectionist model that could be useful inprospects for the development performing structural alignment. In this paper I describe how Holographic Reduced Representations (HRRs) (Plate, 1991; Plate, 1994), a fixed-width distributed representation for nested structures, can be used to obtain fast estimates of analogical similarity.


Monte Carlo Matrix Inversion and Reinforcement Learning

Neural Information Processing Systems

We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950's. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other methods for evaluating policies. Further, all DPbased RL methods have some of the properties of these Monte Carlo algorithms, that although RL is often perceived towhich suggests be slow, for sufficiently large problems, it may in fact be more efficient than other known classes of methods capable of producing the same results.