Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

Neural Information Processing Systems

Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor their analyses generally continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous, even for a discrete action space, our algorithm is computationally tractable.
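The advantage of stochastic over deterministic memoryless policies is easy to see concretely. The sketch below is not the paper's algorithm, only a Monte-Carlo policy-evaluation illustration on the well-known "short corridor" problem (all three cells emit the same observation, and the middle cell has its actions reversed); the environment and function names are ours:

```python
import random

def corridor_step(state, action):
    """Short-corridor dynamics. action: 0 = left, 1 = right.
    All three cells look identical to the learner; cell 1 has its
    actions reversed, so no deterministic memoryless policy escapes."""
    if state == 0:
        nxt = 1 if action == 1 else 0
    elif state == 1:
        nxt = 0 if action == 1 else 2   # reversed actions
    else:  # state == 2
        nxt = 3 if action == 1 else 1   # state 3 is the terminal goal
    return nxt, -1.0, nxt == 3

def rollout(p_right, horizon=50):
    """Monte-Carlo return of the memoryless stochastic policy
    'go right with probability p_right'; the observation never
    disambiguates the state, so the policy cannot condition on it."""
    state, ret = 0, 0.0
    for _ in range(horizon):
        action = 1 if random.random() < p_right else 0
        state, r, done = corridor_step(state, action)
        ret += r
        if done:
            break
    return ret

def evaluate(p_right, episodes=3000):
    """Monte-Carlo policy evaluation: average return over episodes."""
    return sum(rollout(p_right) for _ in range(episodes)) / episodes
```

The deterministic policy `p_right = 1.0` cycles between cells 0 and 1 and always hits the horizon (return -50), while the stochastic policy `p_right = 0.5` reaches the goal in roughly a dozen steps on average.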


The Use of Dynamic Writing Information in a Connectionist On-Line Cursive Handwriting Recognition System

Neural Information Processing Systems

This system combines a robust input representation, which preserves the dynamic writing information, with a neural network architecture, a so-called Multi-State Time Delay Neural Network (MS-TDNN), which integrates recognition and segmentation in a single framework. Our preprocessing transforms the original coordinate sequence into a (still temporal) sequence of feature vectors, which combine strictly local features, like curvature or writing direction, with a bitmap-like representation of the coordinate's proximity. The MS-TDNN architecture is well suited for handling temporal sequences as provided by this input representation. Our system is tested on both writer-dependent and writer-independent tasks with vocabulary sizes ranging from 400 up to 20,000 words. For example, on a 20,000-word vocabulary we achieve word recognition rates up to 88.9% (writer dependent) and 84.1% (writer independent) without using any language models.


An Alternative Model for Mixtures of Experts

Neural Information Processing Systems

Hinton, Dept. of Computer Science, University of Toronto, Toronto, M5S 1A4, Canada

Abstract

We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models, trained by either EM or gradient ascent, there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers.

1 INTRODUCTION

For the mixtures of experts architecture (Jacobs, Jordan, Nowlan & Hinton, 1991), the EM algorithm decouples the learning process in a manner that fits well with the modular structure and yields a considerably improved rate of convergence (Jordan & Jacobs, 1994).
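A minimal sketch of the idea, under our own assumptions rather than the paper's exact parameterization: two linear experts with Gaussian noise, gated by a generative Gaussian model of the input, so that every M-step update is available in closed form and no gradient step-size has to be chosen. All names below are illustrative:

```python
import numpy as np

def gaussian(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_alt_moe(x, y, n_iter=50):
    """EM for a 2-expert mixture with a generative (Gaussian) gating
    model on the input, so the M-step is closed form."""
    # init: gates split the input range, experts start flat
    alpha = np.array([0.5, 0.5])
    mu = np.array([-0.5, 0.5]); s2 = np.array([0.1, 0.1])
    a = np.zeros(2); b = np.array([y.mean(), y.mean()]); sig2 = np.ones(2)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each expert for each point
        h = np.zeros((2, len(x)))
        for j in range(2):
            h[j] = alpha[j] * gaussian(x, mu[j], s2[j]) \
                 * gaussian(y, a[j] * x + b[j], sig2[j])
        h /= h.sum(axis=0, keepdims=True)
        # M-step: closed-form reestimation of gate and experts
        for j in range(2):
            w = h[j]; W = w.sum()
            alpha[j] = W / len(x)
            mu[j] = (w * x).sum() / W
            s2[j] = max((w * (x - mu[j]) ** 2).sum() / W, 1e-4)
            A = np.array([[(w * x * x).sum(), (w * x).sum()],
                          [(w * x).sum(),     W            ]])
            a[j], b[j] = np.linalg.solve(A, [(w * x * y).sum(), (w * y).sum()])
            sig2[j] = max((w * (y - a[j] * x - b[j]) ** 2).sum() / W, 1e-4)
    return a, b, mu
```

Fit on noisy samples of y = |x| and the two experts recover the two linear pieces (slopes near -1 and +1) with the gate splitting the input at the kink.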


Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Neural Information Processing Systems

Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.

1 Introduction

A number of reinforcement learning algorithms based on the ideas of asynchronous dynamic programming and stochastic approximation have been developed recently for the solution of Markov Decision Problems.
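The key change from the discrete-time Q-learning backup is that both the discount factor and the accumulated reward now depend on the random sojourn time τ of each transition: with reward accruing at a constant rate ρ and continuous discount rate β, the transition contributes ρ(1 − e^{−βτ})/β plus a bootstrap term discounted by e^{−βτ}. A sketch of that single backup, with illustrative names:

```python
import math

def smdp_q_update(Q, s, a, reward_rate, tau, s_next, beta=0.5, alpha=0.1):
    """One semi-Markov Q-learning backup.

    tau is the sojourn time spent in state s before the transition;
    reward accrues continuously at `reward_rate` during the sojourn,
    so the lump-sum reward is reward_rate * (1 - e^{-beta*tau}) / beta
    and the bootstrap term is discounted by e^{-beta*tau}.
    Q maps each state to a dict of action values."""
    disc = math.exp(-beta * tau)
    lump = reward_rate * (1.0 - disc) / beta
    target = lump + disc * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

With alpha = 1, a single backup from a zero-initialized entry sets the value to exactly the discounted one-transition target; as tau → 0 the update recovers the familiar discrete-time form.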


Associative Decorrelation Dynamics: A Theory of Self-Organization and Optimization in Feedback Networks

Neural Information Processing Systems

This paper outlines a dynamic theory of development and adaptation in neural networks with feedback connections. Given an input ensemble, the connections change in strength according to an associative learning rule and approach a stable state where the neuronal outputs are decorrelated. We apply this theory to primary visual cortex and examine the implications of the dynamical decorrelation of the activities of orientation-selective cells by the intracortical connections. The theory gives a unified and quantitative explanation of the psychophysical experiments on orientation contrast and orientation adaptation. Using only one parameter, we achieve good agreement between the theoretical predictions and the experimental data.

1 Introduction

The mammalian visual system is very effective in detecting the orientations of lines, and most neurons in primary visual cortex selectively respond to oriented lines and form orientation columns [1]. Why is the visual system organized this way?

*Present address: Rockefeller University, B272, 1230 York Avenue, NY, NY 10021-6399.
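A linear batch caricature of these dynamics (our simplification, not the paper's model) makes the fixed point visible: symmetric lateral feedback weights grow in proportion to the residual output correlations, the network settles at y = (I + W)^{-1} x, and learning stops exactly when the outputs are decorrelated:

```python
import numpy as np

def decorrelate(X, eta=0.1, n_iter=200):
    """Associative decorrelation sketch for data X (samples x units).
    Lateral weights W update toward the off-diagonal output covariance;
    the update is zero precisely when outputs are decorrelated."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for _ in range(n_iter):
        Y = X @ np.linalg.inv(np.eye(d) + W).T   # settled outputs
        C = np.cov(Y.T)
        W += eta * (C - np.diag(np.diag(C)))     # associative rule
    Y = X @ np.linalg.inv(np.eye(d) + W).T
    return W, Y
```

For two units with input correlation 0.8, the lateral weight converges to 0.5 (the root of 0.8w² − 2w + 0.8 = 0 inside the unit interval) and the output correlation is driven to zero.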


Catastrophic Interference in Human Motor Learning

Neural Information Processing Systems

Biological sensorimotor systems are not static maps that transform input (sensory information) into output (motor behavior). Evidence from many lines of research suggests that their representations are plastic, experience-dependent entities. While this plasticity is essential for flexible behavior, it presents the nervous system with difficult organizational challenges. If the sensorimotor system adapts itself to perform well under one set of circumstances, will it then perform poorly when placed in an environment with different demands (negative transfer)? Will a later experience-dependent change undo the benefits of previous learning (catastrophic interference)?



Factorial Learning by Clustering Features

Neural Information Processing Systems

We introduce a novel algorithm for factorial learning, motivated by segmentation problems in computational vision, in which the underlying factors correspond to clusters of highly correlated input features. The algorithm derives from a new kind of competitive clustering model, in which the cluster generators compete to explain each feature of the data set and cooperate to explain each input example, rather than competing for examples and cooperating on features, as in traditional clustering algorithms. A natural extension of the algorithm recovers hierarchical models of data generated from multiple unknown categories, each with a different multiple causal structure. Several simulations demonstrate the power of this approach.
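The goal of grouping features by correlation can be shown with a much cruder stand-in than the paper's competitive model: represent each feature column by its profile of correlations with all other features, then run plain k-means (with deterministic farthest-point seeding) on those profiles, so highly correlated features land in the same cluster. Purely illustrative:

```python
import numpy as np

def cluster_features(X, k, n_iter=20):
    """Cluster the columns (features) of X so that highly correlated
    features share a cluster. Each feature is represented by its row
    of the feature-feature correlation matrix; k-means then groups
    features with similar correlation profiles."""
    C = np.corrcoef(X.T)                 # feature-by-feature correlations
    # deterministic farthest-point seeding
    idx = [0]
    for _ in range(1, k):
        d2 = np.min(((C[:, None, :] - C[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(d2.argmax()))
    centers = C[idx].copy()
    for _ in range(n_iter):
        d2 = ((C[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = C[labels == j].mean(axis=0)
    return labels
```

On data where two hidden factors each drive three noisy features, the two groups of features are recovered as the two clusters.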


Glove-TalkII: Mapping Hand Gestures to Speech Using Neural Networks

Neural Information Processing Systems

There are many different possible schemes for converting hand gestures to speech. The choice of scheme depends on the granularity of the speech that you want to produce. Figure 1 identifies a spectrum defined by possible divisions of speech based on the duration of the sound for each granularity. What is interesting is that, in general, the coarser the division of speech, the smaller the bandwidth necessary for the user. In contrast, where the granularity of speech is on the order of articulatory muscle movements (i.e., the artificial vocal tract [AVT]), high-bandwidth control is necessary for good speech. Devices which implement this model of speech production are like musical instruments which produce speech sounds.


JPMAX: Learning to Recognize Moving Objects as a Model-fitting Problem

Neural Information Processing Systems

Suzanna Becker, Department of Psychology, McMaster University, Hamilton, Ont. L8S 4K1

Abstract

Unsupervised learning procedures have been successful at low-level feature extraction and preprocessing of raw sensor data. So far, however, they have had limited success in learning higher-order representations, e.g., of objects in visual images. A promising approach is to maximize some measure of agreement between the outputs of two groups of units which receive inputs physically separated in space, time, or modality, as in (Becker and Hinton, 1992; Becker, 1993; de Sa, 1993). Using the same approach, a much simpler learning procedure is proposed here which discovers features in a single-layer network consisting of several populations of units, and can be applied to multi-layer networks trained one layer at a time. When trained with this algorithm on image sequences of moving geometric objects, a two-layer network can learn to perform accurate position-invariant object classification.

1 LEARNING COHERENT CLASSIFICATIONS

A powerful constraint in sensory data is coherence over time, in space, and across different sensory modalities.