Convergence of Stochastic Iterative Dynamic Programming Algorithms

Neural Information Processing Systems

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.

1 INTRODUCTION

Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available for the learner that affect both the state transition probabilities and the probability distribution for the immediate, state-dependent costs (c_i(u)) incurred by the learner.
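As an illustration of the kind of algorithm the abstract refers to, the following is a minimal sketch of tabular Q-learning written as a stochastic-approximation update on costs. It is not taken from the paper: the function names, the two-state environment `toy_step`, the discount factor, and the step-size schedule (visit count raised to the power -0.7, chosen so that the step sizes sum to infinity while their squares sum to a finite value) are all assumptions made for the example.

```python
import random


def q_learning(n_states, n_actions, step, gamma=0.9, n_iters=20000, seed=0):
    """Tabular Q-learning for cost minimisation (illustrative sketch).

    `step(state, action, rng)` is a hypothetical environment callback
    that returns (cost, next_state).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_iters):
        action = rng.randrange(n_actions)  # uniform random exploration
        cost, nxt = step(state, action, rng)
        visits[state][action] += 1
        # Assumed step-size schedule: alpha_n = n^(-0.7), so that
        # sum(alpha_n) diverges while sum(alpha_n^2) converges.
        alpha = visits[state][action] ** -0.7
        # Noisy one-sample estimate of the DP backup for this pair.
        target = cost + gamma * min(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return q


def toy_step(state, action, rng):
    """Hypothetical two-state chain: action 0 stays, action 1 switches.

    The immediate cost is noisy with mean equal to the state index.
    """
    nxt = state if action == 0 else 1 - state
    cost = float(state) + rng.uniform(-0.5, 0.5)
    return cost, nxt


if __name__ == "__main__":
    q = q_learning(2, 2, toy_step)
    for s, row in enumerate(q):
        print(s, [round(v, 2) for v in row])
```

In this toy problem the cheaper long-run choice is to stay in state 0 and to leave state 1, so the learned table should rank action 0 lowest in state 0 and action 1 lowest in state 1.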



Exploiting Chaos to Control the Future

Neural Information Processing Systems

Recently, Ott, Grebogi and Yorke (OGY) [6] found an effective method to control chaotic systems to unstable fixed points by using only small control forces; however, OGY's method is based on and limited to a linear theory and requires considerable knowledge of the dynamics of the system to be controlled. In this paper we use two radial basis function networks: one as a model of an unknown plant and the other as the controller. The controller is trained with a recurrent learning algorithm to minimize a novel objective function such that the controller can locate an unstable fixed point and drive the system into the fixed point with no a priori knowledge of the system dynamics. Our results indicate that the neural controller offers many advantages over OGY's technique.


Transition Point Dynamic Programming

Neural Information Processing Systems

Transition point dynamic programming (TPDP) is a memory-based, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-learning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-learning, and was able to do so using less memory.

1 INTRODUCTION

Dynamic programming (DP) approaches can be utilized to determine optimal control policies for continuous stochastic dynamic systems when the state spaces of those systems have been quantized with a resolution suitable for control (Barto et al., 1991). DP controllers, in their simplest form, are memory-based controllers that operate by repeatedly updating cost values associated with every state in the discretized state space (Barto et al., 1991).


Optimal Signalling in Attractor Neural Networks

Neural Information Processing Systems

It is well known that a given cortical neuron can respond with a different firing pattern for the same synaptic input, depending on its firing history and on the effects of modulator transmitters (see [Connors and Gutnick, 1990] for a review). The time span of different channel conductances is very broad, and the influence of some ionic currents varies with the history of the membrane potential [Lytton, 1991]. Motivated by the history-dependent nature of neuronal firing, we continue our


Correlation Functions in a Large Stochastic Neural Network

Neural Information Processing Systems

In many cases the cross-correlations between the activities of cortical neurons are approximately symmetric about zero time delay. These have been taken as an indication of the presence of "functional connectivity" between the correlated neurons (Fetz, Toyama and Smith 1991, Abeles 1991). However, a quantitative comparison between the observed cross-correlations and those expected to exist between neurons that are part of a large interacting population has been lacking. Most of the theoretical studies of recurrent neural network models consider only time-averaged firing rates, which are usually given as solutions of mean-field equations. They do not account for the fluctuations about these averages, the study of which requires going beyond the mean-field approximations. In this work we perform a theoretical study of the fluctuations in the neuronal activities and their correlations, in a large stochastic network of excitatory and inhibitory neurons. Depending on the model parameters, this system can exhibit coherent undamped oscillations. Here we focus on parameter regimes where the system is in a statistically stationary state, which is more appropriate for modeling non-oscillatory neuronal activity in cortex. Our results for the magnitudes and the time-dependence of the correlation functions can provide a basis for comparison with physiological data on neuronal correlation functions.


Observability of Neural Network Behavior

Neural Information Processing Systems

We prove that, except possibly for small exceptional sets, discrete-time analog neural nets are globally observable, i.e. all their corrupted pseudo-orbits on computer simulations actually reflect the true dynamical behavior of the network. Locally finite discrete (boolean) neural networks are observable without exception.


Coupled Dynamics of Fast Neurons and Slow Interactions

Neural Information Processing Systems

A simple model of coupled dynamics of fast neurons and slow interactions, modelling self-organization in recurrent neural networks, leads naturally to an effective statistical mechanics characterized by a partition function which is an average over a replicated system. This is reminiscent of the replica trick used to study spin-glasses, but with the difference that the number of replicas has a physical meaning as the ratio of two temperatures and can be varied throughout the whole range of real values. The model has interesting phase consequences as a function of varying this ratio and external stimuli, and can be extended to a range of other models. As the basic archetypal model we consider a system of Ising spin neurons ($\sigma_i \in \{-1, 1\}$, $i \in \{1, \ldots, N\}$), interacting via continuous-valued symmetric interactions, $J_{ij}$, which themselves evolve in response to the states of the neurons. [...] $J_{ij}\sigma_i\sigma_j$ (2) (summed over $i$, $j$), and the subscript $\{J_{ij}\}$ indicates that the $\{J_{ij}\}$ are to be considered as quenched variables.


Solvable Models of Artificial Neural Networks

Neural Information Processing Systems

Solvable models of nonlinear learning machines are proposed, and learning in artificial neural networks is studied based on the theory of ordinary differential equations. A learning algorithm is constructed, by which the optimal parameter can be found without any recursive procedure. The solvable models enable us to analyze the reason why experimental results by the error backpropagation often contradict the statistical learning theory.


Discontinuous Generalization in Large Committee Machines

Neural Information Processing Systems

The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights.

1 INTRODUCTION

Feedforward neural networks are widely used as nonlinear, parametric models for the solution of classification tasks and function approximation. Trained from examples of a given task, they are able to generalize, i.e. to compute the correct output for new, unknown inputs.
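For readers unfamiliar with the architecture, the sketch below shows one common way a fully connected committee machine is defined: every hidden unit is a perceptron that sees all inputs, and the output is the majority vote of the hidden units. This is an assumption-laden illustration, not the paper's calculation. The teacher network, the random binary inputs, and the Monte Carlo estimate of the generalization error are assumed for the example, and all function names are hypothetical.

```python
import random


def sign(x):
    """Sign function with ties broken towards +1."""
    return 1 if x >= 0 else -1


def committee_output(weights, x):
    """Majority vote of K perceptron hidden units.

    `weights` is a list of K weight vectors, each of length N, so every
    hidden unit is connected to every input (fully connected).
    """
    votes = [sign(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in weights]
    return sign(sum(votes))


def generalization_error(student, teacher, n_inputs, n_samples=2000, seed=0):
    """Monte Carlo estimate of the probability that student and teacher
    disagree on a random input with independent +/-1 components."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        x = [rng.choice((-1, 1)) for _ in range(n_inputs)]
        if committee_output(student, x) != committee_output(teacher, x):
            errors += 1
    return errors / n_samples


if __name__ == "__main__":
    rng = random.Random(1)
    n_inputs, n_hidden = 21, 3  # odd K avoids ties in the majority vote
    teacher = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    student = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    print(generalization_error(student, teacher, n_inputs))
```

A student identical to the teacher has zero estimated error, while an untrained random student is expected to disagree with the teacher on roughly half of the inputs.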