Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing

Neural Information Processing Systems

Search for the largest (or the smallest) among a number of input data, i.e., the winner-take-all (WTA) action, is an essential part of intelligent data processing such as data retrieval in associative memories [3], vector quantization circuits [4], Kohonen's self-organizing maps [5], etc. In addition to the maximum or minimum search, data sorting also plays an essential role in a number of signal processing tasks, such as median filtering in image processing, evolutionary algorithms in optimization problems [6], and so forth.
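The two operations named above can be illustrated in software. This is only a minimal sketch of the winner-take-all action and of sorting-based median filtering, not of the neuron-MOS hardware the paper describes; the function names and the edge-handling rule are assumptions made for the example.

```python
from typing import List, Sequence


def winner_take_all(values: Sequence[float]) -> List[int]:
    """Return a one-hot list marking the first largest input (the 'winner')."""
    if not values:
        raise ValueError("need at least one input")
    # max() over indices returns the first index that attains the maximum.
    winner = max(range(len(values)), key=lambda i: values[i])
    return [1 if i == winner else 0 for i in range(len(values))]


def median_filter_1d(signal: Sequence[float], window: int = 3) -> List[float]:
    """Median filter built on sorting; edges replicate the end samples (assumed rule)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        # Clamp indices so the window never reads outside the signal.
        neighbourhood = [signal[min(max(j, 0), n - 1)]
                         for j in range(i - half, i + half + 1)]
        out.append(sorted(neighbourhood)[half])
    return out
```

For example, `winner_take_all([0.2, 0.9, 0.5])` marks the second input as the winner, and `median_filter_1d` removes isolated spikes from a short sequence.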


Constructive Algorithms for Hierarchical Mixtures of Experts

Neural Information Processing Systems

By applying a likelihood splitting criterion to each expert in the HME we "grow" the tree adaptively during training. Secondly, by considering only the most probable path through the tree we may "prune" branches away, either temporarily, or permanently if they become redundant. We demonstrate results for the growing and path pruning algorithms which show significant speed-ups and more efficient use of parameters over the standard fixed structure in discriminating between two interlocking spirals and classifying 8-bit parity patterns.

INTRODUCTION

The HME (Jordan & Jacobs 1994) is a tree-structured network whose terminal nodes are simple function approximators in the case of regression or classifiers in the case of classification. The outputs of the terminal nodes, or experts, are recursively combined upwards towards the root node, to form the overall output of the network, by "gates" which are situated at the non-terminal nodes.
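The recursive combination of expert outputs by gates can be sketched as follows. This is a minimal illustration assuming linear experts and softmax gates stored in a nested dictionary (a hypothetical layout chosen for the example); the second function only illustrates following the most probable path and is not the paper's growing or pruning algorithm.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def hme_output(node: dict, x: Sequence[float]) -> float:
    """Combine expert outputs upward through the gates (full tree evaluation)."""
    if "expert" in node:  # terminal node: linear expert (assumed form)
        return dot(node["expert"], x)
    gate = softmax([dot(w, x) for w in node["gate"]])
    return sum(g * hme_output(child, x)
               for g, child in zip(gate, node["children"]))


def most_probable_path_output(node: dict, x: Sequence[float]) -> float:
    """Evaluate only the branch with the largest gate value at each level."""
    while "expert" not in node:
        gate = softmax([dot(w, x) for w in node["gate"]])
        best = max(range(len(gate)), key=lambda i: gate[i])
        node = node["children"][best]
    return dot(node["expert"], x)
```

A one-level tree with two experts would be written as `{"gate": [[0.0], [0.0]], "children": [{"expert": [1.0]}, {"expert": [3.0]}]}`; with zero gate weights both experts are weighted equally.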


A Realizable Learning Task which Exhibits Overfitting

Neural Information Processing Systems

In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting. We give an explanation for this rather surprising observation and develop a method to avoid the overfitting. This method has two possible interpretations: one is learning with noise, the other is cross-validated early stopping.


Temporal Difference Learning in Continuous Time and Space

Neural Information Processing Systems

Elucidation of the relationship between TD learning and dynamic programming (DP) has provided good theoretical insights (Barto et al., 1995). However, conventional TD algorithms were based on discrete-time, discrete-state formulations. In applying these algorithms to control problems, time, space and action had to be appropriately discretized using a priori knowledge or by trial and error. Furthermore, when a TD algorithm is used for neurobiological modeling, discrete-time operation is often very unnatural. There have been several attempts to extend TD-like algorithms to continuous cases. Bradtke et al. (1994) showed convergence results for DP-based algorithms for a discrete-time, continuous-state linear system with a quadratic cost. Bradtke and Duff (1995) derived TD-like algorithms for continuous-time, discrete-state systems (semi-Markov decision problems). Baird (1993) proposed the "advantage updating" algorithm by modifying Q-learning so that it works with arbitrarily small time steps.
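For reference, the conventional discrete-time, discrete-state formulation that this abstract contrasts with can be sketched as a tabular TD(0) update. This is a minimal sketch of that baseline only, not of the continuous-time algorithm the paper develops; the step size `alpha`, the discount `gamma`, and the function name are assumptions for the example.

```python
from typing import List


def td0_update(values: List[float], state: int, reward: float,
               next_state: int, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular TD(0) update in place and return the TD error."""
    # TD error: one-step bootstrapped target minus the current estimate.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error
```

Each call handles a single observed transition, so the table `values` must already be indexed by a discretized state, which is the discretization burden the abstract points out.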


Improving Policies without Measuring Merits

Neural Information Processing Systems

Performing policy iteration in dynamic programming should only require knowledge of relative rather than absolute measures of the utility of actions (Werbos, 1991) - what Baird (1993) calls the advantages of actions at states. Nevertheless, most existing methods in dynamic programming (including Baird's) compute some form of absolute utility function. For smooth problems, advantages satisfy two differential consistency conditions (including the requirement that they be free of curl), and we show that enforcing these can lead to appropriate policy improvement solely in terms of advantages.


Stable Fitted Reinforcement Learning

Neural Information Processing Systems

We describe the reinforcement learning problem, motivate algorithms which seek an approximation to the Q function, and present new convergence results for two such algorithms.

1 INTRODUCTION AND BACKGROUND

Imagine an agent acting in some environment. At time t, the environment is in some state x_t chosen from a finite set of states. The agent perceives x_t, and is allowed to choose an action a_t from some finite set of actions. Meanwhile, the agent experiences a real-valued cost c_t, chosen from a distribution which also depends only on x_t and a_t and which has finite mean and variance. Such an environment is called a Markov decision process, or MDP.
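To make the Q function concrete, here is a minimal sketch of exact tabular Q-value iteration for a small finite MDP with costs (so the agent minimizes). It is a reference point only: the paper concerns algorithms that approximate the Q function, which this sketch does not do. The discount factor `gamma`, the dictionary layout, and the function name are assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

Transitions = Dict[int, Dict[int, List[Tuple[float, int]]]]
Costs = Dict[int, Dict[int, float]]


def q_value_iteration(transitions: Transitions, costs: Costs,
                      gamma: float = 0.9, sweeps: int = 200):
    """Iterate Q(x, a) = c(x, a) + gamma * E[min_a' Q(x', a')] over a finite MDP.

    transitions[x][a] is a list of (probability, next_state) pairs;
    costs[x][a] is the expected one-step cost.
    """
    q = {x: {a: 0.0 for a in actions} for x, actions in transitions.items()}
    for _ in range(sweeps):
        new_q = {}
        for x, actions in transitions.items():
            new_q[x] = {}
            for a, outcomes in actions.items():
                # Expected cost-to-go under the greedy (cost-minimizing) policy.
                expected_next = sum(p * min(q[x2].values()) for p, x2 in outcomes)
                new_q[x][a] = costs[x][a] + gamma * expected_next
        q = new_q
    return q
```

The greedy policy then picks, in each state, the action with the smallest Q value.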


Neural Control for Nonlinear Dynamic Systems

Neural Information Processing Systems

A neural network based approach is presented for controlling two distinct types of nonlinear systems. The first corresponds to nonlinear systems with parametric uncertainties where the parameters occur nonlinearly. The second corresponds to systems for which stabilizing control structures cannot be determined. The proposed neural controllers are shown to result in closed-loop system stability under certain conditions.


Learning Fine Motion by Markov Mixtures of Experts

Neural Information Processing Systems

Brain and Cognitive Sciences, Massachusetts Inst. of Technology, Cambridge, MA 02139. mmp@ai.mit.edu

Abstract: Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact (s.o.c.) between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the s.o.c. The current s.o.c. is viewed as the hidden state variable of a discrete HMM.
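Estimating a hidden discrete state from measurements can be sketched with one step of standard HMM forward filtering. This is a generic illustration under stated assumptions, not the paper's Markov mixture of experts model: the observation likelihoods are supplied by the caller, and the function name is hypothetical.

```python
from typing import List, Sequence


def hmm_filter_step(belief: Sequence[float],
                    transition: Sequence[Sequence[float]],
                    likelihood: Sequence[float]) -> List[float]:
    """Return the updated state distribution after one observation.

    belief[i]        : current probability of hidden state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the new observation given state j
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the observation likelihood and renormalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under every state")
    return [u / total for u in unnormalized]
```

Calling this once per measurement yields a running estimate of which hidden state (here, which state of contact) is most probable.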


A Dynamical Systems Approach for a Learnable Autonomous Robot

Neural Information Processing Systems

This paper discusses how a robot can learn goal-directed navigation tasks using local sensory inputs. The emphasis is that such learning tasks could be formulated as an embedding problem of dynamical systems: desired trajectories in a task space should be embedded into an adequate sensory-based internal state space so that a unique mapping from the internal state space to the motor command can be established. The paper shows that a recurrent neural network suffices to self-organize such an adequate internal state space from the temporal sensory input.