A Connectionist Symbol Manipulator That Discovers the Structure of Context-Free Languages

Neural Information Processing Systems

We present a neural net architecture that can discover hierarchical and recursive structure in symbol strings. To detect structure at multiple levels, the architecture has the capability of reducing symbol substrings to single symbols, and makes use of an external stack memory. In terms of formal languages, the architecture can learn to parse strings in an LR(0) context-free grammar. Given training sets of positive and negative exemplars, the architecture has been trained to recognize many different grammars. The architecture has only one layer of modifiable weights, allowing for a straightforward interpretation of its behavior.
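
As a concrete illustration of the substring-reduction idea, the sketch below hand-codes the shift/reduce behavior the network is trained to discover, for the toy grammar S -> a S b | a b. The grammar and reduction rules here are illustrative assumptions, not taken from the paper:

```python
# Shift symbols onto an external stack; reduce whenever the top of the
# stack matches a rule's right-hand side. The paper's network *learns*
# this behavior; the table below is hard-wired for illustration.

def recognize(string):
    stack = []
    for sym in string:
        stack.append(sym)                        # shift the next input symbol
        while True:                              # reduce as long as a rule matches
            if stack[-3:] == ["a", "S", "b"]:    # S -> a S b
                stack[-3:] = ["S"]
            elif stack[-2:] == ["a", "b"]:       # S -> a b
                stack[-2:] = ["S"]
            else:
                break
    return stack == ["S"]

print(recognize("aaabbb"))   # True
print(recognize("aabbb"))    # False
```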


Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Neural Information Processing Systems

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state space. We compare Prioritized Sweeping with other reinforcement learning schemes on a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
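
For readers who want the mechanics, here is a minimal planning-style sketch of prioritized sweeping on an estimated tabular model. The data layout and names are illustrative assumptions, and the full algorithm also interleaves exploration with model updates:

```python
import heapq
from collections import defaultdict

def prioritized_sweeping(model, gamma=0.95, theta=1e-4, max_backups=100_000):
    """model[s][a] = [(prob, reward, next_state), ...] for an estimated MDP."""
    V = defaultdict(float)
    preds = defaultdict(set)                   # next_state -> states leading to it
    for s, actions in model.items():
        for trans in actions.values():
            for _, _, s2 in trans:
                preds[s2].add(s)

    def bellman(s):
        return max(sum(p * (r + gamma * V[s2]) for p, r, s2 in trans)
                   for trans in model[s].values())

    # Seed every state with its initial Bellman error (heapq is a min-heap,
    # so priorities are negated).
    queue = [(-abs(bellman(s) - V[s]), s) for s in model]
    heapq.heapify(queue)

    for _ in range(max_backups):
        if not queue or -queue[0][0] < theta:
            break
        _, s = heapq.heappop(queue)
        V[s] = bellman(s)
        # The change at s makes its predecessors' values stale: re-queue them
        # with priority equal to their new Bellman error.
        for sp in preds[s]:
            delta = abs(bellman(sp) - V[sp])
            if delta > theta:
                heapq.heappush(queue, (-delta, sp))
    return dict(V)
```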


Nets with Unreliable Hidden Nodes Learn Error-Correcting Codes

Neural Information Processing Systems

In a multi-layered neural network, any one of the hidden layers can be viewed as computing a distributed representation of the input. Several "encoder" experiments have shown that when the representation space is small it can be fully used. But computing with such a representation requires completely dependable nodes. In the case where the hidden nodes are noisy and unreliable, we find that error-correcting schemes emerge simply by using noisy units during training; random errors injected during backpropagation result in spreading representations apart. Average and minimum distances increase with misfire probability, as predicted by coding-theoretic considerations. Furthermore, the effect of this noise is to protect the machine against permanent node failure, thereby potentially extending the useful lifetime of the machine.
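
A minimal sketch of the training-time noise injection, assuming a sign-flip misfire model (the paper's exact noise process may differ); gradients are then backpropagated through the corrupted activations, which is what pushes the hidden codes apart:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, W2, p=0.1, train=True):
    """Two-layer net whose hidden units misfire with probability p."""
    h = np.tanh(W1 @ x)
    if train:
        misfire = rng.random(h.shape) < p
        h = np.where(misfire, -h, h)     # a misfiring unit flips its output
    return h, np.tanh(W2 @ h)

# Example: 4 inputs -> 8 unreliable hidden units -> 4 outputs.
W1 = 0.1 * rng.standard_normal((8, 4))
W2 = 0.1 * rng.standard_normal((4, 8))
h, y = forward(rng.standard_normal(4), W1, W2)
```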



Time Warping Invariant Neural Networks

Neural Information Processing Systems

We propose a model of Time Warping Invariant Neural Networks (TWINN) to handle time-warped continuous signals. Although TWINN is a simple modification of the well-known recurrent neural network, analysis shows that TWINN completely removes time warping and is able to handle difficult classification problems. It is also shown that TWINN has certain advantages over currently available sequential processing schemes: Dynamic Programming (DP) [1], Hidden Markov Models (HMM) [2], Time Delayed Neural Networks (TDNN) [3] and Neural Network Finite Automata (NNFA) [4]. We also analyzed the time continuity employed in TWINN and pointed out that this kind of structure can memorize a longer input history than Neural Network Finite Automata (NNFA). This may help to explain the well-accepted fact that for learning grammatical inference with NNFA one has to start with very short strings in the training set. The numerical example we used is a trajectory classification problem. This problem, featuring variable sampling rates, internal states, continuous dynamics, heavily time-warped data and deformed phase-space trajectories, is shown to be difficult for other schemes. With TWINN this problem was learned in 100 iterations. As a benchmark we also trained the exact same problem with TDNN, which failed completely, as expected.
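
One way to realize the warp-invariance described above is to scale the recurrent state increment by how far the input moved, so that the state evolves with the trajectory's arc length rather than with clock time; the specific update rule below is an illustrative assumption rather than the paper's exact equations:

```python
import numpy as np

def twinn_step(s, x, x_prev, W_s, W_x):
    """Advance the state by an amount proportional to how far the input moved."""
    ds = np.tanh(W_s @ s + W_x @ x)               # direction of state change
    return s + np.linalg.norm(x - x_prev) * ds    # warp-invariant step size
```

Under this update, re-sampling the same trajectory at a different (even variable) rate leaves the state sequence approximately unchanged, since consecutive duplicated inputs contribute zero displacement.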


Q-Learning with Hidden-Unit Restarting

Neural Information Processing Systems

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.
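
A sketch of the restart step, assuming RAN's two-part novelty test (large error and no unit near the current input) adapted to the TD error; the "least useful unit" measure, thresholds, and names are illustrative assumptions:

```python
import numpy as np

def maybe_restart(units, x, td_error, err_thresh=0.5, dist_thresh=1.0):
    """units: list of dicts with 'center' and 'out_weight' arrays."""
    nearest = min(np.linalg.norm(u["center"] - x) for u in units)
    if abs(td_error) > err_thresh and nearest > dist_thresh:
        # Restart the unit contributing least to the output,
        # instead of allocating a new one as RAN would.
        victim = min(units, key=lambda u: np.linalg.norm(u["out_weight"]))
        victim["center"] = x.copy()
        victim["out_weight"] = np.zeros_like(victim["out_weight"])
```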


Visual Motion Computation in Analog VLSI Using Pulses

Neural Information Processing Systems

The real-time computation of motion from real images using a single chip with integrated sensors is a hard problem. We present two analog VLSI schemes that use pulse-domain neuromorphic circuits to compute motion. Pulses of variable width, rather than graded potentials, represent a natural medium for evaluating temporal relationships.
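
The pulse-timing principle can be illustrated in software, even though the paper implements it directly in analog circuitry: an edge sweeping across two neighboring photoreceptors triggers a pulse at each, and velocity follows from the pulse interval and the receptor spacing. The spacing value and names below are assumptions:

```python
def velocity_from_pulses(t_left, t_right, spacing_um=50.0):
    """Speed (um/s) from the interval between pulses at neighboring receptors."""
    dt = t_right - t_left            # seconds; the sign encodes direction
    return spacing_um / dt if dt else float("inf")
```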


Bayesian Learning via Stochastic Dynamics

Neural Information Processing Systems

The attempt to find a single "optimal" weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.
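
A minimal sketch of one such stochastic dynamical system, unadjusted Langevin dynamics on the posterior energy U(w) = -log p(w | data); Neal develops more sophisticated hybrid Monte Carlo dynamics, so treat this as an illustration of the principle, with names and step size as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_samples(grad_U, w0, eps=1e-3, n_steps=10_000, keep_every=100):
    """Discretized Langevin dynamics with stationary distribution exp(-U(w))."""
    w, samples = w0.copy(), []
    for t in range(n_steps):
        w -= 0.5 * eps * grad_U(w)                        # drift toward high posterior
        w += np.sqrt(eps) * rng.standard_normal(w.shape)  # diffusion term
        if t % keep_every == 0:
            samples.append(w.copy())
    return samples

# Bayesian predictions then average the network's output over these samples.
```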


Intersecting Regions: The Key to Combinatorial Structure in Hidden Unit Space

Neural Information Processing Systems

Hidden units in multi-layer networks form a representation space in which each region can be identified with a class of equivalent outputs (Elman, 1989) or a logical state in a finite state machine (Cleeremans, Servan-Schreiber & McClelland, 1989; Giles, Sun, Chen, Lee, & Chen, 1990). We extend the analysis of the spatial structure of hidden unit space to a combinatorial task, based on binding features together in a visual scene. The logical structure requires a combinatorial number of states to represent all valid scenes. On analysing our networks, we find that the high dimensionality of hidden unit space is exploited by using the intersection of neighboring regions to represent conjunctions of features. These results show how combinatorial structure can be based on the spatial nature of networks, and not just on their emulation of logical structure.


Improving Performance in Neural Networks Using a Boosting Algorithm

Neural Information Processing Systems

A boosting algorithm converts a learning machine with an error rate of less than 50% to one with an arbitrarily low error rate. However, the algorithm discussed here depends on having a large supply of independent training samples. We show how to circumvent this problem and generate an ensemble of learning machines whose performance in optical character recognition problems is dramatically improved over that of a single network. We report the effect of boosting on four databases (all handwritten) consisting of 12,000 digits from segmented ZIP codes from the United States Postal Service (USPS) and the following from the National Institute of Standards and Technology (NIST): 220,000 digits, 45,000 upper-case alphas, and 45,000 lower-case alphas. We use two performance measures: the raw error rate (no rejects) and the reject rate required to achieve a 1% error rate on the patterns not rejected.
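
For context, here is a sketch of Schapire's three-network boosting construction that this work builds on: the second network trains on a filtered stream on which the first network is right only about half the time, and the third trains on cases where the first two disagree. The paper's way of combining the networks differs in detail; the helpers below are illustrative, not the authors' code:

```python
import random

def filter_for_net2(stream, net1):
    """Yield a stream on which net1 is correct about half the time."""
    stream = iter(stream)
    while True:
        want_correct = random.random() < 0.5    # coin flip per emitted example
        for x, y in stream:
            if (net1(x) == y) == want_correct:
                yield x, y
                break
        else:
            return                              # stream exhausted

def filter_for_net3(stream, net1, net2):
    """Yield only the examples on which the first two networks disagree."""
    for x, y in stream:
        if net1(x) != net2(x):
            yield x, y

def classify(x, net1, net2, net3):
    a, b = net1(x), net2(x)
    return a if a == b else net3(x)   # net3 breaks disagreements
```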