Constructing Hidden Units using Examples and Queries

Neural Information Processing Systems

While the network loading problem for 2-layer threshold nets is NP-hard when learning from examples alone (as with backpropagation), Baum (1991) has now proved that a learner can employ queries to evade the hidden-unit credit assignment problem and PAC-load nets with up to four hidden units in polynomial time. Empirical tests show that the method can also learn far more complicated functions, such as randomly generated networks with 200 hidden units. The algorithm easily approximates Wieland's 2-spirals function using a single layer of 50 hidden units, and requires only 30 minutes of CPU time to learn 200-bit parity to 99.7% accuracy.
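The abstract does not spell out the mechanism, but one standard way queries can help in this setting, sketched below purely as an assumption, is that membership queries let the learner binary-search the segment between a positive and a negative example to locate a point on a hidden unit's decision boundary. The oracle and target below are invented for illustration; this is not Baum's algorithm itself.

    import numpy as np

    def boundary_point(oracle, x_pos, x_neg, tol=1e-6):
        # Binary-search the segment between two differently labeled points;
        # each probe of the midpoint is one membership query to the target.
        lo, hi = x_neg, x_pos
        while np.linalg.norm(hi - lo) > tol:
            mid = (lo + hi) / 2.0
            if oracle(mid):
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2.0

    # Hypothetical target: a single threshold unit the learner may query.
    w, b = np.array([1.0, -2.0]), 0.5
    oracle = lambda x: float(w @ x + b) > 0
    p = boundary_point(oracle, np.array([3.0, 0.0]), np.array([-3.0, 0.0]))
    print(p, float(w @ p + b))   # second value is near zero: p lies on the hyperplane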


Modeling Time Varying Systems Using Hidden Control Neural Architecture

Neural Information Processing Systems

This paper introduces a generalization of the layered neural network that can implement a time-varying nonlinear mapping between its observable input and output. The variation of the network's mapping is due to an additional, hidden control input, while the network parameters remain unchanged. We propose an algorithm for finding the network parameters and the hidden control sequence from a training set of examples of observable input and output. This algorithm implements an approximate maximum likelihood estimation of the parameters of an equivalent statistical model, when only the dominant control sequence is taken into account. The conceptual difference between the proposed model and the HMM is as follows: in the HMM approach, the observable data in each state is modeled as though it were produced by a memoryless source, and a parametric description of this source is obtained during training; in the proposed model, the observations in each state are produced by a nonlinear dynamical system driven by noise, and both the parametric form of the dynamics and the noise are estimated. The performance of the model was illustrated for the tasks of nonlinear time-varying system modeling and continuously spoken digit recognition. The reported results show the potential of this model for providing high-performance speech recognition capability.
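As a rough illustration of the architecture and the alternating estimation described above, here is a minimal sketch, assuming a one-hidden-layer net, a small discrete control alphabet, and squared-error training; all names and sizes are invented and this is not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_HID, N_OUT, N_CTRL = 4, 8, 2, 3          # sizes are assumptions

    # One fixed parameter set; the extra control input switches the mapping.
    W1 = rng.normal(0.0, 0.1, (N_HID, N_IN + N_CTRL))
    W2 = rng.normal(0.0, 0.1, (N_OUT, N_HID))

    def forward(x, c):
        # Map observable input x to an output, conditioned on control code c.
        z = np.tanh(W1 @ np.concatenate([x, c]))
        return W2 @ z, z

    def best_control(x, y):
        # The dominant control value: the code giving the lowest squared error.
        codes = list(np.eye(N_CTRL))
        errs = [np.sum((forward(x, c)[0] - y) ** 2) for c in codes]
        return codes[int(np.argmin(errs))]

    def train_step(x, y, lr=0.01):
        # Alternate: (1) assign the dominant control value for this example,
        # (2) update the shared parameters by a gradient step on squared error.
        global W1, W2
        c = best_control(x, y)
        xc = np.concatenate([x, c])
        out, z = forward(x, c)
        err = out - y
        dz = (W2.T @ err) * (1.0 - z ** 2)           # backprop through tanh
        W2 -= lr * np.outer(err, z)
        W1 -= lr * np.outer(dz, xc)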



Translating Locative Prepositions

Neural Information Processing Systems

The features used in the spatial representations were abstracted from Herskovits (1986). The network was trained using the generalized delta rule (Rumelhart, Hinton, and Williams, 1986) on a set of patterns with four components, three syntactic and one semantic. The syntactic components are a pair of nouns separated by a locative preposition [N1-LP-N2], and the semantic component is a representation of the spatial relationship [SR].
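A minimal sketch of what one such four-component training pattern might look like, assuming one-hot codes for the nouns and preposition and a binary feature vector for the spatial relation; the vocabularies and feature names below are invented for illustration, and the resulting (x, y) pairs would be fed to a standard net trained with the generalized delta rule.

    import numpy as np

    NOUNS = ["book", "table", "bowl"]                # hypothetical vocabulary
    PREPS = ["on", "in", "above"]
    SR_FEATURES = ["contact", "support", "inclusion", "higher_than"]

    def one_hot(item, vocab):
        v = np.zeros(len(vocab))
        v[vocab.index(item)] = 1.0
        return v

    def make_pattern(n1, lp, n2, sr_feats):
        # Input: the syntactic triple [N1-LP-N2]; target: the spatial relation [SR].
        x = np.concatenate([one_hot(n1, NOUNS), one_hot(lp, PREPS), one_hot(n2, NOUNS)])
        y = np.array([1.0 if f in sr_feats else 0.0 for f in SR_FEATURES])
        return x, y

    # "the book on the table": contact plus support
    x, y = make_pattern("book", "on", "table", {"contact", "support"})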


Stochastic Neurodynamics

Neural Information Processing Systems

The main point of this paper is that stochastic neural networks have a mathematical structure that corresponds quite closely with that of quantum field theory. Neural network Liouvillians and Lagrangians can be derived, just as spin Hamiltonians and Lagrangians can in QFT. It remains to show the efficacy of such a description.


Distributed Recursive Structure Processing

Neural Information Processing Systems

Harmonic grammar (Legendre et al., 1990) is a connectionist theory of linguistic well-formedness based on the assumption that the well-formedness of a sentence can be measured by the harmony (negative energy) of the corresponding connectionist state. Assuming a lower-level connectionist network that obeys a few general connectionist principles but is otherwise unspecified, we construct a higher-level network with an equivalent harmony function that captures the most linguistically relevant global aspects of the lower-level network. In this paper, we extend the tensor product representation (Smolensky, 1990) to fully recursive representations of recursively structured objects like sentences in the lower-level network. We show theoretically and with an example the power of the new technique for parallel distributed structure processing.
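As a rough sketch of tensor product binding extended to recursive structure, the fragment below assumes binary trees, random filler vectors for terminal symbols, and orthonormal left/right role vectors; zero-padding across depths is an added simplification of my own, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(1)
    FILLERS = {s: rng.normal(size=4) for s in ["A", "B", "C"]}   # terminal symbols
    r0, r1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])          # left/right roles

    def encode(tree):
        # Encode a nested tuple such as ("A", ("B", "C")) by binding each
        # subtree to its positional role with a Kronecker (outer) product.
        if isinstance(tree, str):
            return FILLERS[tree]
        left = np.kron(encode(tree[0]), r0)
        right = np.kron(encode(tree[1]), r1)
        n = max(len(left), len(right))               # subtrees may differ in depth,
        left = np.pad(left, (0, n - len(left)))      # so embed both in one space
        right = np.pad(right, (0, n - len(right)))
        return left + right

    v = encode(("A", ("B", "C")))
    # Unbinding: project the role index out again to recover the left child.
    left_child = v.reshape(-1, 2) @ r0   # equals FILLERS["A"], zero-padded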


Time Trials on Second-Order and Variable-Learning-Rate Algorithms

Neural Information Processing Systems

The performance of seven minimization algorithms is compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.


Remarks on Interpolation and Recognition Using Neural Nets

Neural Information Processing Systems

We consider different types of single-hidden-layer feedforward nets: with or without direct input-to-output connections, and using either threshold or sigmoidal activation functions. The main results show that direct connections in threshold nets double the recognition power but not the interpolation power, while using sigmoids rather than thresholds allows (at least) doubling both. Various results are also given on VC dimension and other measures of recognition capabilities.


Discovering Discrete Distributed Representations with Iterative Competitive Learning

Neural Information Processing Systems

Competitive learning is an unsupervised algorithm that classifies input patterns into mutually exclusive clusters. In a neural net framework, each cluster is represented by a processing unit that competes with others in a winner-take-all pool for an input pattern. I present a simple extension to the algorithm that allows it to construct discrete, distributed representations. Discrete representations are useful because they are relatively easy to analyze and their information content can readily be measured. Distributed representations are useful because they explicitly encode similarity. The basic idea is to apply competitive learning iteratively to an input pattern, and after each stage to subtract from the input pattern the component that was captured in the representation at that stage. This component is simply the weight vector of the winning unit of the competitive pool. The subtraction procedure forces competitive pools at different stages to encode different aspects of the input. The algorithm is essentially the same as a traditional data compression technique known as multistep vector quantization, although the neural net perspective suggests potentially powerful extensions to that approach.
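Since the procedure is described step by step above, a minimal sketch follows, assuming Euclidean winner selection and the standard competitive learning update; the pool sizes, learning rate, and data are arbitrary illustrations.

    import numpy as np

    rng = np.random.default_rng(2)

    def train_stages(data, n_stages=3, n_units=4, lr=0.05, epochs=20):
        # Multistage competitive learning, i.e. multistep vector quantization.
        stages = [rng.normal(0.0, 0.1, (n_units, data.shape[1]))
                  for _ in range(n_stages)]
        for _ in range(epochs):
            for x in data:
                residual = x.copy()
                for W in stages:
                    win = int(np.argmin(np.sum((W - residual) ** 2, axis=1)))
                    W[win] += lr * (residual - W[win])   # move winner toward input
                    residual = residual - W[win]         # next pool sees remainder
        return stages

    def encode(x, stages):
        # The code for x is one winner index per stage: discrete and distributed.
        code, residual = [], x.copy()
        for W in stages:
            win = int(np.argmin(np.sum((W - residual) ** 2, axis=1)))
            code.append(win)
            residual = residual - W[win]
        return code

    data = rng.normal(size=(100, 8))
    stages = train_stages(data)
    print(encode(data[0], stages))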


Learning Trajectory and Force Control of an Artificial Muscle Arm by Parallel-hierarchical Neural Network Model

Neural Information Processing Systems

We propose a new parallel-hierarchical neural network model to enable motor learning for simultaneous control of both trajectory and force.