Adjoint-Functions and Temporal Learning Algorithms in Neural Networks

Neural Information Processing Systems

The development of learning algorithms is generally based upon the minimization of an energy function. It is a fundamental requirement to compute the gradient of this energy function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural gain, etc. In principle, this requires solving a system of nonlinear equations for each parameter of the model, which is computationally very expensive. A new methodology for neural learning of time-dependent nonlinear mappings is presented. It exploits the concept of adjoint operators to enable a fast global computation of the network's response to perturbations in all the system's parameters. The importance of the time boundary conditions of the adjoint functions is discussed. An algorithm is presented in which the adjoint sensitivity equations are solved simultaneously (i.e., forward in time) along with the nonlinear dynamics of the neural networks. This methodology makes real-time applications and hardware implementation of temporal learning feasible.
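
The abstract gives no equations, but the core trick, computing the gradient of a terminal-error energy with respect to all weights without one perturbed simulation per parameter, can be sketched for simple additive dynamics. Everything below is an illustrative assumption: the Euler discretization, the toy energy, and the conventional backward adjoint recursion (the paper's contribution is solving the adjoint equations forward in time, which this sketch does not reproduce).

```python
import numpy as np

# Simple additive neural dynamics, Euler-discretized:
#   u[k+1] = u[k] + dt * (-u[k] + W @ tanh(u[k]))
# Energy: E = 0.5 * ||u[T] - target||^2
# The discrete adjoint recursion yields dE/dW from one forward pass
# and one adjoint pass, instead of one nonlinear solve per weight.

def forward(W, u0, dt, T):
    us = [u0]
    for _ in range(T):
        u = us[-1]
        us.append(u + dt * (-u + W @ np.tanh(u)))
    return us

def adjoint_grad(W, us, target, dt):
    T = len(us) - 1
    lam = us[-1] - target                      # terminal condition dE/du[T]
    gW = np.zeros_like(W)
    for k in range(T - 1, -1, -1):
        u = us[k]
        gW += dt * np.outer(lam, np.tanh(u))   # accumulate dE/dW at step k
        # propagate the adjoint: lam[k] = (du[k+1]/du[k])^T lam[k+1]
        J = (1 - dt) * np.eye(len(u)) + dt * W * (1 - np.tanh(u) ** 2)
        lam = J.T @ lam
    return gW

rng = np.random.default_rng(0)
n, dt, T = 4, 0.1, 50
W = rng.normal(scale=0.5, size=(n, n))
u0, target = rng.normal(size=n), rng.normal(size=n)

us = forward(W, u0, dt, T)
g = adjoint_grad(W, us, target, dt)

# Finite-difference check on one weight
def energy(Wx):
    return 0.5 * np.sum((forward(Wx, u0, dt, T)[-1] - target) ** 2)

eps, (i, j) = 1e-6, (1, 2)
Wp, Wm = W.copy(), W.copy()
Wp[i, j] += eps; Wm[i, j] -= eps
print(g[i, j], (energy(Wp) - energy(Wm)) / (2 * eps))  # should agree closely
```

The adjoint gradient matches the finite-difference value while touching the dynamics only twice, which is what makes gradient computation for all parameters feasible at once.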


Discrete Affine Wavelet Transforms For Analysis And Synthesis Of Feedforward Neural Networks

Neural Information Processing Systems

In this paper we show that discrete affine wavelet transforms can provide a tool for the analysis and synthesis of standard feedforward neural networks. It is shown that wavelet frames for L²(ℝ) can be constructed based upon sigmoids. The spatio-spectral localization property of wavelets can be exploited in defining the topology and determining the weights of a feedforward network. Training a network constructed using the synthesis procedure described here involves minimization of a convex cost functional and therefore avoids pitfalls inherent in standard backpropagation algorithms. Extension of these methods to L²(ℝ^N) is also discussed. 1 INTRODUCTION Feedforward type neural network models constructed from empirical data have been found to display significant predictive power [6]. Mathematical justification in support of such predictive power may be drawn from various density and approximation theorems [1, 2, 5].
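
A minimal sketch of the synthesis flavor, assuming a bump-shaped "mother wavelet" built as a difference of two sigmoids and a dyadic grid of dilations and translations (the paper constructs genuine wavelet frames; the grid, ranges, and target function here are illustrative):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative localized bump built from sigmoids; dilations and
# translations of it define the hidden units of a feedforward net.
def psi(x):
    return sigmoid(x + 1.0) - sigmoid(x - 1.0)

# Hidden layer: dyadic dilations a = 2^j and integer translations k.
def design_matrix(x, levels=4):
    cols = []
    for j in range(levels):
        a = 2.0 ** j
        for k in range(-8, 9):
            cols.append(psi(a * x - k))
    return np.stack(cols, axis=1)

x = np.linspace(-2, 2, 400)
y = np.sin(3 * x) * np.exp(-x ** 2)        # toy target function

Phi = design_matrix(x)
# Only the output weights are trained, via linear least squares.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("max abs error:", np.max(np.abs(Phi @ w - y)))
```

Because the hidden layer is fixed by the wavelet grid, fitting the output weights is a convex linear least-squares problem with no spurious local minima, which is the property the abstract contrasts with standard backpropagation training.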


Generalization by Weight-Elimination with Application to Forecasting

Neural Information Processing Systems

Bernardo A. Huberman, Dynamics of Computation, Xerox PARC, Palo Alto, CA 94304. Inspired by the information-theoretic idea of minimum description length, we add a term to the back propagation cost function that penalizes network complexity. We give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about the prior distribution of the weights. We use this procedure to predict the sunspot time series and the notoriously noisy series of currency exchange rates. 1 INTRODUCTION Learning procedures for connectionist networks are essentially statistical devices for performing inductive inference. There is a tradeoff between two goals: on the one hand, we want such devices to be as general as possible so that they are able to learn a broad range of problems.
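
The abstract does not spell out the complexity term; in the published weight-elimination procedure it has the saturating form below. The sketch (parameter names and the usage line are ours) shows the penalty and the gradient contribution that gets added to the backpropagation gradient:

```python
import numpy as np

# Weight-elimination complexity term:
#   C(w) = lam * sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2)
# For |w_i| << w0 the cost grows like w_i^2 (ridge-like); for
# |w_i| >> w0 it saturates near lam, so each surviving large weight
# pays a roughly constant price, i.e. the term approximately counts
# weights, per the minimum-description-length motivation.

def penalty(w, lam, w0):
    r2 = (w / w0) ** 2
    return lam * np.sum(r2 / (1.0 + r2))

def penalty_grad(w, lam, w0):
    r2 = (w / w0) ** 2
    # derivative of each term; added to the data-error gradient
    return lam * (2.0 * w / w0 ** 2) / (1.0 + r2) ** 2

# Hypothetical use inside a training step:
#   grad_total = grad_data_error(w) + penalty_grad(w, lam, w0)
w = np.array([-3.0, -0.1, 0.0, 0.05, 2.5])
print(penalty(w, lam=0.1, w0=1.0))
print(penalty_grad(w, lam=0.1, w0=1.0))
```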


On the Circuit Complexity of Neural Networks

Neural Information Processing Systems

Viewing n-variable Boolean functions as vectors in ℝ^(2^n), we invoke tools from linear algebra and linear programming to derive new results on the realizability of Boolean functions using threshold gates. Using this approach, one can obtain: (1) upper bounds on the number of spurious memories in Hopfield networks, and on the number of functions implementable by a depth-d threshold circuit; (2) a lower bound on the number of orthogonal input.
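
The realizability question can be made concrete with a tiny linear program: treat the function's truth table as a vector in ℝ^(2^n) and ask whether some weight vector and threshold separate the 1-inputs from the 0-inputs. The encoding below, with a unit margin, is an illustrative formulation, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

# Test whether a Boolean function (given as its 2^n-entry truth table)
# is realizable by a single threshold gate: find w, t such that
#   w.x - t >=  1  whenever f(x) = 1
#   w.x - t <= -1  whenever f(x) = 0
# Feasibility of this LP (variables z = (w_1..w_n, t), unbounded)
# is exactly single-gate realizability.

def is_threshold_realizable(truth_table, n):
    X = np.array([[(i >> b) & 1 for b in range(n)]
                  for i in range(2 ** n)], dtype=float)
    A, b = [], []
    for x, f in zip(X, truth_table):
        if f:
            A.append(np.append(-x, 1.0)); b.append(-1.0)  # -(w.x) + t <= -1
        else:
            A.append(np.append(x, -1.0)); b.append(-1.0)  #   w.x  - t <= -1
    res = linprog(c=np.zeros(n + 1), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(None, None)] * (n + 1))
    return res.success

print(is_threshold_realizable([0, 0, 0, 1], n=2))  # AND -> True
print(is_threshold_realizable([0, 1, 1, 0], n=2))  # XOR -> False
```

AND is realizable by one threshold gate while XOR is not, the classical example separating the two classes of function vectors.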



Connectionist Approaches to the Use of Markov Models for Speech Recognition

Neural Information Processing Systems

Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are the best discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions of distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. Results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs. While this is shown to be more compatible with the HMM formalism, it still suffers from several limitations. This approach is generalized to take account of time correlation between successive observations, without any restrictive assumptions about the driving noise. 1 INTRODUCTION We have been working on continuous speech recognition using moderately large vocabularies (1000 words) [1,2].
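
The posterior-to-likelihood conversion at the heart of the hybrid can be stated in a few lines: an MLP trained to classify frames outputs posteriors P(q|x), and Bayes' rule gives scaled likelihoods P(x|q) ∝ P(q|x)/P(q), usable as HMM emission scores. A toy sketch, in which the random posteriors, uniform priors and transitions, and the small Viterbi decoder are all illustrative assumptions:

```python
import numpy as np

# MLP outputs per frame: posteriors P(state | x_t), shape (T, S).
# Dividing by the state priors P(state) gives scaled likelihoods
# P(x_t | state) / P(x_t), which can replace HMM emission densities.

def scaled_log_likelihoods(posteriors, priors):
    return np.log(posteriors) - np.log(priors)

def viterbi(log_emit, log_trans, log_init):
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # (from_state, to_state)
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

rng = np.random.default_rng(1)
T, S = 6, 3
post = rng.dirichlet(np.ones(S), size=T)           # stand-in MLP posteriors
priors = np.full(S, 1.0 / S)                       # stand-in state priors
log_emit = scaled_log_likelihoods(post, priors)
log_trans = np.log(np.full((S, S), 1.0 / S))       # toy transition matrix
log_init = np.log(priors)
print(viterbi(log_emit, log_trans, log_init))      # decoded state sequence
```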


Generalization Properties of Radial Basis Functions

Neural Information Processing Systems

Sherif M. Botros, Christopher G. Atkeson, Brain and Cognitive Sciences Department and the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. We examine the ability of radial basis functions (RBFs) to generalize. We compare the performance of several types of RBFs. We use the inverse dynamics of an idealized two-joint arm as a test case. We find that without a proper choice of a norm for the inputs, RBFs have poor generalization properties. A simple global scaling of the input variables greatly improves performance.
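
The norm point invites a short demonstration: a Gaussian RBF interpolant fit on inputs whose raw dimensions have very different ranges, with and without a global per-dimension scaling. The scaling rule (dividing by the training standard deviation), the toy target, and the ridge term are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Gaussian RBF regression; the distance norm over the inputs is what
# the abstract identifies as critical. Rescaling each input dimension
# changes the norm and can greatly improve generalization when raw
# dimensions have mismatched ranges.

def rbf_fit_predict(Xtr, ytr, Xte, width=1.0):
    d2 = ((Xtr[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * width ** 2))
    w = np.linalg.solve(K + 1e-8 * np.eye(len(Xtr)), ytr)  # small ridge
    d2te = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2te / (2 * width ** 2)) @ w

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2)) * np.array([1.0, 100.0])  # mismatched ranges
y = np.sin(X[:, 0]) + np.sin(X[:, 1] / 100.0)
Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

# Unscaled: the second dimension dominates the Euclidean norm.
err_raw = np.mean((rbf_fit_predict(Xtr, ytr, Xte) - yte) ** 2)

# Globally scaled: each dimension divided by its training std.
s = Xtr.std(axis=0)
err_scaled = np.mean((rbf_fit_predict(Xtr / s, ytr, Xte / s) - yte) ** 2)
print(err_raw, err_scaled)   # scaled error should be much smaller
```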


From Speech Recognition to Spoken Language Understanding: The Development of the MIT SUMMIT and VOYAGER Systems

Neural Information Processing Systems

Spoken input to computers, however, has yet to pass the threshold of practicality. Despite some recent successful demonstrations, current speech recognition systems typically fall far short of human capabilities for continuous speech recognition with essentially unrestricted vocabularies and speakers, in adverse acoustic environments.