Adjoint-Functions and Temporal Learning Algorithms in Neural Networks
The development of learning algorithms is generally based upon the minimization of an energy function. It is a fundamental requirement to compute the gradient of this energy function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural gain, etc. In principle, this requires solving a system of nonlinear equations for each parameter of the model, which is computationally very expensive. A new methodology for neural learning of time-dependent nonlinear mappings is presented. It exploits the concept of adjoint operators to enable a fast global computation of the network's response to perturbations in all the system's parameters. The importance of the time boundary conditions of the adjoint functions is discussed. An algorithm is presented in which the adjoint sensitivity equations are solved simultaneously (i.e., forward in time) along with the nonlinear dynamics of the neural network. This methodology makes real-time applications and hardware implementation of temporal learning feasible.
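As a hedged illustration of the adjoint idea (the notation here is ours, not necessarily the paper's): write the network dynamics as du/dt = φ(u, p, t), with p collecting all parameters (weights, gains, etc.), and the energy functional as E(p) = ∫₀^{t_f} e(u, p, t) dt. In the generic adjoint-sensitivity formulation, the adjoint functions v(t) satisfy a single linear system whose time boundary condition sits at the final time,

$$
\dot{v} = -\left(\frac{\partial \varphi}{\partial u}\right)^{\!\top} v - \left(\frac{\partial e}{\partial u}\right)^{\!\top}, \qquad v(t_f) = 0,
$$

and the gradient with respect to every parameter then follows from one integration,

$$
\frac{dE}{dp} = \int_{0}^{t_f} \left( \frac{\partial e}{\partial p} + v^{\top}\,\frac{\partial \varphi}{\partial p} \right) dt .
$$

The terminal condition v(t_f) = 0 is what ordinarily forces a backward-in-time integration; the algorithm described in the abstract rearranges this computation so that the sensitivity equations can be propagated forward in time together with the network dynamics.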
On the Circuit Complexity of Neural Networks
Roychowdhury, V. P., Siu, K. Y., Orlitsky, A., Kailath, T.
Viewing n-variable Boolean functions as vectors in R^{2^n}, we invoke tools from linear algebra and linear programming to derive new results on the realizability of Boolean functions using threshold gates. Using this approach, one can obtain: (1) upper bounds on the number of spurious memories in Hopfield networks, and on the number of functions implementable by a depth-d threshold circuit; (2) a lower bound on the number of orthogonal input.
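The linear-programming viewpoint above lends itself to a small illustration. The sketch below is ours, not the authors' code, and the function names are hypothetical: it encodes an n-variable Boolean function as its ±1 truth table and tests single-threshold-gate realizability as a linear-programming feasibility problem using SciPy's linprog.

```python
# Sketch: check single-threshold-gate realizability of an n-variable Boolean
# function via LP feasibility (illustrative; not the paper's code).
import itertools
import numpy as np
from scipy.optimize import linprog

def threshold_realizable(truth_table, n):
    """truth_table: dict mapping each x in {0,1}^n to +1 or -1."""
    rows, rhs = [], []
    for x in itertools.product([0, 1], repeat=n):
        y = truth_table[x]
        # Require y * (w . x + b) >= 1, i.e. -y * (w . x + b) <= -1.
        rows.append([-y * xi for xi in x] + [-y])
        rhs.append(-1.0)
    res = linprog(c=np.zeros(n + 1), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * (n + 1), method="highs")
    return res.success  # feasible => realizable by one threshold gate

# Example: AND is threshold-realizable, XOR (parity) is not.
AND = {x: (1 if all(x) else -1) for x in itertools.product([0, 1], repeat=2)}
XOR = {x: (1 if sum(x) % 2 else -1) for x in itertools.product([0, 1], repeat=2)}
print(threshold_realizable(AND, 2), threshold_realizable(XOR, 2))  # True False
```

The margin of 1 in the constraints is harmless because threshold weights can be rescaled, so infeasibility certifies that no weight vector and bias realize the function with a single gate.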
Connectionist Approaches to the Use of Markov Models for Speech Recognition
Bourlard, Hervé, Morgan, Nelson, Wooters, Chuck
Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are better discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions about distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English-language Resource Management database. The results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs. While this is shown to be more compatible with the HMM formalism, it still suffers from several limitations. This approach is generalized to take account of the time correlation between successive observations, without any restrictive assumptions about the driving noise.
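As a hedged sketch of the probability-estimation side described above (the names and shapes below are illustrative, not from the paper): if an MLP is trained to output state posteriors P(q | x_t), dividing by the state priors P(q) yields scaled likelihoods that can stand in for the HMM emission probabilities during decoding.

```python
# Sketch: turning MLP state posteriors into scaled likelihoods for HMM decoding
# (illustrative; assumes the MLP estimates P(state | acoustic frame)).
import numpy as np

def scaled_likelihoods(posteriors, priors, eps=1e-10):
    """posteriors: (T, Q) MLP outputs P(q | x_t); priors: (Q,) P(q).
    Returns (T, Q) values proportional to the emission densities p(x_t | q)."""
    return posteriors / np.maximum(priors, eps)

def viterbi_log(scaled, log_trans, log_init):
    """Standard Viterbi decode using log scaled likelihoods as emission scores."""
    T, Q = scaled.shape
    log_emit = np.log(np.maximum(scaled, 1e-300))
    delta = log_init + log_emit[0]
    back = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # (Q, Q): previous -> current
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The constant factor p(x_t) dropped by the scaling is shared by all states at a given frame, so it cancels in the Viterbi comparisons and does not affect the decoded path.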
Where's the AI?
I survey four viewpoints about what AI is. I describe a program exhibiting AI as one that can change as a result of interactions with the user. Such a program would have to process hundreds or thousands of examples as opposed to a handful. Because AI is a machine's attempt to explain the behavior of the (human) system it is trying to model, the ability of a program design to scale up is critical. Researchers need to face the complexities of scaling up to programs that actually serve a purpose. The move from toy domains into concrete ones has three big consequences for the development of AI. First, it will force software designers to face the idiosyncrasies of their users. Second, it will act as an important reality check between the language of the machine, the software, and the user. Third, the scaled-up programs will become templates for future work. The four viewpoints hold that (1) AI means magic bullets, (2) AI means inference engines, (3) AI means getting a machine to do something you didn't think a machine could do (the "gee whiz" view), and (4) AI means having a machine learn. For a variety of reasons, some of which I discuss in this article, the newly formed Institute for the Learning Sciences has been concentrating its efforts on building high-quality educational software for use in business and in elementary and secondary schools.
Enabling Technology for Knowledge Sharing
Neches, Robert, Fikes, Richard E., Finin, Tim, Gruber, Thomas, Patil, Ramesh, Senator, Ted, Swartout, William R.
Building new knowledge-based systems today usually entails constructing new knowledge bases from scratch. It could instead be done by assembling reusable components. System developers would then only need to worry about creating the specialized knowledge and reasoners new to the specific task of their system. This new system would interoperate with existing systems, using them to perform some of its reasoning. In this way, declarative knowledge, problem-solving techniques, and reasoning services could all be shared among systems. This approach would facilitate building bigger and better systems cheaply. The infrastructure to support such sharing and reuse would lead to greater ubiquity of these systems, potentially transforming the knowledge industry. This article presents a vision of the future in which knowledge-based system development and operation is facilitated by infrastructure and technology for knowledge sharing. It describes an initiative currently under way to develop these ideas and suggests steps that must be taken in the future to try to realize this vision.
A Performance Evaluation of Text-Analysis Technologies
Lehnert, Wendy, Sundheim, Beth
A performance evaluation of 15 text-analysis systems was recently conducted to realistically assess the state of the art for detailed information extraction from unconstrained continuous text. Reports associated with terrorism were chosen as the target domain, and all systems were tested on a collection of previously unseen texts released by a government agency. Based on multiple strategies for computing each metric, the competing systems were evaluated for recall, precision, and overgeneration. The results support the claim that systems incorporating natural language-processing techniques are more effective than systems based on stochastic techniques alone. A wide range of language-processing strategies was employed by the top-scoring systems, indicating that many natural language-processing techniques provide a viable foundation for sophisticated text analysis. Further evaluation is needed to produce a more detailed assessment of the relative merits of specific technologies and establish true performance limits for automated information extraction.
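For concreteness, here is a small sketch of the scoring arithmetic behind recall, precision, and overgeneration, assuming MUC-style slot-fill counts; the evaluation itself used multiple strategies for computing each metric, so treat these particular definitions as illustrative rather than the official ones.

```python
# Sketch of MUC-style information-extraction scoring (illustrative definitions).
def extraction_scores(correct, partial, possible, actual, spurious):
    """correct/partial: fills matching the answer key (partial credited at 0.5);
    possible: fills in the answer key; actual: fills the system generated;
    spurious: generated fills with no counterpart in the key."""
    credited = correct + 0.5 * partial
    recall = credited / possible if possible else 0.0
    precision = credited / actual if actual else 0.0
    overgeneration = spurious / actual if actual else 0.0
    return recall, precision, overgeneration

# Example: 60 correct, 10 partial, 100 key fills, 90 generated, 15 spurious.
print(extraction_scores(60, 10, 100, 90, 15))  # ~ (0.65, 0.72, 0.17)
```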