Not enough data to create a plot.
Try a different view from the menu above.
Information Technology
Learning Temporal Dependencies in Connectionist Speech Recognition
Renals, Steve, Hochberg, Mike, Robinson, Tony
In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into a MLP to produce an enhanced input representation. This is in the form of an adaptive gamma filter and incorporates an automatic approach for learning temporal dependencies. We have experimented on a speakerindependent phone recognition task using the TIMIT database. Results using the gamma filtered input representation have shown improvement over the baseline MLP system. Improvements have also been obtained through merging the baseline and gamma filter models.
Use of Bad Training Data for Better Predictions
We show how randomly scrambling the output classes of various fractions of the training data may be used to improve predictive accuracy of a classification algorithm. We present a method for calculating the "noise sensitivity signature" of a learning algorithm which is based on scrambling the output classes. This signature can be used to indicate a good match between the complexity of the classifier and the complexity of the data. Use of noise sensitivity signatures is distinctly different from other schemes to avoid overtraining, such as cross-validation, which uses only part of the training data, or various penalty functions, which are not data-adaptive. Noise sensitivity signature methods use all of the training data and are manifestly data-adaptive and nonparametric. They are well suited for situations with limited training data. 1 INTRODUCTION A major problem of pattern recognition and classification algorithms that learn from a training set of examples is to select the complexity of the model to be trained. How is it possible to avoid an overparameterized algorithm from "memorizing" the training data?
The "Softmax" Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element
Elfadel, I. M., J. L. Wyatt, Jr.
In this paper, we show a reciprocal implementation of the "softmax" nonlinearity that is usually used to enforce local competition between neurons [Peterson, 1989]. We show that the circuit is passive and incrementally passive, and we explicitly compute its content and co-content functions. This circuit adds a new element to the library of the analog circuit designer that can be combined with reciprocal constraint boxes [Harris, 1988] and nonlinear resistive fuses [Harris, 1989] to form fast, analog VLSI optimization networks.
Postal Address Block Location Using a Convolutional Locator Network
This paper describes the use of a convolutional neural network to perform address block location on machine-printed mail pieces. Locating the address block is a difficult object recognition problem because there is often a large amount of extraneous printing on a mail piece and because address blocks vary dramatically in size and shape. We used a convolutional locator network with four outputs, each trained to find a different corner of the address block. A simple set of rules was used to generate ABL candidates from the network output. The system performs very well: when allowed five guesses, the network will tightly bound the address delivery information in 98.2% of the cases.
Fool's Gold: Extracting Finite State Machines from Recurrent Network Dynamics
Several recurrent networks have been proposed as representations for the task of formal language learning. After training a recurrent network recognize a formal language or predict the next symbol of a sequence, the next logical step is to understand the information processing carried out by the network. Some researchers have begun to extracting finite state machines from the internal state trajectories of their recurrent networks. This paper describes how sensitivity to initial conditions and discrete measurements can trick these extraction methods to return illusory finite state descriptions.
Training Neural Networks with Deficient Data
Tresp, Volker, Ahmad, Subutai, Neuneier, Ralph
We analyze how data with uncertain or missing input features can be incorporated into the training of a neural network. The general solution requires a weighted integration over the unknown or uncertain input although computationally cheaper closed-form solutions can be found for certain Gaussian Basis Function (GBF) networks. We also discuss cases in which heuristical solutions such as substituting the mean of an unknown input can be harmful.
The Power of Amnesia
Ron, Dana, Singer, Yoram, Tishby, Naftali
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multi characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot capture effectively the complexity of such structures. On the other hand using long memory models uniformly is not practical even for as short memory as four.