On-line Learning of Dichotomies

Neural Information Processing Systems

The performance of on-line algorithms for learning dichotomies is studied. In on-line learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule by which the learning rate is lowered.
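As a toy illustration of this setting (not the paper's analysis), the sketch below trains a perceptron on a teacher-defined dichotomy in a single pass over the data, with a hypothetical annealing schedule eta(t) = eta0 / (1 + t/tau), and tracks the generalization error as the number of examples P grows.

```python
import numpy as np

# Minimal sketch: on-line learning of a dichotomy defined by a teacher
# perceptron, with an annealed learning rate.  Each example is seen
# once, so the number of examples P plays the role of learning time.

rng = np.random.default_rng(0)
N = 100                      # input dimension
P = 5000                     # number of single-pass examples
eta0, tau = 0.5, 500.0       # hypothetical schedule parameters

teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)
student = rng.standard_normal(N)

for t in range(P):
    x = rng.standard_normal(N)
    y = np.sign(teacher @ x)             # true label of the dichotomy
    eta = eta0 / (1.0 + t / tau)         # annealed learning rate
    if np.sign(student @ x) != y:        # perceptron-style update on errors
        student += eta * y * x
    if (t + 1) % 1000 == 0:
        # For spherical random inputs the generalization error equals
        # arccos(overlap) / pi, with overlap the normalized dot product.
        overlap = (student @ teacher) / np.linalg.norm(student)
        print(f"P={t+1:5d}  generalization error ~ {np.arccos(overlap)/np.pi:.3f}")
```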


Using Voice Transformations to Create Additional Training Talkers for Word Spotting

Neural Information Processing Systems

Lack of training data has always been a constraint in training speech recognizers. This research presents a voice transformation technique which increases the variety among training talkers. The resulting more varied training set provided up to 2.9 percentage points of improvement in the figure of merit (average detection rate) of a high performance word spotter. This improvement is similar to the increase in performance provided by doubling the amount of training data (Carlson, 1994). This technique can also be applied to other speech recognition systems such as continuous speech recognition, talker identification, and isolated speech recognition.
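As a rough sketch of the general idea (the paper's transformation is more sophisticated; the warp factors and helper below are purely illustrative), one crude way to synthesize "new talkers" is to warp the pitch and formant structure of an existing utterance by resampling:

```python
import numpy as np
from scipy.signal import resample

# Crude voice-transformation sketch: resample the waveform by a small
# factor and pad/trim back to the original length, which shifts pitch
# and formants together.  This only gestures at the paper's technique.

def transform_talker(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical helper: factor > 1 raises pitch/formants."""
    warped = resample(waveform, int(len(waveform) / factor))
    out = np.zeros_like(waveform)
    n = min(len(out), len(warped))
    out[:n] = warped[:n]
    return out

# Usage: augment a training set with several synthetic talkers.
rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)          # stand-in for 1 s of speech
augmented = [transform_talker(utterance, f) for f in (0.9, 0.95, 1.05, 1.1)]
print(len(augmented), "synthetic talkers from one utterance")
```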


Estimating Conditional Probability Densities for Periodic Variables

Neural Information Processing Systems

Many applications of neural networks can be formulated in terms of a multivariate nonlinear mapping from an input vector x to a target vector t. A conventional neural network approach, based on least squares for example, leads to a network mapping which approximates the regression of t on x. A more complete description of the data can be obtained by estimating the conditional probability density of t, conditioned on x, which we write as p(t|x). Various techniques exist for modelling such densities when the target variables live in a Euclidean space. However, a number of potential applications involve angle-like output variables which are periodic on some finite interval (usually chosen to be (0, 2π)).
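A standard family of densities for such periodic variables is a mixture of von Mises (circular normal) kernels. The sketch below evaluates such a mixture; in a full conditional model, a network would output the mixing coefficients, centres, and concentrations as functions of x. All parameter values here are illustrative.

```python
import numpy as np
from scipy.special import i0   # modified Bessel function of order 0

# Mixture of von Mises kernels:
#   p(t) = sum_i alpha_i * exp(m_i * cos(t - c_i)) / (2 * pi * I0(m_i))
# which is periodic in t with period 2*pi by construction.

def von_mises_mixture(t, alphas, centres, concentrations):
    t = np.asarray(t)[..., None]
    kernels = np.exp(concentrations * np.cos(t - centres)) \
              / (2 * np.pi * i0(concentrations))
    return kernels @ alphas

alphas = np.array([0.7, 0.3])      # mixing coefficients, sum to 1
centres = np.array([0.5, 4.0])     # kernel centres in (0, 2*pi)
concs = np.array([2.0, 5.0])       # concentrations (inverse widths)

t = np.linspace(0.0, 2 * np.pi, 8)
print(von_mises_mixture(t, alphas, centres, concs))
```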


Interference in Learning Internal Models of Inverse Dynamics in Humans

Neural Information Processing Systems

Experiments were performed to reveal some of the computational properties of the human motor memory system. We show that as humans practice reaching movements while interacting with a novel mechanical environment, they learn an internal model of the inverse dynamics of that environment. The representation of the internal model in memory is such that there is interference when an attempt is made to learn a new inverse dynamics map immediately after an anticorrelated mapping was learned. We suggest that this interference indicates that the same computational elements used to encode the first inverse dynamics map are being used to learn the second mapping. We predict that this leads to a forgetting of the initially learned skill.

1 Introduction

In tasks where we use our hands to interact with a tool, our motor system develops a model of the dynamics of that tool and uses this model to control the coupled dynamics of our arm and the tool (Shadmehr and Mussa-Ivaldi 1994). In physical systems theory, the tool is a mechanical analogue of an admittance, mapping a force as input onto a change in state as output (Hogan 1985).
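A toy numerical analogue of the interference effect (not the authors' experiment): train a single weight matrix on a linear "inverse dynamics" map with the delta rule, then on the anticorrelated (negated) map, and observe that the first skill is erased.

```python
import numpy as np

# One set of weights is trained on-line on an inverse-dynamics-like map,
# then on the anticorrelated (negated) map.  Because the same elements
# encode both, learning the second map overwrites the first.

rng = np.random.default_rng(0)
true_map = rng.standard_normal((2, 4))       # hypothetical state -> force map

def train(weights, target_map, steps=2000, eta=0.05):
    for _ in range(steps):
        state = rng.standard_normal(4)
        error = target_map @ state - weights @ state
        weights += eta * np.outer(error, state)   # LMS / delta rule
    return weights

w = train(np.zeros((2, 4)), true_map)
print("error on map 1 after learning it :", np.linalg.norm(w - true_map))
w = train(w, -true_map)                           # anticorrelated mapping
print("error on map 1 after learning map 2:", np.linalg.norm(w - true_map))
# The second figure is large: the initially learned skill is forgotten.
```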


Learning with Preknowledge: Clustering with Point and Graph Matching Distance Measures

Neural Information Processing Systems

Recently, the importance of such preknowledge for learning has been convincingly argued from a statistical framework [Geman et al., 1992]. Researchers have proposed that our brains may incorporate preknowledge in the form of distance measures [Shepard, 1989]. The neural network community has begun to explore this idea via tangent distance [Simard et al., 1993], model learning [Williams et al., 1993] and point matching distances [Gold et al., 1994]. However, only the point matching distances have been invariant under permutations. Here we extend that work by enhancing both the scope and function of those distance measures, significantly expanding the problem domains where learning may take place. We learn objects consisting of noisy 2-D point-sets or noisy weighted graphs by clustering with point matching and graph matching distance measures. The point matching measure is approx.
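A minimal sketch of a permutation-invariant point matching distance, here computed with the Hungarian algorithm (scipy's linear_sum_assignment) rather than the paper's optimization scheme, and used to assign noisy, randomly permuted 2-D point sets to prototypes:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Permutation-invariant distance: the minimum-cost one-to-one matching
# between two equal-size 2-D point sets.

def point_match_distance(a: np.ndarray, b: np.ndarray) -> float:
    cost = cdist(a, b)                         # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(cost)   # optimal permutation
    return cost[rows, cols].sum()

rng = np.random.default_rng(0)
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
line = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], float)
# Noisy, randomly permuted copies of two prototype point sets.
data = [rng.permutation(p + 0.05 * rng.standard_normal(p.shape))
        for p in [square, line, square, line]]

# Assign each noisy set to its nearest prototype; the result is
# unaffected by the permutations because the distance matches points.
for i, pts in enumerate(data):
    d = [point_match_distance(pts, proto) for proto in (square, line)]
    print(f"set {i}: nearest prototype = {int(np.argmin(d))}")
```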


Temporal Dynamics of Generalization in Neural Networks

Neural Information Processing Systems

This paper presents a rigorous characterization of how a general nonlinear learning machine generalizes during the training process when it is trained on a random sample using a gradient descent algorithm based on reduction of training error. It is shown, in particular, that the best generalization performance occurs, in general, before the global minimum of the training error is achieved. The different roles played by the complexity of the machine class and the complexity of the specific machine in the class during learning are also precisely demarcated.

1 INTRODUCTION

In learning machines such as neural networks, two major factors that affect the 'goodness of fit' of the examples are network size (complexity) and training time. These are also the major factors that affect the generalization performance of the network. Many theoretical studies exploring the relation between generalization performance and machine complexity support the parsimony heuristics suggested by Occam's razor, to wit, that amongst machines with similar training performance one should opt for the machine of least complexity.
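The following toy run (not the paper's formal analysis; the model, sample sizes, and learning rate are all illustrative) shows the phenomenon on a deliberately over-complex polynomial "machine" fit by gradient descent to noisy data: test error typically bottoms out long before training error stops falling.

```python
import numpy as np

# Over-complex polynomial regression trained by plain gradient descent;
# we track the epoch at which test error is lowest.

rng = np.random.default_rng(0)
def target(x): return np.sin(2 * np.pi * x)

x_tr = rng.uniform(0, 1, 20);  y_tr = target(x_tr) + 0.3 * rng.standard_normal(20)
x_te = rng.uniform(0, 1, 200); y_te = target(x_te) + 0.3 * rng.standard_normal(200)

degree = 12                                    # deliberately over-complex machine
def design(x): return np.vander(x, degree + 1, increasing=True)
A_tr, A_te = design(x_tr), design(x_te)
w = np.zeros(degree + 1)

best_test, best_epoch = np.inf, 0
for epoch in range(1, 20001):
    grad = A_tr.T @ (A_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad                           # gradient descent on training error
    test_err = np.mean((A_te @ w - y_te) ** 2)
    if test_err < best_test:
        best_test, best_epoch = test_err, epoch

train_err = np.mean((A_tr @ w - y_tr) ** 2)
print(f"final training error {train_err:.4f}; "
      f"best test error {best_test:.4f} at epoch {best_epoch} of 20000")
```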


Patterns of damage in neural networks: The effects of lesion area, shape and number

Neural Information Processing Systems

Current understanding of the effects of damage on neural networks is rudimentary, even though such understanding could lead to important insights concerning neurological and psychiatric disorders. Motivated by this consideration, we present a simple analytical framework for estimating the functional damage resulting from focal structural lesions to a neural network.
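A minimal sketch of this kind of estimate, under entirely illustrative assumptions (a fixed random network on a 20 x 20 sheet of units, lesions modelled as silenced discs): measure functional damage as the output distortion caused by focal lesions of increasing area.

```python
import numpy as np

# Focal structural lesions: silence all units within a given radius of
# the sheet centre, and compare the lesioned output with the healthy one.

rng = np.random.default_rng(0)
n = 400                                        # units on a 20 x 20 sheet
W = rng.standard_normal((n, n)) / np.sqrt(n)
pattern = rng.standard_normal(n)
healthy = np.tanh(W @ pattern)

side = 20
coords = np.array([(i, j) for i in range(side) for j in range(side)])

for radius in (1, 2, 4, 6, 8):
    intact = ~(np.linalg.norm(coords - [side // 2, side // 2], axis=1) <= radius)
    out = np.tanh(W @ (pattern * intact))      # lesioned units contribute nothing
    damage = np.linalg.norm(out - healthy) / np.linalg.norm(healthy)
    print(f"lesion radius {radius}: relative functional damage {damage:.3f}")
```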


Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Neural Information Processing Systems

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.

1 Introduction

Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems.
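A minimal sketch of the forward pass of a two-level HME with soft gating (linear experts for brevity; EM training and the HMM integration are omitted, and all sizes are illustrative):

```python
import numpy as np

# Two-level Hierarchical Mixture of Experts: softmax gating networks
# form "soft" decisions that weight the outputs of small experts.

rng = np.random.default_rng(0)
d_in, d_out, n_top, n_leaf = 10, 3, 2, 2       # illustrative sizes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Gate and expert parameters (linear experts for brevity).
top_gate = rng.standard_normal((n_top, d_in))
leaf_gates = rng.standard_normal((n_top, n_leaf, d_in))
experts = rng.standard_normal((n_top, n_leaf, d_out, d_in))

def hme_forward(x):
    g_top = softmax(top_gate @ x)              # soft decision at the root
    y = np.zeros(d_out)
    for i in range(n_top):
        g_leaf = softmax(leaf_gates[i] @ x)    # soft decision at branch i
        for j in range(n_leaf):
            y += g_top[i] * g_leaf[j] * (experts[i, j] @ x)
    return y

print(hme_forward(rng.standard_normal(d_in)))
```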


Dynamic Modelling of Chaotic Time Series with Neural Networks

Neural Information Processing Systems

In young barn owls raised with optical prisms over their eyes, the auditory space maps in the midbrain are shifted to stay in register with the visual map, suggesting that the visual input imposes a frame of reference on the auditory maps. However, the optic tectum, the first site of convergence of visual with auditory information, is not the site of plasticity for the shift of the auditory maps; the plasticity occurs instead in the inferior colliculus, which contains an auditory map and projects into the optic tectum. We explored a model of the owl remapping in which learning is driven by a global reinforcement signal whose delivery is controlled by visual foveation. A Hebbian learning rule gated by reinforcement learned to adjust the auditory maps appropriately. In addition, reinforcement learning preferentially adjusted the weights in the inferior colliculus, as in the owl brain, even though the weights were allowed to change throughout the auditory system. This observation raises the possibility that the site of learning does not have to be genetically specified, but could be determined by how the learning procedure interacts with the network architecture.
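A toy version of such a rule (all details here are illustrative, not the authors' model): a Hebbian update gated by a global scalar reinforcement signal, with exploration noise, gradually re-aligns a map with a prism-shifted target.

```python
import numpy as np

# Reinforcement-gated Hebbian learning: the weight change is the Hebbian
# product of pre- and post-synaptic activity, scaled by a global reward.

rng = np.random.default_rng(0)
n, shift = 20, 3
W = 0.01 * rng.standard_normal((n, n))        # initial "auditory map" weights
target = np.roll(np.eye(n), shift, axis=0)    # visually correct (shifted) map

eta, baseline = 0.2, 0.1
for _ in range(5000):
    pre = np.zeros(n); pre[rng.integers(n)] = 1.0      # one active input unit
    post = W @ pre + 0.1 * rng.standard_normal(n)      # noisy map output
    correct = np.argmax(post) == np.argmax(target @ pre)
    r = 1.0 if correct else 0.0                        # global reinforcement
    W += eta * (r - baseline) * np.outer(post, pre)    # gated Hebb rule

aligned = np.mean([np.argmax(W[:, k]) == (k + shift) % n for k in range(n)])
print(f"fraction of map columns correctly shifted: {aligned:.2f}")
```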


Grammar Learning by a Self-Organizing Network

Neural Information Processing Systems

Michiro Negishi, Dept. of Cognitive and Neural Systems, Boston University

This paper presents the design and simulation results of a self-organizing neural network which induces a grammar from example sentences. Input sentences are generated from a simple phrase structure grammar including number agreement, verb transitivity, and recursive noun phrase construction rules. The network induces a grammar explicitly in the form of symbol categorization rules and phrase structure rules.

1 Purpose and related works

The purpose of this research is to show that a self-organizing network with a certain structure can acquire syntactic knowledge from only positive (i.e., grammatical) examples. There has been research on supervised neural network models of language acquisition tasks [Elman, 1991, Miikkulainen and Dyer, 1988, John and McClelland, 1988]. Unlike these supervised models, the current model self-organizes word and phrasal categories and phrase construction rules through mere exposure to input sentences, without any artificially defined task goals.
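As a minimal sketch of the distributional intuition behind such symbol categorization (not the paper's network; the sentences and the similarity threshold are illustrative): words that occur in similar left/right contexts in the positive example sentences can be grouped into categories.

```python
import numpy as np

# Group words by the similarity of their left/right context counts,
# using positive example sentences only.

sentences = [
    "the boy sees the dog", "a girl sees a cat",
    "the dog walks", "a cat walks",
    "the girl likes the boy", "a dog likes a girl",
]
vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}

# Context vectors: counts of neighbouring words on each side.
ctx = np.zeros((len(vocab), 2 * len(vocab)))
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        if i > 0:
            ctx[index[w], index[words[i - 1]]] += 1               # left context
        if i < len(words) - 1:
            ctx[index[w], len(vocab) + index[words[i + 1]]] += 1  # right context

# Crude categorization: words whose normalized context vectors are close.
ctx /= np.maximum(np.linalg.norm(ctx, axis=1, keepdims=True), 1e-9)
for w in vocab:
    sims = ctx @ ctx[index[w]]
    peers = [v for v in vocab if v != w and sims[index[v]] > 0.5]
    print(f"{w:6s} groups with: {peers}")
```

With these sentences, determiners, nouns, and verbs largely group with members of their own syntactic category, which is the kind of symbol categorization the network is designed to induce.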