Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding a dissimilarity data set in a Euclidean space, for clustering these data, and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently when the data are partially unknown.
1 Introduction
Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidean space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with a minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991), which guarantees robust statistics (Tikochinsky et al., 1984).
Glove-TalkII: Mapping Hand Gestures to Speech Using Neural Networks
Fels, Sidney, Hinton, Geoffrey E.
There are many possible schemes for converting hand gestures to speech. The choice of scheme depends on the granularity of the speech that you want to produce. Figure 1 identifies a spectrum defined by possible divisions of speech based on the duration of the sound for each granularity. In general, the coarser the division of speech, the smaller the bandwidth required of the user. In contrast, where the granularity of speech is on the order of articulatory muscle movements (i.e., the artificial vocal tract [AVT]), high-bandwidth control is necessary for good speech. Devices that implement this model of speech production are like musical instruments that produce speech sounds.
JPMAX: Learning to Recognize Moving Objects as a Model-fitting Problem
Suzanna Becker, Department of Psychology, McMaster University, Hamilton, Ont. L8S 4K1
Unsupervised learning procedures have been successful at low-level feature extraction and preprocessing of raw sensor data. So far, however, they have had limited success in learning higher-order representations, e.g., of objects in visual images. A promising approach is to maximize some measure of agreement between the outputs of two groups of units which receive inputs physically separated in space, time or modality, as in (Becker and Hinton, 1992; Becker, 1993; de Sa, 1993). Using the same approach, a much simpler learning procedure is proposed here which discovers features in a single-layer network consisting of several populations of units, and can be applied to multi-layer networks trained one layer at a time. When trained with this algorithm on image sequences of moving geometric objects, a two-layer network can learn to perform accurate position-invariant object classification.
1 LEARNING COHERENT CLASSIFICATIONS
A powerful constraint in sensory data is coherence over time, in space, and across different sensory modalities.
A Study of Parallel Perturbative Gradient Descent
Motivated by difficulties in the analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and related algorithms that calculate gradients from detailed knowledge of the neural network model, several recent papers have proposed a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or semi-parallel [Flower and Jabri, 1993] perturbative technique that measures the gradient (with the physical neural network) rather than calculating it. This technique is closely related to methods of stochastic approximation [Kushner and Clark, 1978], which have been investigated recently by workers in fields other than neural networks.
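The idea of measuring rather than calculating a gradient can be illustrated with a minimal sketch. It assumes a simple random-sign perturbation scheme and a quadratic error that stands in for a measurement on physical hardware. The names `TARGET`, `measured_error`, `perturbative_gradient` and `train`, and all parameter values, are hypothetical and are not taken from the cited papers.

```python
import random

# Hypothetical target weights; the quadratic error below stands in for an
# error that would be measured on the physical network.
TARGET = [1.0, -2.0, 0.5]

def measured_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def perturbative_gradient(error_fn, weights, rng, delta=1e-3):
    # Perturb all weights in parallel by +/- delta and measure the error change.
    signs = [rng.choice((-1.0, 1.0)) for _ in weights]
    perturbed = [w + delta * s for w, s in zip(weights, signs)]
    change = (error_fn(perturbed) - error_fn(weights)) / delta
    # Each weight attributes the measured change to its own perturbation sign.
    return [change * s for s in signs]

def train(error_fn, weights, steps=2000, rate=0.05, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        grad = perturbative_gradient(error_fn, weights, rng)
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights

final_weights = train(measured_error, [0.0, 0.0, 0.0])
```

Only two error measurements are needed per step regardless of the number of weights, which is what makes a scheme of this kind attractive when the error is read off a physical device.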
An Analog Neural Network Inspired by Fractal Block Coding
Pineda, Fernando J., Andreou, Andreas G.
We consider the problem of decoding block coded data using a physical dynamical system. We sketch out a decompression algorithm for fractal block codes and then show how to implement a recurrent neural network using physically simple but highly nonlinear analog circuit models of neurons and synapses. The nonlinear system has many fixed points, but we have at our disposal a procedure to choose the parameters in such a way that only one solution, the desired solution, is stable. As a partial proof of the concept, we present experimental data from a small system: a 16-neuron analog CMOS chip fabricated in a 2 µm analog p-well process. This chip operates in the subthreshold regime and, for each choice of parameters, converges to a unique stable state. Each state exhibits a qualitatively fractal shape.
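Decoding by convergence to a unique stable state can be illustrated in software with a minimal sketch. It assumes a generic one-dimensional fractal block code in which each range block is a scaled, offset copy of a downsampled domain block. The code table `CODE`, the block size `BLOCK` and the function `decode` are hypothetical illustrations and do not describe the authors' circuit or algorithm.

```python
BLOCK = 2  # samples per range block; domain blocks are twice as long

# Hypothetical code table: one (domain_start, scale, offset) entry per range
# block. Every |scale| < 1, so the decoding map is a contraction.
CODE = [(0, 0.5, 1.0), (4, -0.5, 0.0), (2, 0.25, 2.0), (0, 0.5, -1.0)]

def decode(code, n_samples, n_iter=60, start=0.0):
    signal = [start] * n_samples
    for _ in range(n_iter):
        new = [0.0] * n_samples
        for r, (dom_start, scale, offset) in enumerate(code):
            for k in range(BLOCK):
                # Downsample the domain block by averaging adjacent samples.
                a = signal[dom_start + 2 * k]
                b = signal[dom_start + 2 * k + 1]
                new[r * BLOCK + k] = scale * 0.5 * (a + b) + offset
        signal = new
    return signal

from_zero = decode(CODE, 8, start=0.0)
from_far = decode(CODE, 8, start=100.0)
```

Because the map is contractive, both starting signals are driven to the same fixed point, so the decoded signal depends only on the code and not on the initial state.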
An Input Output HMM Architecture
Bengio, Yoshua, Frasconi, Paolo
We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.
1 INTRODUCTION
Learning problems involving sequentially structured data cannot be effectively dealt with by static models such as feedforward networks. Recurrent networks can model complex dynamical systems and can store and retrieve contextual information in a flexible way. Until now, research on supervised learning for recurrent networks has focused almost exclusively on error minimization by gradient descent methods. Although these methods are effective for learning short-term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).
Stochastic Dynamics of Three-State Neural Networks
We present here an analysis of the stochastic neurodynamics of a neural network composed of three-state neurons described by a master equation. An outer-product representation of the master equation is employed. In this representation, an extension of the analysis from two to three-state neurons is easily performed. We apply this formalism with approximation schemes to a simple three-state network and compare the results with Monte Carlo simulations.
On-line Learning of Dichotomies
Barkai, N., Seung, H. S., Sompolinsky, H.
The performance of online algorithms for learning dichotomies is studied. In online learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as $P^{-1}$ if the schedule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with lowest generalization error.
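The online setting described above can be illustrated with a minimal sketch. It assumes a realizable teacher perceptron, isotropic Gaussian inputs and a hypothetical decay schedule `eta0 / (1 + p / tau)`. That schedule is chosen only for illustration and is not the optimized schedule analyzed in the paper. All function and parameter names are assumptions.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generalization_error(teacher, student):
    # For isotropic Gaussian inputs, two perceptrons through the origin
    # disagree with probability angle / pi.
    norm = math.sqrt(dot(teacher, teacher) * dot(student, student))
    if norm == 0.0:
        return 0.5  # an all-zero student carries no information
    overlap = max(-1.0, min(1.0, dot(teacher, student) / norm))
    return math.acos(overlap) / math.pi

def online_perceptron(n_dim=20, n_examples=5000, eta0=1.0, tau=50.0, seed=0):
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
    student = [0.0] * n_dim
    for p in range(n_examples):  # each example is presented exactly once
        x = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        label = 1.0 if dot(teacher, x) >= 0 else -1.0
        guess = 1.0 if dot(student, x) >= 0 else -1.0
        if guess != label:  # perceptron rule: update only on mistakes
            eta = eta0 / (1.0 + p / tau)  # hypothetical decay schedule
            student = [s + eta * label * xi for s, xi in zip(student, x)]
    return generalization_error(teacher, student)

error_after_training = online_perceptron()
```

Calling `online_perceptron` with increasing `n_examples` gives points on an empirical learning curve, and varying `eta0` and `tau` shows how the schedule affects it.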
Capacity and Information Efficiency of a Brain-like Associative Net
Graham, Bruce, Willshaw, David
Bruce Graham and David Willshaw, Centre for Cognitive Science, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, UK. Email: bruce@cns.ed.ac.uk & david@cns.ed.ac.uk
We have determined the capacity and information efficiency of an associative net configured in a brain-like way with partial connectivity and noisy input cues. Recall theory was used to calculate the capacity when pattern recall is achieved using a winners-take-all strategy. Transforming the dendritic sum according to input activity and unit usage can greatly increase the capacity of the associative net under these conditions. This corresponds to the level of connectivity commonly seen in the brain and invites speculation that the brain is connected in the most information efficient way.
1 INTRODUCTION
Standard network associative memories become more plausible as models of associative memory in the brain if they incorporate (1) partial connectivity, (2) sparse activity and (3) recall from noisy cues. In this paper we consider the capacity of a binary associative net (Willshaw, Buneman, & Longuet-Higgins, 1969; Willshaw, 1971; Buckingham, 1991) containing these features. While the associative net is a very simple model of associative memory, its behaviour as a storage device is not trivial and yet it is tractable to theoretical analysis.
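The binary associative net and winners-take-all recall mentioned above can be illustrated with a minimal sketch. It assumes full connectivity and uses the raw dendritic sum, so it does not include the partial connectivity or the transformed dendritic sums studied in the paper. The pattern pairs in `PAIRS` and the functions `store` and `recall` are hypothetical.

```python
def store(pairs, n_in, n_out):
    # Clipped Hebbian storage: a binary synapse is switched on whenever its
    # input and output units are active together in some stored pair.
    weights = [[0] * n_in for _ in range(n_out)]
    for cue, target in pairs:
        for j in target:
            for i in cue:
                weights[j][i] = 1
    return weights

def recall(weights, cue, n_active):
    # Dendritic sum: number of active cue inputs reaching each output unit
    # through switched-on synapses.
    sums = [sum(row[i] for i in cue) for row in weights]
    # Winners-take-all: keep the n_active units with the largest sums.
    ranked = sorted(range(len(sums)), key=lambda j: -sums[j])
    return set(ranked[:n_active])

# Hypothetical sparse pattern pairs: (active input units, active output units).
PAIRS = [({0, 1, 2}, {0, 1}), ({2, 3, 4}, {1, 2}), ({5, 6, 7}, {3, 4})]
W = store(PAIRS, n_in=8, n_out=6)

clean = recall(W, {0, 1, 2}, n_active=2)  # noiseless cue for the first pair
noisy = recall(W, {0, 1, 5}, n_active=2)  # one missing, one spurious input
```

In this small example the noisy cue still retrieves the first stored output pattern, because the correct units keep the largest dendritic sums.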