Structural and Behavioral Evolution of Recurrent Networks
Saunders, Gregory M., Angeline, Peter J., Pollack, Jordan B.
This paper introduces GNARL, an evolutionary program that induces recurrent neural networks that are structurally unconstrained. In contrast to constructive and destructive algorithms, GNARL employs a population of networks and uses a fitness function's unsupervised feedback to guide search through network space. Annealing is used in generating both Gaussian weight changes and structural modifications. Applying GNARL to a complex search and collection task demonstrates that the system is capable of inducing networks with complex internal dynamics.
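As a rough illustration of the annealed Gaussian mutation idea described above, the sketch below scales the standard deviation of weight perturbations by a temperature that shrinks as fitness improves. The temperature schedule, parameter names, and flat weight vector are illustrative assumptions, not GNARL's actual implementation.

```python
import numpy as np

def anneal_temperature(fitness, max_fitness=1.0):
    # Temperature shrinks as fitness approaches its maximum, so well-performing
    # networks receive smaller random perturbations (an assumed schedule).
    return 1.0 - fitness / max_fitness

def mutate_weights(weights, fitness, alpha=1.0, rng=None):
    # Gaussian weight perturbation whose standard deviation is scaled by the
    # annealing temperature, in the spirit of annealed structural/weight mutation.
    rng = np.random.default_rng() if rng is None else rng
    T = anneal_temperature(fitness)
    return weights + rng.normal(0.0, alpha * T, size=weights.shape)

w = np.zeros(5)
print(mutate_weights(w, fitness=0.9))   # fit network: small perturbation
print(mutate_weights(w, fitness=0.2))   # unfit network: large perturbation
```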
Unsupervised Learning of Mixtures of Multiple Causes in Binary Data
Saund, Eric
This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing function for combining beliefs from different cluster centers in order to generate data reconstructions whose errors are minimized both during recognition and learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results are presented demonstrating the algorithm's ability to successfully discover coherent multiple causal representations in noisy test data and in images of printed characters. 1 Introduction The objective of unsupervised learning is to identify patterns or features reflecting underlying regularities in data. Single-cause techniques, including the k-means algorithm and the standard mixture model (Duda and Hart, 1973), represent clusters of data points sharing similar patterns of 1s and 0s under the assumption that each data point belongs to, or was generated by, one and only one cluster center; output activity is constrained to sum to 1. In contrast, a multiple-cause model permits more than one cluster center to become fully active in accounting for an observed data vector. The advantage of a multiple cause model is that a relatively small number of hidden variables can be applied combinatorially to generate a large data set.
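The paper's specific nonlinearity is not reproduced here; the sketch below only contrasts the criticized weighted-sum-plus-sigmoid reconstruction with a noisy-OR style combination commonly used in multiple-cause models. The activities and cluster-center predictions are made up for illustration.

```python
import numpy as np

def sigmoid_mix(m, c):
    # Weighted sum of cluster-center predictions followed by sigmoid squashing;
    # m: cause activities (k,), c: cluster-center predictions (k, d) in [0, 1].
    return 1.0 / (1.0 + np.exp(-(m @ c)))

def noisy_or_mix(m, c):
    # Noisy-OR style combination: a data bit is reconstructed "on" unless every
    # active cause fails to turn it on.
    return 1.0 - np.prod(1.0 - m[:, None] * c, axis=0)

m = np.array([1.0, 1.0])                 # two fully active causes
c = np.array([[0.9, 0.0], [0.9, 0.0]])   # both strongly predict only the first bit
print(sigmoid_mix(m, c))                 # saturates on bit 1, gives 0.5 on the unexplained bit
print(noisy_or_mix(m, c))                # ~0.99 on bit 1, 0 on the unexplained bit
```

With two causes both predicting the first bit, the summed-then-squashed reconstruction saturates and assigns 0.5 to bits that no cause accounts for, whereas the noisy-OR combination stays bounded per cause and leaves unexplained bits at 0.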
Analysis of Short Term Memories for Neural Networks
Principe, Jose C., Hsu, Hui-H., Kuo, Jyh-Ming
Short term memory is indispensable for the processing of time-varying information with artificial neural networks. In this paper a model for linear memories is presented, and ways to include memories in connectionist topologies are discussed. A comparison is drawn among different memory types, with an indication of the salient characteristic of each memory model. 1 INTRODUCTION An adaptive system that has to interact with the external world is faced with the problem of coping with the time-varying nature of real-world signals. Time-varying signals, natural or man-made, carry information in their time structure. The problem is then one of devising methods and topologies (in the case of interest here, neural topologies) that explore information along time. This problem can be appropriately called temporal pattern recognition, as opposed to the more traditional case of static pattern recognition.
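As a minimal illustration of two contrasting linear memory types such a comparison might cover, the sketch below implements a tapped delay line (exact recall of the last K samples) and a single leaky-integrator context unit (an exponentially fading trace of the past). The parameter names and values are assumptions for illustration only.

```python
import numpy as np

def delay_line_memory(signal, K=5):
    # Tapped delay line: stores the last K samples exactly (high resolution, fixed depth).
    T = len(signal)
    taps = np.zeros((K, T))
    for k in range(K):
        taps[k, k:] = signal[: T - k]
    return taps

def context_memory(signal, mu=0.2):
    # Leaky integrator (context unit): an exponentially decaying trace of the past,
    # trading temporal resolution for long (soft) memory depth.
    trace = np.zeros(len(signal))
    for t in range(1, len(signal)):
        trace[t] = (1 - mu) * trace[t - 1] + mu * signal[t]
    return trace

s = np.sin(np.linspace(0, 6, 50))
print(delay_line_memory(s).shape, context_memory(s)[-1])
```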
Central and Pairwise Data Clustering by Competitive Neural Networks
Buhmann, Joachim, Hofmann, Thomas
Data clustering amounts to a combinatorial optimization problem to reduce the complexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework. For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number of clusters, their centers and their cluster probabilities. A mean-field approximation for pairwise clustering is used to estimate assignment probabilities. A self-consistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidean space. 1 Introduction A central problem in information processing is the reduction of data complexity with minimal loss in precision to discard noise and to reveal the basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as completely as possible and which simultaneously favors prototypes with minimal complexity (Linde et al., 1980; Gray, 1984; Chou et al., 1989; Rose et al., 1990). We discuss an objective function for the joint optimization of distortion errors and the complexity of a reduced data representation.
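A minimal sketch of maximum-entropy central clustering in the spirit described above: Gibbs assignment probabilities at temperature T followed by a probability-weighted reestimation of the cluster centers. The annealing schedule, synthetic data, and variable names are assumptions; the paper's full procedure (including selecting the number of clusters) is not reproduced.

```python
import numpy as np

def soft_cluster_step(X, Y, T):
    # Maximum-entropy assignment probabilities at temperature T (Gibbs form),
    # then reestimation of the centers as probability-weighted means.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared distortions
    logits = -d2 / T
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)                      # p(cluster | data point)
    Y_new = (P.T @ X) / P.sum(axis=0)[:, None]             # reestimated centers
    return P, Y_new

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in ([0, 0], [2, 2], [0, 2])])
Y = X.mean(axis=0) + rng.normal(0, 0.01, size=(3, 2))      # centers start near the data mean
for T in [2.0, 0.5, 0.1]:                                  # assumed annealing schedule
    for _ in range(30):
        P, Y = soft_cluster_step(X, Y, T)
    Y += rng.normal(0, 0.01, size=Y.shape)                 # tiny jitter lets coincident centers split
print(np.round(Y, 2))
```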
Learning Temporal Dependencies in Connectionist Speech Recognition
Renals, Steve, Hochberg, Mike, Robinson, Tony
In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into an MLP to produce an enhanced input representation. This takes the form of an adaptive gamma filter and incorporates an automatic approach for learning temporal dependencies. We have experimented on a speaker-independent phone recognition task using the TIMIT database. Results using the gamma-filtered input representation have shown improvement over the baseline MLP system. Improvements have also been obtained through merging the baseline and gamma filter models.
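For reference, a gamma filter of the kind mentioned above follows the standard gamma memory recursion (the notation here is assumed, not taken from the paper), with the single parameter \mu adapted along with the network weights:

\[ x_0(t) = u(t), \qquad x_k(t) = (1-\mu)\,x_k(t-1) + \mu\,x_{k-1}(t-1), \quad k = 1, \dots, K, \]

where u(t) is the input signal and the taps x_0(t), ..., x_K(t) form the enhanced input representation fed to the MLP.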
Non-Linear Statistical Analysis and Self-Organizing Hebbian Networks
Shapiro, Jonathan L., Prügel-Bennett, Adam
Linear neurons learning under an unsupervised Hebbian rule can learn to perform a linear statistical analysis of the input data. This was first shown by Oja (1982), who proposed a learning rule which finds the first principal component of the variance matrix of the input data. Based on this model, Oja (1989), Sanger (1989), and many others have devised numerous neural networks which find many components of this matrix. These networks perform principal component analysis (PCA), a well-known method of statistical analysis.
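A minimal sketch of Oja's (1982) rule on synthetic data; the covariance matrix, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    # Oja's (1982) rule: Hebbian term y*x with a decay y^2*w that keeps |w| near 1;
    # w converges to the first principal component of the input covariance.
    y = w @ x
    return w + eta * y * (x - y * w)

rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0], [1.0, 1.0]])         # assumed input covariance for the demo
L = np.linalg.cholesky(C)
w = rng.normal(size=2)
for _ in range(5000):
    w = oja_update(w, L @ rng.normal(size=2))  # zero-mean samples with covariance C
print(w / np.linalg.norm(w))                   # ~ leading eigenvector of C (up to sign)
```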
Hidden Markov Models for Human Genes
Baldi, Pierre, Brunak, Søren, Chauvin, Yves, Engelbrecht, Jacob, Krogh, Anders
We apply HMMs to the problem of modeling exons and introns and detecting splice sites in the human genome. Our most interesting result so far is the detection of particular oscillatory patterns, with a minimal period of roughly 10 nucleotides, that seem to be characteristic of exon regions and may have significant biological implications.
An Analog VLSI Model of Central Pattern Generation in the Leech
The biological network is small and relatively well understood, and the silicon model can therefore span three levels of organization in the leech nervous system (neuron, ganglion, system); it represents one of the first comprehensive models of leech swimming operating in real time. The circuit employs biophysically motivated analog neurons networked to form multiple biologically inspired silicon ganglia. These ganglia are coupled using known interganglionic connections. Thus the model retains the flavor of its biological counterpart, and though simplified, the output of the silicon circuit is similar to the output of the leech swim central pattern generator. The model operates on the same time and spatial scales as the leech nervous system and will provide an excellent platform with which to explore real-time adaptive locomotion in the leech and other "simple" invertebrate nervous systems.
Discontinuous Generalization in Large Committee Machines
Schwarze, H., Hertz, J.
The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights. 1 INTRODUCTION Feedforward neural networks are widely used as nonlinear, parametric models for the solution of classification tasks and function approximation. Trained from examples of a given task, they are able to generalize, i.e. to compute the correct output for new, unknown inputs.
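For reference, a fully connected committee machine with K hidden units computes (standard definition; the notation is assumed here rather than taken from the paper)

\[ \sigma(\mathbf{x}) = \operatorname{sgn}\!\left( \sum_{l=1}^{K} \operatorname{sgn}(\mathbf{w}_l \cdot \mathbf{x}) \right), \]

i.e. the hidden units vote with their signs and the output is the majority vote; with all N inputs connected to every hidden unit there are K·N adjustable weights, which is the scale of training set referred to in the second regime above.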
Counting function theorem for multi-layer networks
Kowalczyk, Adam
If N > h1·n then such a perceptron must have all units of the first hidden layer fully connected to the inputs. This implies maximal capacities (in the sense of Cover) of 2n input patterns per hidden unit and 2 input patterns per synaptic weight for such networks (both capacities are achieved by networks with a single hidden layer and are the same as for a single neuron). Comparing these results with recent estimates of VC-dimension, we find that, in contrast to the single-neuron case, for sufficiently large n and h1 the VC-dimension exceeds Cover's capacity. 1 Introduction In the course of theoretical justification of many of the claims made about neural networks regarding their ability to learn a set of patterns and their ability to generalise, various concepts of maximal storage capacity were developed. In particular, Cover's capacity [4] and VC-dimension [12] are two expressions of this notion and are of special interest here. We should stress that both capacities are not easy to compute and are presently known only in a few particular cases of feedforward networks. VC-dimension, in spite of being introduced much later, has been far more researched, perhaps due to its significance expressed by a well-known relation between generalisation and learning errors [12, 3].
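For context, the Cover capacity referred to above comes from Cover's (1965) function-counting theorem: for N input patterns in general position in R^n, a single threshold unit (hyperplane through the origin) can realize

\[ C(N, n) = 2 \sum_{i=0}^{n-1} \binom{N-1}{i} \]

of the 2^N possible dichotomies; C(N, n)/2^N equals 1/2 exactly at N = 2n, which is the capacity of 2n patterns per neuron that the abstract compares against VC-dimension.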