Unsupervised Parallel Feature Extraction from First Principles

Neural Information Processing Systems

EE., Linkoping University, S-58183 Linkoping, Sweden

Abstract

We describe a number of learning rules that can be used to train unsupervised parallel feature extraction systems. The learning rules are derived using gradient ascent of a quality function. We consider a number of quality functions that are rational functions of higher-order moments of the extracted feature values. We show that one system learns the principal components of the correlation matrix. Principal component analysis systems are usually not optimal feature extractors for classification. Therefore we design quality functions which produce feature vectors that support unsupervised classification. The properties of the different systems are compared with the help of different artificially designed datasets and a database consisting of all Munsell color spectra.

1 Introduction

There are a number of unsupervised Hebbian learning algorithms (see Oja, 1992 and references therein) that perform some version of the Karhunen-Loeve expansion.
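As a rough illustration of how gradient ascent on a quality function can recover a principal component, the sketch below maximizes the projected variance w^T C w using the averaged form of Oja's rule. This is a stand-in chosen for illustration, not the learning rules derived in the paper; the function name `leading_component`, the step size, and the toy data are all assumptions.

```python
import numpy as np

def leading_component(C, eta=0.02, steps=2000, seed=0):
    """Gradient ascent on w^T C w with Oja's normalizing term (illustrative)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=C.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        Cw = C @ w
        # Hebbian term C w, minus a decay term that keeps |w| near 1.
        w += eta * (Cw - (w @ Cw) * w)
    return w

# Toy data (assumed): three features with very different variances.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.3])
C = X.T @ X / len(X)  # correlation matrix of the inputs
w = leading_component(C)
```

Up to sign, `w` should align with the eigenvector of `C` that has the largest eigenvalue, which can be checked against `np.linalg.eigh(C)`.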


Training Neural Networks with Deficient Data

Neural Information Processing Systems

We analyze how data with uncertain or missing input features can be incorporated into the training of a neural network. The general solution requires a weighted integration over the unknown or uncertain input, although computationally cheaper closed-form solutions can be found for certain Gaussian Basis Function (GBF) networks. We also discuss cases in which heuristic solutions, such as substituting the mean of an unknown input, can be harmful.
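To make the contrast between integration and mean substitution concrete, here is a minimal sketch for a single one-dimensional Gaussian basis function. It assumes the uncertain input is itself Gaussian with mean `m` and variance `v`; under that assumption the weighted integral has a standard closed form (a product of two Gaussians). The function names and the numbers are illustrative and are not taken from the paper.

```python
import math

def gbf(x, c, s):
    """Gaussian basis function with center c and width s."""
    return math.exp(-(x - c) ** 2 / (2 * s ** 2))

def expected_gbf(m, v, c, s):
    """E[gbf(x)] for x ~ N(m, v), in closed form."""
    total = s ** 2 + v
    return s / math.sqrt(total) * math.exp(-(m - c) ** 2 / (2 * total))

def expected_gbf_numeric(m, v, c, s, n=20001, width=10.0):
    """Trapezoidal integration of gbf(x) * N(x; m, v), used as a check."""
    sd = math.sqrt(v)
    lo, hi = m - width * sd, m + width * sd
    h = (hi - lo) / (n - 1)
    acc = 0.0
    for i in range(n):
        x = lo + i * h
        p = math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        weight = 0.5 if i in (0, n - 1) else 1.0
        acc += weight * gbf(x, c, s) * p * h
    return acc

# Assumed example: a narrow unit centered away from the input's mean.
m, v, c, s = 0.0, 4.0, 2.0, 0.5
plug_in = gbf(m, c, s)               # mean substitution
integrated = expected_gbf(m, v, c, s)  # weighted integration
numeric = expected_gbf_numeric(m, v, c, s)
```

In this setting `plug_in` is far smaller than `integrated`: substituting the mean treats the unit as almost inactive, while the integral accounts for the probability mass that falls near the unit's center.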


Supervised learning from incomplete data via an EM approach

Neural Information Processing Systems

Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm: EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems.


Central and Pairwise Data Clustering by Competitive Neural Networks

Neural Information Processing Systems

Data clustering amounts to a combinatorial optimization problem to reduce the complexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework. For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number of clusters, their centers and their cluster probabilities. A mean-field approximation for pairwise clustering is used to estimate assignment probabilities. A self-consistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidean space.

1 Introduction

A central problem in information processing is the reduction of the data complexity with minimal loss in precision to discard noise and to reveal basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as completely as possible and which simultaneously favors prototypes with minimal complexity (Linde et al., 1980; Gray, 1984; Chou et al., 1989; Rose et al., 1990). We discuss an objective function for the joint optimization of distortion errors and the complexity of a reduced data representation.


Structural and Behavioral Evolution of Recurrent Networks

Neural Information Processing Systems

This paper introduces GNARL, an evolutionary program which induces recurrent neural networks that are structurally unconstrained. In contrast to constructive and destructive algorithms, GNARL employs a population of networks and uses a fitness function's unsupervised feedback to guide search through network space. Annealing is used in generating both Gaussian weight changes and structural modifications. Applying GNARL to a complex search and collection task demonstrates that the system is capable of inducing networks with complex internal dynamics.


Grammatical Inference by Attentional Control of Synchronization in an Oscillating Elman Network

Neural Information Processing Systems

We show how an "Elman" network architecture, constructed from recurrently connected oscillatory associative memory network modules, can employ selective "attentional" control of synchronization to direct the flow of communication and computation within the architecture to solve a grammatical inference problem. Previously we have shown how the discrete time "Elman" network algorithm can be implemented in a network completely described by continuous ordinary differential equations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. In this architecture, oscillation amplitude codes the information content or activity of a module (unit), whereas phase and frequency are used to "softwire" the network. Only synchronized modules communicate by exchanging amplitude information; the activity of non-resonating modules contributes incoherent crosstalk noise. Attentional control is modeled as a special subset of the hidden modules with outputs which affect the resonant frequencies of other hidden modules. They control synchrony among the other modules and direct the flow of computation (attention) to effect transitions between two subgraphs of a thirteen state automaton which the system emulates to generate a Reber grammar. The internal crosstalk noise is used to drive the required random transitions of the automaton.



When Will a Genetic Algorithm Outperform Hill Climbing?

Neural Information Processing Systems

Holland, Dept. of Psychology, University of Michigan, Ann Arbor, MI 48109; Stephanie Forrest, Dept. of Computer Science, University of New Mexico, Albuquerque, NM 87131

Abstract

We analyze a simple hill-climbing algorithm (RMHC) that was previously shown to outperform a genetic algorithm (GA) on a simple "Royal Road" function. We then analyze an "idealized" genetic algorithm (IGA) that is significantly faster than RMHC and that gives a lower bound for GA speed. We identify the features of the IGA that give rise to this speedup, and discuss how these features can be incorporated into a real GA.

1 INTRODUCTION

Our goal is to understand the class of problems for which genetic algorithms (GA) are most suited, and in particular, for which they will outperform other search algorithms. Several studies have empirically compared GAs with other search and optimization methods such as simple hill-climbing (e.g., Davis, 1991), simulated annealing (e.g., Ingber & Rosen, 1992), linear, nonlinear, and integer programming techniques, and other traditional optimization techniques (e.g., De Jong, 1975). However, such comparisons typically compare one version of the GA with a second algorithm on a single problem or set of problems, often using performance criteria which may not be appropriate.
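For readers unfamiliar with the setup, the following is a minimal sketch of random-mutation hill climbing on a small Royal-Road-style function. The abstract does not spell out either definition, so the details here are assumptions: fitness awards `block` points for every contiguous block that is entirely ones, and the climber flips one random bit per step, keeping the flip when fitness does not decrease. The problem size (4 blocks of 3 bits) is deliberately tiny and is not the size studied in the paper.

```python
import random

def royal_road(bits, block):
    """Award `block` points for each contiguous block that is all ones."""
    return sum(block
               for i in range(0, len(bits), block)
               if all(bits[i:i + block]))

def rmhc(n_blocks=4, block=3, max_evals=100_000, seed=0):
    """Random-mutation hill climbing (illustrative parameters)."""
    rng = random.Random(seed)
    n = n_blocks * block
    current = [rng.randint(0, 1) for _ in range(n)]
    current_fit = royal_road(current, block)
    evals = 1
    while current_fit < n and evals < max_evals:
        i = rng.randrange(n)
        current[i] ^= 1                 # flip one randomly chosen bit
        fit = royal_road(current, block)
        evals += 1
        if fit >= current_fit:
            current_fit = fit           # keep equal-or-better strings
        else:
            current[i] ^= 1             # otherwise undo the flip
    return current, current_fit, evals

best, best_fit, evals = rmhc()
```

Because fitness only changes when a whole block is completed or broken, most accepted flips are neutral moves inside incomplete blocks; `evals` gives a rough feel for how long the climber wanders before each block falls into place.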



Fast Pruning Using Principal Components

Neural Information Processing Systems

In this procedure one transforms variables to a basis in which the covariance is diagonal and then projects out the low-variance directions. While application of PCA to remove input variables is useful in some cases (Leen et al., 1990), there is no guarantee that low-variance variables have little effect on error. We propose a saliency measure, based on PCA, that identifies those variables that have the least effect on error. Our proposed Principal Components Pruning algorithm applies this measure to obtain a simple and cheap pruning technique in the context of supervised learning.

Special Case: PCP in Linear Regression

In unbiased linear models, one can bound the bias introduced from pruning the principal degrees of freedom in the model.
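The plain PCA step that opens this abstract (rotate to the basis where the covariance is diagonal, then drop the low-variance directions) can be sketched as follows. This shows only that baseline projection; the paper's error-based saliency measure is not specified in this excerpt and is not implemented here. The function name `project_out_low_variance` and the toy data are assumptions.

```python
import numpy as np

def project_out_low_variance(X, k):
    """Keep the k highest-variance principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :k]         # columns = top-k eigenvectors
    # Project onto the retained subspace and map back to input coordinates.
    return Xc @ top @ top.T + mean, vals[::-1]

# Toy data (assumed): two informative directions, two near-constant ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 0.1, 0.05])
X_pruned, variances = project_out_low_variance(X, k=2)
```

`variances` lists the eigenvalues from largest to smallest, so the choice of `k` can be read off directly. As the abstract cautions, a small variance does not by itself imply a small effect on error.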