
 Statistical Learning



Optimization by Mean Field Annealing

Neural Information Processing Systems

Nearly optimal solutions to many combinatorial problems can be found using stochastic simulated annealing. This paper extends the concept of simulated annealing from its original formulation as a Markov process to a new formulation based on mean field theory. Mean field annealing (MFA) essentially replaces the discrete degrees of freedom in simulated annealing with their average values as computed by the mean field approximation. The net result is that equilibrium at a given temperature is achieved 1-2 orders of magnitude faster than with simulated annealing. A general framework for the mean field annealing algorithm is derived, and its relationship to Hopfield networks is shown. The behavior of MFA is examined both analytically and experimentally for a generic combinatorial optimization problem: graph bipartitioning. This analysis indicates the presence of critical temperatures which could be important in improving the performance of neural networks.
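The abstract does not give the update equations, so the following is only a minimal sketch of how mean field annealing might be applied to graph bipartitioning, under stated assumptions. It assumes spins s_i in {-1, +1}, an energy that rewards uncut edges and penalizes imbalance with a hypothetical weight `alpha`, a geometric cooling schedule, and the standard mean field update v_i = tanh(h_i / T). The function names and all parameter values are illustrative and are not taken from the paper.

```python
import math
import random


def mean_field_bipartition(adj, alpha=1.0, t_start=5.0, t_end=0.1,
                           cooling=0.9, sweeps=20, seed=0):
    """Sketch of mean field annealing for graph bipartitioning.

    adj[i] lists the neighbors of node i. Assumed energy:
        E(s) = -sum_{edges (i,j)} s_i s_j + alpha * sum_{i<j} s_i s_j
    Each discrete spin s_i is replaced by its average v_i in (-1, 1).
    """
    rng = random.Random(seed)
    n = len(adj)
    # Small random averages break the symmetry of the v = 0 solution.
    v = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            # Sequential (asynchronous) updates, one node at a time.
            for i in range(n):
                neighbor_pull = sum(v[j] for j in adj[i])
                balance_push = alpha * (sum(v) - v[i])  # excludes self term
                v[i] = math.tanh((neighbor_pull - balance_push) / t)
        t *= cooling  # assumed geometric cooling schedule
    # Read out a discrete partition from the signs of the averages.
    return [1 if x > 0 else -1 for x in v]


def cut_size(adj, sides):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for i in range(len(adj)) for j in adj[i]
               if i < j and sides[i] != sides[j])


def two_cliques():
    """Toy graph: two 4-node cliques joined by the single edge (3, 4)."""
    edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
    edges.append((3, 4))
    adj = [[] for _ in range(8)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


graph = two_cliques()
partition = mean_field_bipartition(graph)
print(partition, cut_size(graph, partition))
```

In this sketch the averages stay near zero at high temperature and split into two groups once the temperature drops below a threshold set by the graph and `alpha`, which loosely illustrates the critical-temperature behavior the abstract mentions.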


Fast Learning in Multi-Resolution Hierarchies

Neural Information Processing Systems

A variety of approaches to adaptive information processing have been developed by workers in disparate disciplines. These include the large body of literature on approximation and interpolation techniques (curve and surface fitting), the linear, real-time adaptive signal processing systems (such as the adaptive linear combiner and the Kalman filter), and most recently, the reincarnation of nonlinear neural network models such as the multilayer perceptron. Each of these methods has its strengths and weaknesses. The curve and surface fitting techniques are excellent for off-line data analysis, but are typically not formulated with real-time applications in mind. The linear techniques of adaptive signal processing and adaptive control are well-characterized, but are limited to applications for which linear descriptions are appropriate. Finally, neural network learning models such as back propagation have proven extremely versatile at learning a wide variety of nonlinear mappings, but tend to be very slow computationally and are not yet well characterized.


Constraints on Adaptive Networks for Modeling Human Generalization

Neural Information Processing Systems

The potential of adaptive networks to learn categorization rules and to model human performance is studied by comparing how natural and artificial systems respond to new inputs, i.e., how they generalize. Like humans, networks can learn a deterministic categorization task by a variety of alternative individual solutions. An analysis of the constraints imposed by using networks with the minimal number of hidden units shows that this "minimal configuration" constraint is not sufficient. A further analysis of human and network generalizations indicates that initial conditions may provide important constraints on generalization. A new technique, which we call "reversed learning", is described for finding appropriate initial conditions. We are investigating the potential of adaptive networks to learn categorization tasks and to model human performance.



Neural Net and Traditional Classifiers

Neural Information Processing Systems

Previous work on nets with continuous-valued inputs led to generative procedures to construct convex decision regions with two-layer perceptrons (one hidden layer) and arbitrary decision regions with three-layer perceptrons (two hidden layers). Here we demonstrate that two-layer perceptron classifiers trained with back propagation can form both convex and disjoint decision regions. Such classifiers are robust, train rapidly, and provide good performance with simple decision regions. When complex decision regions are required, however, convergence time can be excessively long and performance is often no better than that of k-nearest neighbor classifiers. Three neural net classifiers are presented that provide more rapid training under such situations. Two use fixed weights in the first one or two layers and are similar to classifiers that estimate probability density functions using histograms. A third "feature map classifier" uses both unsupervised and supervised training. It provides good performance with little supervised training in situations such as speech recognition where much unlabeled training data is available. The architecture of this classifier can be used to implement a neural net k-nearest neighbor classifier.
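For reference, the k-nearest neighbor baseline named in the abstract can be sketched in a few lines. This is a generic majority-vote version with Euclidean distance, not the neural net implementation the paper describes. The function name `knn_classify` and the toy data are illustrative assumptions. The data places class "a" in two separate clusters to mimic a disjoint decision region.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Rank training indices by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: class "a" occupies two disjoint clusters, class "b" sits between.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]

for q in [(0.5, 0.5), (10.2, 10.2), (5.3, 5.3)]:
    print(q, knn_classify(points, labels, q))
```

Such a classifier needs no training phase, but it stores every training point and compares each query against all of them.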



