Country
Mapping a Manifold of Perceptual Observations
Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure - the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy - highly nonlinear transformations in the original observation space - to be computed with simple linear operations in feature space.
Bidirectional Retrieval from Associative Memory
Sommer, Friedrich T., Palm, Günther
Similarity based fault tolerant retrieval in neural associative memories (NAM) has not lead to wiedespread applications. A drawback of the efficient Willshaw model for sparse patterns [Ste61, WBLH69], is that the high asymptotic information capacity is of little practical use because of high cross talk noise arising in the retrieval for finite sizes. Here a new bidirectional iterative retrieval method for the Willshaw model is presented, called crosswise bidirectional (CB)retrieval, providing enhanced performance. We discuss its asymptotic capacity limit, analyze the first step, and compare itin experiments with the Willshaw model. Applying the very efficient CB memory model either in information retrieval systems or as a functional model for reciprocal cortico-cortical pathways requires more than robustness against random noise in the input: Our experiments show also the segmentation ability of CB-retrieval with addresses containing the superposition of pattens, provided even at high memory load. 1 INTRODUCTION From a technical point of view neural associative memories (NAM) provide data storage and retrieval.
Stacked Density Estimation
Smyth, Padhraic, Wolpert, David
The component gj's are usually relatively simple unimodal densities such as Gaussians. Density estimation with mixtures involves finding the locations, shapes, and weights of the component densities from the data (using for example the Expectation-Maximization (EM) procedure). Kernel density estimation canbe viewed as a special case of mixture modeling where a component is centered at each data point, given a weight of 1/N, and a common covariance structure (kernel shape) is estimated from the data. The quality of a particular probabilistic model can be evaluated by an appropriate scoring rule on independent out-of-sample data, such as the test set log-likelihood (also referred to as the log-scoring rule in the Bayesian literature).
Monotonic Networks
Monotonicity is a constraint which arises in many application domains. Wepresent a machine learning model, the monotonic network, for which monotonicity can be enforced exactly, i.e., by virtue offunctional form. A straightforward method for implementing and training a monotonic network is described. Monotonic networks are proven to be universal approximators of continuous, differentiable monotonicfunctions. We apply monotonic networks to a real-world task in corporate bond rating prediction and compare them to other approaches. 1 Introduction Several recent papers in machine learning have emphasized the importance of priors anddomain-specific knowledge. In their well-known presentation of the biasvariance tradeoff(Geman and Bienenstock, 1992)' Geman and Bienenstock conclude by arguing that the crucial issue in learning is the determination of the "right biases" whichconstrain the model in the appropriate way given the task at hand.
Learning Continuous Attractors in Recurrent Networks
One approach to invariant object recognition employs a recurrent neural networkas an associative memory. In the standard depiction of the network's state space, memories of objects are stored as attractive fixed points of the dynamics. I argue for a modification of this picture: if an object has a continuous family of instantiations, it should be represented by a continuous attractor. This idea is illustrated with a network that learns to complete patterns. To perform the task of filling in missing information, thenetwork develops a continuous attractor that models the manifold from which the patterns are drawn.
Training Methods for Adaptive Boosting of Neural Networks
Schwenk, Holger, Bengio, Yoshua
"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied withgreat success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performances of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a data base of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters and 8.1 % error on the UCI satellite data set.
RCC Cannot Compute Certain FSA, Even with Arbitrary Transfer Functions
The proof given here shows that for any finite, discrete transfer function used by the units of an RCC network, there are finite-state automata (FSA) that the network cannot model, no matter how many units are used. The proof also applies to continuous transfer functions with a finite number of fixed-points, such as sigmoid and radial-basis functions.
An Incremental Nearest Neighbor Algorithm with Queries
We consider the general problem of learning multi-category classification fromlabeled examples. We present experimental results for a nearest neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm isbased is general enough to be used in any learning algorithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model. 1 INTRODUCTION We consider the general problem of learning multi-category classification from labeled examples.In many practical learning settings the time or sample size available for training are limited. This may have adverse effects on the accuracy of the resulting classifier.For instance, in learning to recognize handwritten characters typical time limitation confines the training sample size to be of the order of a few hundred examples. It is important to make learning more efficient by obtaining only training data which contains significant information about the separability of the pattern classes thereby letting the learning algorithm participate actively in the sampling process. Querying for the class labels of specificly selected examples in the input space may lead to significant improvements in the generalization error (cf.