Recognizing Handwritten Digits Using Mixtures of Linear Models
Hinton, Geoffrey E., Revow, Michael, Dayan, Peter
We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
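The recognition step is easy to sketch. The fragment below is an illustrative approximation rather than the authors' code: the styles of each digit are found with a simple k-means split instead of EM, each style is modelled with scikit-learn's probabilistic PCA, and a test image takes the label of the model under which its log-likelihood is highest.

```python
# A minimal sketch of classification by per-model log-likelihood (not the paper's EM fit).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def fit_digit_models(X, y, styles_per_digit=3, n_components=10):
    """Return {digit: [PCA, ...]} with one local linear model per writing style."""
    models = {}
    for digit in np.unique(y):
        Xd = X[y == digit]
        split = KMeans(n_clusters=styles_per_digit, n_init=10).fit_predict(Xd)
        models[digit] = [PCA(n_components=n_components).fit(Xd[split == k])
                         for k in range(styles_per_digit)]
    return models

def classify(models, x):
    """Pick the digit whose best style model assigns the highest log-likelihood."""
    scores = {d: max(m.score_samples(x[None, :])[0] for m in ms)
              for d, ms in models.items()}
    return max(scores, key=scores.get)
```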
On-line Learning of Dichotomies
Barkai, N., Seung, H. S., Sompolinsky, H.
The performance of online algorithms for learning dichotomies is studied. In online learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered.
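The role of the annealing schedule can be illustrated with a toy teacher-student perceptron (an illustration, not the paper's analysis): each example is presented once, the learning rate is lowered as eta(t) = eta0 / t^power, and the generalization error is the angle between student and teacher weight vectors.

```python
# Toy online learning curve under a chosen learning-rate schedule (illustrative only).
import numpy as np

def online_learning_curve(N=100, P=5000, eta0=1.0, power=1.0, rng=None):
    rng = np.random.default_rng(rng)
    teacher = rng.standard_normal(N)
    student = rng.standard_normal(N)
    errors = []
    for t in range(1, P + 1):
        x = rng.standard_normal(N)
        label = np.sign(teacher @ x)
        eta = eta0 / t**power                 # learning-rate schedule eta(t) = eta0 / t^power
        if np.sign(student @ x) != label:
            student += eta * label * x        # update on mistakes only (perceptron rule)
        overlap = student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))
        errors.append(np.arccos(np.clip(overlap, -1, 1)) / np.pi)  # generalization error
    return np.array(errors)
```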
Limits on Learning Machine Accuracy Imposed by Data Quality
Cortes, Corinna, Jackel, L. D., Chiang, Wan-Ping
Random errors and insufficiencies in databases limit the performance of any classifier trained from and applied to the database. In this paper we propose a method to estimate the limit on classifier performance imposed by the database. We demonstrate this technique on the task of predicting failure in telecommunication paths. 1 Introduction Data collection for a classification or regression task is prone to random errors, e.g.
Bayesian Query Construction for Neural Network Models
Paass, Gerhard, Kindermann, Jörg
If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion are demonstrated in numerical experiments.
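A drastically simplified sketch of such sequential query selection is given below; it replaces the paper's Markov Chain Monte Carlo posterior with a bootstrap ensemble of small networks and reduces the decision-theoretic criterion to predictive disagreement over candidate inputs, so both ingredients are stand-in assumptions.

```python
# Simplified active query selection: ensemble disagreement as a surrogate criterion.
import numpy as np
from sklearn.neural_network import MLPRegressor

def select_query(X_train, y_train, X_candidates, n_posterior_samples=10):
    """Return the index of the candidate input on which the ensemble disagrees most."""
    preds = []
    for seed in range(n_posterior_samples):
        net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=seed)
        # Bootstrap resampling gives each "posterior sample" a different fit.
        idx = np.random.default_rng(seed).integers(0, len(X_train), len(X_train))
        net.fit(X_train[idx], y_train[idx])
        preds.append(net.predict(X_candidates))
    disagreement = np.var(np.stack(preds), axis=0)
    return int(np.argmax(disagreement))
```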
A Non-linear Information Maximisation Algorithm that Performs Blind Separation
Bell, Anthony J., Sejnowski, Terrence J.
With the exception of (Becker 1992), there has been little attempt to use non-linearity in networks to achieve something a linear network could not. Nonlinear networks, however, are capable of computing more general statistics than the second-order ones involved in decorrelation, and as a consequence they are capable of dealing with signals (and noises) which have detailed higher-order structure. The success of the 'H-J' networks at blind separation (Jutten & Herault 1991) suggests that it should be possible to separate statistically independent components by using learning rules which make use of moments of all orders. This paper takes a principled approach to this problem, by starting with the question of how to maximise the information passed on in a nonlinear feed-forward network. Starting with an analysis of a single unit, the approach is extended to a network mapping N inputs to N outputs. In the process, it will be shown that, under certain fairly weak conditions, the N → N network forms a minimally redundant encoding of the inputs, and that it therefore performs Independent Component Analysis (ICA). 2 Information maximisation The information that output Y contains about input X is defined as: I(Y, X) = H(Y) - H(Y|X) (1) where H(Y) is the entropy (information) in the output, while H(Y|X) is whatever information the output has which didn't come from the input. In the case that we have no noise (or rather, we don't know what is noise and what is signal in the input), the mapping between X and Y is deterministic and H(Y|X) has its lowest possible value.
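For the square case (N inputs, N logistic outputs y = g(Wx)), stochastic gradient ascent on the output entropy leads to an update of the form dW ∝ (W^T)^{-1} + (1 - 2y) x^T. The sketch below implements that rule; the learning rate, the omission of bias weights, and the toy two-source example are choices made here for illustration only.

```python
# Sketch of the information-maximisation (infomax) update rule for blind separation.
import numpy as np

def infomax_ica(X, n_sweeps=50, lr=0.001, rng=None):
    """X: (n_samples, N) mixed signals; returns an N x N unmixing matrix W."""
    rng = np.random.default_rng(rng)
    N = X.shape[1]
    W = np.eye(N)
    for _ in range(n_sweeps):
        for x in X[rng.permutation(len(X))]:
            y = 1.0 / (1.0 + np.exp(-(W @ x)))                      # logistic outputs y = g(Wx)
            W += lr * (np.linalg.inv(W.T) + np.outer(1.0 - 2.0 * y, x))
    return W

# Toy usage: unmix two independent super-Gaussian sources from a random linear mixture.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2000, 2))      # independent sources
A = rng.standard_normal((2, 2))      # unknown mixing matrix
X = S @ A.T                          # observed mixtures
W = infomax_ica(X)
U = X @ W.T                          # recovered sources, up to scaling and permutation
```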
Learning direction in global motion: two classes of psychophysically-motivated models
Sundareswaran, V., Vaina, Lucia M.
Perceptual learning is defined as fast improvement in performance and retention of the learned ability over a period of time. In a set of psychophysical experiments we demonstrated that perceptual learning occurs for the discrimination of direction in stochastic motion stimuli. Here we model this learning using two approaches: a clustering model that learns to accommodate the motion noise, and an averaging model that learns to ignore the noise. Simulations of the models show performance similar to the psychophysical results. 1 Introduction Global motion perception is critical to many visual tasks: to perceive self-motion, to identify objects in motion, to determine the structure of the environment, and to make judgements for safe navigation. In the presence of noise, as in random dot kinematograms, efficient extraction of global motion involves considerable spatial integration. Newsome and colleagues (1989) showed that neurons in the macaque middle temporal area (MT) are motion direction-selective, and perform global integration of motion in their large receptive fields. Psychophysical studies in humans have characterized the limits of spatial and temporal integration in motion (Watamaniuk et.
Recurrent Networks: Second Order Properties and Pruning
Pedersen, Morten With, Hansen, Lars Kai
Second order properties of cost functions for recurrent networks are investigated. We analyze a layered fully recurrent architecture; the virtue of this architecture is that it features the conventional feedforward architecture as a special case. A detailed description of recursive computation of the full Hessian of the network cost function is provided. We discuss the possibility of invoking simplifying approximations of the Hessian and show how weight decays iron the cost function and thereby greatly assist training. We present tentative pruning results, using Hassibi et al.'s Optimal Brain Surgeon, demonstrating that recurrent networks can construct an efficient internal memory. 1 LEARNING IN RECURRENT NETWORKS Time series processing is an important application area for neural networks and numerous architectures have been suggested, see e.g. (Weigend and Gershenfeld, 94). The most general structure is a fully recurrent network and it may be adapted using Real Time Recurrent Learning (RTRL) suggested by (Williams and Zipser, 89). By invoking a recurrent network, the length of the network memory can be adapted to the given time series, while it is fixed for the conventional lag-space net (Weigend et al., 90). In forecasting, however, feedforward architectures remain the most popular structures; only a few applications are reported based on the Williams & Zipser approach.
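For reference, a single step of Hassibi et al.'s Optimal Brain Surgeon, the procedure behind the tentative pruning results, can be sketched as follows (a generic illustration, independent of the recurrent architecture): the weight with the smallest saliency L_q = w_q^2 / (2 [H^-1]_qq) is deleted, and the remaining weights are adjusted using the inverse Hessian so that the increase in cost is minimal.

```python
# One Optimal Brain Surgeon pruning step (generic sketch, not the paper's code).
import numpy as np

def obs_prune_step(w, H_inv):
    """Return (pruned weight vector, index of removed weight, its saliency)."""
    saliency = w**2 / (2.0 * np.diag(H_inv))          # L_q = w_q^2 / (2 [H^-1]_qq)
    q = int(np.argmin(saliency))
    delta_w = -(w[q] / H_inv[q, q]) * H_inv[:, q]     # optimal adjustment of all weights
    w_new = w + delta_w
    w_new[q] = 0.0                                    # the pruned weight is exactly zeroed
    return w_new, q, float(saliency[q])
```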
Unsupervised Classification of 3D Objects from 2D Views
Suzuki, Satoshi, Ando, Hiroshi
The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models are, however, based on supervised learning, i.e., during training the teacher indicates which object each view belongs to. The model proposed by Weinshall et al. (1990) also requires a signal that segregates different objects during training. This paper, on the other hand, discusses unsupervised aspects of 3D object recognition where the system discovers categories by itself.