Neural Information Processing Systems
Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration to solve the associated "overdetermined" nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter's implicit sparse Hessian matrix-vector multiply algorithm to construct the Krylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs.
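A truncated-Newton step of the kind described can be sketched with a Hessian-vector product standing in for an explicit Hessian. The sketch below approximates Pearlmutter's exact R-operator product by a finite difference of gradients and runs a few conjugate-gradient iterations, which implicitly build the Krylov subspace; the function names are illustrative, not the authors' code.

    import numpy as np

    def hessian_vector_product(grad_fn, w, v, eps=1e-6):
        # Approximate H(w) @ v by (grad(w + eps*v) - grad(w)) / eps.
        # Pearlmutter's R-operator gives this product exactly; the
        # finite difference is a common stand-in for a sketch.
        return (grad_fn(w + eps * v) - grad_fn(w)) / eps

    def truncated_newton_step(grad_fn, w, iters=10):
        # A few conjugate-gradient iterations on H x = -g; each
        # Hessian-vector product extends the Krylov subspace
        # span{g, Hg, H^2 g, ...} by one dimension.
        g = grad_fn(w)
        x = np.zeros_like(g)
        r = -g.copy()                       # residual of H x = -g at x = 0
        p = r.copy()
        for _ in range(iters):
            Hp = hessian_vector_product(grad_fn, w, p)
            alpha = (r @ r) / (p @ Hp)
            x += alpha * p
            r_new = r - alpha * Hp
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        return x                            # inexact Newton update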
An Improved Scheme for Detection and Labelling in Johansson Displays
Fanti, Claudio, Polito, Marzia, Perona, Pietro
Consider a number of moving points, where each point is attached to a joint of the human body and projected onto an image plane. Johansson showed that humans can effortlessly detect and recognize the presence of other humans from such displays. This is true even when some of the body points are missing (e.g. because of occlusion) and unrelated clutter points are added to the display. We are interested in replicating this ability in a machine. To this end, we present a labelling and detection scheme in a probabilistic framework. Our method is based on representing the joint probability density of positions and velocities of body points with a graphical model, and using Loopy Belief Propagation to calculate a likely interpretation of the scene. Furthermore, we introduce a global variable representing the body's centroid. Experiments on one motion-captured sequence suggest that our scheme improves on the accuracy of a previous approach based on triangulated graphical models, especially when very few parts are visible. The improvement is due both to the more general graph structure we use and, more significantly, to the introduction of the centroid variable.
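As a rough illustration of the inference machinery, the sketch below runs sum-product loopy belief propagation on a generic pairwise graphical model over discrete labels. The potentials, graph, and names are placeholders, not the authors' position-velocity model.

    import numpy as np

    def loopy_bp(unary, pairwise, edges, iters=20):
        # unary: {node: (K_node,) potentials}; pairwise: {(i, j): (K_i, K_j)};
        # edges: list of (i, j) pairs. Returns approximate marginals per node.
        msgs = {}
        for (i, j) in edges:
            msgs[(i, j)] = np.ones(len(unary[j]))
            msgs[(j, i)] = np.ones(len(unary[i]))
        for _ in range(iters):
            for (i, j) in list(msgs):
                # product of i's unary potential and messages into i, except from j
                prod = unary[i].copy()
                for (k, l) in msgs:
                    if l == i and k != j:
                        prod = prod * msgs[(k, l)]
                psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
                m = psi.T @ prod                 # marginalize out x_i
                msgs[(i, j)] = m / m.sum()       # normalize for stability
        beliefs = {}
        for i in unary:
            b = unary[i].copy()
            for (k, l) in msgs:
                if l == i:
                    b = b * msgs[(k, l)]
            beliefs[i] = b / b.sum()
        return beliefs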
Policy Search by Dynamic Programming
Bagnell, J. A., Kakade, Sham M., Schneider, Jeff G., Ng, Andrew Y.
We consider the policy search approach to reinforcement learning. We show that if a "baseline distribution" is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide nontrivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
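The backward pass of such an algorithm can be sketched for a small tabular MDP, where an unrestricted policy class makes the baseline-weighted maximization reduce to a per-state argmax; the exact, tabular setting is a simplifying assumption (the paper works with sampled states and a restricted policy class), and the names are illustrative.

    import numpy as np

    def psdp(P, R, baseline_mu, T):
        # P: (A, S, S) transition matrices, R: (S,) rewards,
        # baseline_mu: (T, S) baseline state distributions.
        # Returns a list of T deterministic, time-indexed policies.
        S = R.shape[0]
        policies = []
        V = np.zeros(S)              # value-to-go under the future policies
        for t in reversed(range(T)):
            Q = R[None, :] + P @ V   # Q[a, s]: take a now, follow pi_{t+1..} after
            # PSDP picks pi_t maximizing sum_s baseline_mu[t, s] * Q[pi_t(s), s];
            # with an unrestricted policy class and nonnegative weights this
            # reduces to the per-state argmax below.
            pi_t = Q.argmax(axis=0)
            policies.append(pi_t)
            V = Q[pi_t, np.arange(S)]
        return policies[::-1]        # policies[t] is the policy for step t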
Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
New feature selection algorithms for linear threshold functions are described which combine backward elimination with an adaptive regularization method. This makes them particularly suitable to the classification of microarray expression data, where the goal is to obtain accurate rules depending on few genes only. Our algorithms are fast and easy to implement, since they center on an incremental (large margin) algorithm which allows us to avoid linear, quadratic or higher-order programming methods. We report on preliminary experiments with five known DNA microarray datasets. These experiments suggest that multiplicative large margin algorithms tend to outperform additive algorithms (such as SVM) on feature selection tasks.
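A minimal sketch of this flavor of method, assuming a balanced Winnow-style multiplicative update combined with a naive backward-elimination loop; the constants, the halving schedule, and the function names are illustrative, not the authors' exact algorithm.

    import numpy as np

    def train_multiplicative(X, y, eta=0.1, epochs=10, margin=0.0):
        # X: (n, d) with features scaled into [-1, 1]; y in {-1, +1}.
        # Balanced Winnow: separate positive and negative weight halves.
        n, d = X.shape
        wp = np.ones(d) / (2 * d)
        wn = np.ones(d) / (2 * d)
        for _ in range(epochs):
            for x, t in zip(X, y):
                if t * ((wp - wn) @ x) <= margin:   # margin violation
                    wp *= np.exp(eta * t * x)       # multiplicative update
                    wn *= np.exp(-eta * t * x)
                    z = wp.sum() + wn.sum()
                    wp, wn = wp / z, wn / z         # renormalize
        return wp - wn

    def backward_eliminate(X, y, keep):
        # Retrain after repeatedly dropping the smallest-|weight| features.
        active = np.arange(X.shape[1])
        while len(active) > keep:
            w = train_multiplicative(X[:, active], y)
            order = np.argsort(np.abs(w))           # ascending |weight|
            drop = min(len(active) - keep, max(1, len(active) // 2))
            active = active[order[drop:]]
        return active, train_multiplicative(X[:, active], y)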
Training a Quantum Neural Network
Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms.
Locality Preserving Projections
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) - a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold.
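The construction lends itself to a short sketch: build a neighborhood graph with heat-kernel weights, form the graph Laplacian L and degree matrix D, and solve the generalized eigenproblem X^T L X a = lambda X^T D X a for the smallest eigenvalues. Rows of X are data points here, and the k, sigma, and ridge constants are illustrative choices.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    def lpp(X, n_components=2, k=5, sigma=1.0):
        # X: (n, d), rows are data points. Returns a (d, n_components)
        # projection A; embed points with X @ A.
        n = X.shape[0]
        D2 = cdist(X, X, 'sqeuclidean')
        W = np.exp(-D2 / (2 * sigma ** 2))                 # heat-kernel weights
        nonneighbors = np.argsort(D2, axis=1)[:, k + 1:]   # beyond self + k-NN
        for i in range(n):
            W[i, nonneighbors[i]] = 0.0
        W = np.maximum(W, W.T)                             # symmetrize the graph
        Dg = np.diag(W.sum(axis=1))
        L = Dg - W                                         # graph Laplacian
        A = X.T @ L @ X
        B = X.T @ Dg @ X + 1e-9 * np.eye(X.shape[1])       # small ridge for stability
        vals, vecs = eigh(A, B)                            # ascending eigenvalues
        return vecs[:, :n_components]                      # smallest => most local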
AUC Optimization vs. Error Rate Minimization
Cortes, Corinna, Mohri, Mehryar
The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that, under certain conditions, the global function optimized by the RankBoost algorithm is exactly the AUC. We report the results of our experiments with RankBoost in several datasets demonstrating the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms optimizing an approximation of the AUC or only locally optimizing the AUC.
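For reference, the AUC in question coincides with the Wilcoxon-Mann-Whitney statistic: the fraction of positive/negative pairs that the classifier ranks correctly, with ties counted one half. A direct computation (quadratic in the number of pairs, fine for a sketch):

    import numpy as np

    def auc(scores, labels):
        # scores: (n,) real-valued; labels: (n,) in {0, 1}.
        pos = scores[labels == 1]
        neg = scores[labels == 0]
        # all positive/negative pairwise comparisons
        wins = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return (wins + 0.5 * ties) / (len(pos) * len(neg))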
Learning Non-Rigid 3D Shape from 2D Motion
Torresani, Lorenzo, Hertzmann, Aaron, Bregler, Christoph
This paper presents an algorithm for learning the time-varying shape of a nonrigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data.
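One ingredient, filling in missing points under the Gaussian shape assumption, admits a compact sketch: condition the joint Gaussian on the observed coordinates and impute the posterior mean (essentially one E-step); the interface below is illustrative, not the authors' code.

    import numpy as np

    def fill_missing(x, mu, Sigma, observed):
        # x: (d,) coordinate vector with arbitrary values at missing entries;
        # mu, Sigma: Gaussian over all d coordinates; observed: boolean mask.
        o, m = observed, ~observed
        Soo = Sigma[np.ix_(o, o)]
        Smo = Sigma[np.ix_(m, o)]
        # posterior mean of the missing block given the observed block
        x_m = mu[m] + Smo @ np.linalg.solve(Soo, x[o] - mu[o])
        out = x.copy()
        out[m] = x_m
        return out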
Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics
Porr, Bernd, Saudargiene, Ausra, Wörgötter, Florentin
Spike-timing-dependent plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike-timing-dependent plasticity. We argue that, postsynaptically, the temporal change of the membrane potential determines the weight change. On the presynaptic side, the induction of STDP is closely related to the activation of NMDA channels. Therefore, we calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel.
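A toy numeric version of this rule correlates dV/dt with the NMDA conductance, dw = mu * integral of g_NMDA(t) * V'(t) dt. The kernel shapes and time constants below are illustrative assumptions, not the paper's biophysical fits.

    import numpy as np

    def weight_change(t_pre, t_post, mu=1e-3, dt=0.1, T=200.0):
        t = np.arange(0.0, T, dt)
        # NMDA conductance: difference of exponentials after the presynaptic spike
        g = np.where(t >= t_pre,
                     np.exp(-(t - t_pre) / 40.0) - np.exp(-(t - t_pre) / 5.0), 0.0)
        # backpropagating spike: sharp depolarization after the postsynaptic spike
        v = np.where(t >= t_post,
                     np.exp(-(t - t_post) / 10.0) - np.exp(-(t - t_post) / 1.0), 0.0)
        dv = np.gradient(v, dt)                 # temporal change of the potential
        return mu * np.trapz(g * dv, dx=dt)     # correlate with NMDA activity

For typical timing differences, pre-before-post (t_pre < t_post) yields a positive weight change and post-before-pre a negative one, reproducing the sign structure of the STDP curve.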