Agreement-Based Learning

Neural Information Processing Systems

Learning probabilistic models with many hidden variables and non-decomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate inference in a single intractable model, our approach is to train a set of tractable submodels by encouraging them to agree on the hidden variables. This allows us to capture non-decomposable aspects of the data while still maintaining tractability. We propose an objective function for our approach, derive EM-style algorithms for parameter estimation, and demonstrate their effectiveness on three challenging real-world learning tasks.


McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Neural Information Processing Systems

We cast the ranking problem as (1) multiple classification ("Mc") and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and that DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve on LambdaRank [5] and the regression-based ranker [6] in terms of (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
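
As a rough illustration of the two quantities named in this abstract, the sketch below converts per-document class probabilities into an expected-relevance score and evaluates a ranking with DCG. The abstract does not spell out either formula, so the specifics are assumptions: relevance grades are taken to be the integers 0..K-1, the score is the probability-weighted mean grade, and DCG uses the common gain 2^rel - 1 with a log2(1 + rank) discount. The function names are hypothetical.

```python
import math


def expected_relevance(class_probs):
    """Probability-weighted mean grade: sum over k of k * P(grade = k).

    Assumes grades are the integers 0..K-1, in the order of class_probs.
    """
    return sum(k * p for k, p in enumerate(class_probs))


def dcg(relevances):
    """DCG of a ranked list of true grades (best-ranked document first).

    Assumed form: gain 2^rel - 1, discount log2(1 + rank), ranks from 1.
    """
    return sum((2 ** rel - 1) / math.log2(1 + rank)
               for rank, rel in enumerate(relevances, start=1))


def rank_by_expected_relevance(probs_per_doc):
    """Return document indices sorted by decreasing expected relevance."""
    scores = [expected_relevance(p) for p in probs_per_doc]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Three documents, three grades (0 = bad, 2 = good); probabilities are made up.
probs = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
order = rank_by_expected_relevance(probs)
true_grades = [0, 2, 1]
print(order, dcg([true_grades[i] for i in order]))
```

In this sketch the class probabilities are given directly; in the paper they come from a gradient boosting tree classifier, which is not reproduced here.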


Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity

Neural Information Processing Systems

Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how local learning rules at single synapses support behaviorally relevant adaptive changes in complex networks of spiking neurons. However, the potential and limitations of this learning rule could so far only be tested through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allow us to predict under which conditions reward-modulated STDP will be able to achieve a desired learning effect. In particular, this yields a theoretical explanation and a computer model for a fundamental experimental finding on biofeedback in monkeys (reported in [1]).


Sparse deep belief net model for visual area V2

Neural Information Processing Systems

Motivated in part by the hierarchical organization of cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or "deep," structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first-layer responses in the data. Specifically, it picks up both collinear ("contour") features and corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex "corner" features matches well with the results from Ito & Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.


Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

Neural Information Processing Systems

Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization, where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing that allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
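
For readers unfamiliar with the basic method, here is a minimal sketch of textbook simulated annealing on a bounded continuous domain. It is not the formulation introduced in the paper and carries none of its finite-time guarantees. The Gaussian proposal, the logarithmic cooling schedule, the clipping to the box, and all parameter names and defaults are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(f, lower, upper, steps=5000, t0=1.0,
                        step_size=0.1, seed=0):
    """Minimize f over the box [lower, upper] with Metropolis-style moves.

    Textbook variant with assumed choices: Gaussian proposals clipped to
    the box, and temperature t0 / log(k + 1) at step k.
    """
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    fx = f(x)
    best_x, best_f = list(x), fx
    for k in range(1, steps + 1):
        temp = t0 / math.log(k + 1)
        # Propose a nearby point, scaled to the box width and clipped to it.
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, step_size * (hi - lo))))
                for xi, lo, hi in zip(x, lower, upper)]
        fc = f(cand)
        # Always accept improvements; accept worse points with
        # probability exp(-(fc - fx) / temp).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(vi * vi for vi in v)


best_x, best_f = simulated_annealing(sphere, [-1.0, -1.0], [1.0, 1.0])
print(best_x, best_f)
```

The function returns the best point seen rather than the final state, since at any nonzero temperature the chain keeps moving around the minimum.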


Non-parametric Modeling of Partially Ranked Data

Neural Information Processing Systems

Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of nonparametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are made possible largely through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a nonparametric, coherent, and consistent model capable of efficiently aggregating partially ranked data of different types.


Topmoumoute Online Natural Gradient Algorithm

Neural Information Processing Systems

Guided by the goal of obtaining an optimization algorithm that is both fast and yields good generalization, we study the descent direction maximizing the decrease in generalization error or the probability of not increasing generalization error. The surprising result is that, from both the Bayesian and frequentist perspectives, this can yield the natural gradient direction. Although that direction can be very expensive to compute, we develop an efficient, general, online approximation to natural gradient descent that is suited to large-scale problems. We report experimental results showing much faster convergence, in both computation time and number of iterations, with TONGA (Topmoumoute Online Natural Gradient Algorithm) than with stochastic gradient descent, even on very large datasets.


Learning the 2-D Topology of Images

Neural Information Processing Systems

We study the following question: is the two-dimensional structure of images a very strong prior, or is it something that can be learned from a few examples of natural images? If someone gave us a learning task involving images for which the two-dimensional topology of pixels was not known, could we discover it automatically and exploit it? For example, if the pixels had been permuted in a fixed but unknown way, could we recover the relative two-dimensional locations of pixels in images? The surprising result presented here is that not only is the answer yes, but as few as about a thousand images are enough to approximately recover the relative locations of about a thousand pixels. This is achieved using a manifold learning algorithm applied to pixels, together with a measure of distributional similarity between pixel intensities. We compare different topology-extraction approaches and show how knowledge of the two-dimensional topology can be exploited.