Country
Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
Haro, Gloria, Randall, Gregory, Sapiro, Guillermo
The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning.
Graph-Based Visual Saliency
Harel, Jonathan, Koch, Christof, Perona, Pietro
A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: rst forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human xations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.
Training Conditional Random Fields for Maximum Labelwise Accuracy
Gross, Samuel S., Russakovsky, Olga, Do, Chuong B., Batzoglou, Serafim
We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise.
A Kernel Method for the Two-Sample-Problem
Gretton, Arthur, Borgwardt, Karsten, Rasch, Malte, Schölkopf, Bernhard, Smola, Alex J.
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic.
Approximate Correspondences in High Dimensions
Grauman, Kristen, Darrell, Trevor
Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel.
Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
Gore, Amit, Chakrabartty, Shantanu
A key challenge in designing analog-to-digital converters for cortically implanted prosthesis is to sense and process high-dimensional neural signals recorded by the micro-electrode arrays. In this paper, we describe a novel architecture for analog-to-digital (A/D) conversion that combines Σ conversion with spatial de-correlation within a single module. The architecture called multiple-input multiple-output (MIMO) Σ is based on a min-max gradient descent optimization of a regularized linear cost function that naturally lends to an A/D formulation. Using an online formulation, the architecture can adapt to slow variations in cross-channel correlations, observed due to relative motion of the microelectrodes with respect to the signal sources. Experimental results with real recorded multi-channel neural data demonstrate the effectiveness of the proposed algorithm in alleviating cross-channel redundancy across electrodes and performing data-compression directly at the A/D converter.
Approximate inference using planar graph decomposition
Globerson, Amir, Jaakkola, Tommi S.
A number of exact and approximate methods are available for inference calculations in graphical models. Many recent approximate methods for graphs with cycles are based on tractable algorithms for tree structured graphs. Here we base the approximation on a different tractable model, planar graphs with binary variables and pure interaction potentials (no external field). The partition function for such models can be calculated exactly using an algorithm introduced by Fisher and Kasteleyn in the 1960s. We show how such tractable planar models can be used in a decomposition to derive upper bounds on the partition function of non-planar models. The resulting algorithm also allows for the estimation of marginals. We compare our planar decomposition to the tree decomposition method of Wainwright et.
Data Integration for Classification Problems Employing Gaussian Process Priors
Girolami, Mark, Zhong, Mingjun
By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination.
Bayesian Policy Gradient Algorithms
Ghavamzadeh, Mohammad, Engel, Yaakov
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
iLSTD: Eligibility Traces and Convergence Analysis
Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.
In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n 10, 000) to show substantial computational advantages over LSTD.