Goto

Collaborating Authors

 Statistical Learning


The Curse of Highly Variable Functions for Local Kernel Machines

Neural Information Processing Systems

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior - with similarity between examples expressed with a local kernel - are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semisupervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned function at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical nonparametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may exist short descriptions of the target function, i.e. their Kolmogorov complexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many variations), while not using very specific prior domain knowledge.



Large-scale biophysical parameter estimation in single neurons via constrained linear regression

Neural Information Processing Systems

Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g.


Kernelized Infomax Clustering

Neural Information Processing Systems

We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets.



A Domain Decomposition Method for Fast Manifold Learning

Neural Information Processing Systems

We propose a fast manifold learning algorithm based on the methodology ofdomain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problemthat can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory.Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods.



Conditional Visual Tracking in Kernel Space

Neural Information Processing Systems

We present a conditional temporal probabilistic framework for reconstructing 3Dhuman motion in monocular video based on descriptors encoding image silhouette observations. For computational efficiency we restrict visual inference to low-dimensional kernel induced nonlinear state spaces. Our methodology (kBME) combines kernel PCA-based nonlinear dimensionality reduction (kPCA) and Conditional Bayesian Mixture of Experts (BME) in order to learn complex multivalued predictors betweenobservations and model hidden states. This is necessary for accurate, inverse, visual perception inferences, where several probable, distant3D solutions exist due to noise or the uncertainty of monocular perspectiveprojection. Low-dimensional models are appropriate because many visual processes exhibit strong nonlinear correlations in both the image observations and the target, hidden state variables. The learned predictors are temporally combined within a conditional graphical modelin order to allow a principled propagation of uncertainty. We study several predictors and empirically show that the proposed algorithm positivelycompares with techniques based on regression, Kernel Dependency Estimation (KDE) or PCA alone, and gives results competitive tothose of high-dimensional mixture predictors at a fraction of their computational cost. We show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences.


From Lasso regression to Feature vector machine

Neural Information Processing Systems

Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models.


Metric Learning by Collapsing Classes

Neural Information Processing Systems

We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance)for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, ityields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.