Goto

Collaborating Authors

 Carreira-Perpiñán, Miguel Á.


The K-modes algorithm for clustering

arXiv.org Machine Learning

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a K-modes objective function by combining the notions of density and cluster assignment. The algorithm becomes K-means and K-medoids in the limit of very large and very small scales. Computationally, it is slightly slower than K-means but much faster than mean-shift or K-medoids. Unlike K-means, it is able to find centroids that are valid patterns, truly representative of a cluster, even with nonconvex clusters, and appears robust to outliers and misspecification of the scale and number of clusters.


Distributed optimization of deeply nested systems

arXiv.org Machine Learning

In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in an deep artificial neural network, an object recognition cascade in computer vision or a speech front-end processing. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult numerical nonconvex optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.


A Denoising View of Matrix Completion

Neural Information Processing Systems

In matrix completion, we are given a matrix where the values of only some of the entries are present, and we want to reconstruct the missing ones. Much work has focused on the assumption that the data matrix has low rank. We propose a more general assumption based on denoising, so that we expect that the value of a missing entry can be predicted from the values of neighboring points. We propose a nonparametric version of denoising based on local, iterated averaging with mean-shift, possibly constrained to preserve local low-rank manifold structure. The few user parameters required (the denoising scale, number of neighbors and local dimensionality) and the number of iterations can be estimated by cross-validating the reconstruction error. Using our algorithms as a postprocessing step on an initial reconstruction (provided by e.g. a low-rank method), we show consistent improvements with synthetic, image and motion-capture data.


Reconstruction of sequential data with density models

arXiv.org Machine Learning

We introduce the problem of reconstructing a sequence of multidimensional real vectors where some of the data are missing. This problem contains regression and mapping inversion as particular cases where the pattern of missing data is independent of the sequence index. The problem is hard because it involves possibly multivalued mappings at each vector in the sequence, where the missing variables can take more than one value given the present variables; and the set of missing variables can vary from one vector to the next. To solve this problem, we propose an algorithm based on two redundancy assumptions: vector redundancy (the data live in a low-dimensional manifold), so that the present variables constrain the missing ones; and sequence redundancy (e.g. continuity), so that consecutive vectors constrain each other. We capture the low-dimensional nature of the data in a probabilistic way with a joint density model, here the generative topographic mapping, which results in a Gaussian mixture. Candidate reconstructions at each vector are obtained as all the modes of the conditional distribution of missing variables given present variables. The reconstructed sequence is obtained by minimising a global constraint, here the sequence length, by dynamic programming. We present experimental results for a toy problem and for inverse kinematics of a robot arm.


Generalised elastic nets

arXiv.org Machine Learning

The elastic net was introduced as a heuristic algorithm for combinatorial optimisation and has been applied, among other problems, to biological modelling. It has an energy function which trades off a fitness term against a tension term. In the original formulation of the algorithm the tension term was implicitly based on a first-order derivative. In this paper we generalise the elastic net model to an arbitrary quadratic tension term, e.g. derived from a discretised differential operator, and give an efficient learning algorithm. We refer to these as generalised elastic nets (GENs). We give a theoretical analysis of the tension term for 1D nets with periodic boundary conditions, and show that the model is sensitive to the choice of finite difference scheme that represents the discretised derivative. We illustrate some of these issues in the context of cortical map models, by relating the choice of tension term to a cortical interaction function. In particular, we prove that this interaction takes the form of a Mexican hat for the original elastic net, and of progressively more oscillatory Mexican hats for higher-order derivatives. The results apply not only to generalised elastic nets but also to other methods using discrete differential penalties, and are expected to be useful in other areas, such as data analysis, computer graphics and optimisation problems.


People Tracking with the Laplacian Eigenmaps Latent Variable Model

Neural Information Processing Systems

Reliably recovering 3D human pose from monocular video requires models that bias the estimates towards typical human poses and motions. We construct priors forpeople tracking using the Laplacian Eigenmaps Latent Variable Model (LELVM). LELVM is a recently introduced probabilistic dimensionality reduction modelthat combines the advantages of latent variable models--a multimodal probability density for latent and observed variables, and globally differentiable nonlinear mappings for reconstruction and dimensionality reduction--with those of spectral manifold learning methods--no local optima, ability to unfold highly nonlinear manifolds, and good practical scaling to latent spaces of high dimension. LELVMis computationally efficient, simple to learn from sparse training data, and compatible with standard probabilistic trackers such as particle filters. We analyze the performance of a LELVM-based probabilistic sigma point mixture tracker in several real and synthetic human motion sequences and demonstrate that LELVM not only provides sufficient constraints for robust operation in the presence ofmissing, noisy and ambiguous image measurements, but also compares favorably with alternative trackers based on PCA or GPLVM priors. Recent research in reconstructing articulated human motion has focused on methods that can exploit available prior knowledge on typical human poses or motions in an attempt to build more reliable algorithms. The high-dimensionality of human ambient pose space--between 30-60 joint angles or joint positions depending on the desired accuracy level, makes exhaustive search prohibitively expensive. This has negative impact on existing trackers, which are often not sufficiently reliable at reconstructing humanlike poses, self-initializing or recovering from failure. Such difficulties have stimulated research in algorithms and models that reduce the effective working space, either using genericsearch focusing methods (annealing, state space decomposition, covariance scaling) or by exploiting specific problem structure (e.g.


Non-rigid point set registration: Coherent Point Drift

Neural Information Processing Systems

We introduce Coherent Point Drift (CPD), a novel probabilistic method for nonrigid registration of point sets. The registration is treated as a Maximum Likelihood (ML) estimation problem with motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the nonrigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. This method can estimate complex nonlinear nonrigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.


Non-rigid point set registration: Coherent Point Drift

Neural Information Processing Systems

We introduce Coherent Point Drift (CPD), a novel probabilistic method for nonrigid registrationof point sets. The registration is treated as a Maximum Likelihood (ML)estimation problem with motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the nonrigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. Thismethod can estimate complex nonlinear nonrigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.


Proximity Graphs for Clustering and Manifold Learning

Neural Information Processing Systems

Many machine learning algorithms for clustering or dimensionality reduction takeas input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering)or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fullyconnected graph,a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces abetter representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimensionality reductionalgorithm based on the graph.


Reconstruction of Sequential Data with Probabilistic Models and Continuity Constraints

Neural Information Processing Systems

We consider the problem of reconstructing a temporal discrete sequence of multidimensional real vectors when part of the data is missing, under the assumption that the sequence was generated by a continuous process. A particular case of this problem is multivariate regression, which is very difficult when the underlying mapping is one-to-many. We propose an algorithm based on a joint probability model of the variables of interest, implemented using a nonlinear latent variable model. Each point in the sequence is potentially reconstructed as any of the modes of the conditional distribution of the missing variables given the present variables (computed using an exhaustive mode search in a Gaussian mixture). Mode selection is determined by a dynamic programming search that minimises a geometric measure of the reconstructed sequence, derived from continuity constraints. We illustrate the algorithm with a toy example and apply it to a real-world inverse problem, the acoustic-toarticulatory mapping. The results show that the algorithm outperforms conditional mean imputation and multilayer perceptrons. 1 Definition of the problem