Goto

Collaborating Authors

 Education


megaman: Manifold Learning with Millions of points

arXiv.org Machine Learning

Manifold Learning (ML) is a class of algorithms seeking a low-dimensional nonlinear representation of high-dimensional data. Thus ML algorithms are, at least in theory, most applicable to high-dimensional data and sample sizes to enable accurate estimation of the manifold. Despite this, most existing manifold learning implementations are not particularly scalable. Here we present a Python package that implements a variety of manifold learning algorithms in a modular and scalable fashion, using fast approximate neighbors searches and fast sparse eigendecompositions. The package incorporates theoretical advances in manifold learning, such as the unbiased Laplacian estimator introduced by Coifman and Lafon (2006) and the estimation of the embedding distortion by the Riemannian metric method introduced by Perraul-Joncas and Meila (2013). In benchmarks, even on a single-core desktop computer, our code embeds millions of data points in minutes, and takes just 200 minutes to embed the main sample of galaxy spectra from the Sloan Digital Sky Survey -- consisting of 0.6 million samples in 3750-dimensions -- a task which has not previously been possible.


Discriminative models for robust image classification

arXiv.org Machine Learning

A variety of real-world tasks involve the classification of images into pre-determined categories. Designing image classification algorithms that exhibit robustness to acquisition noise and image distortions, particularly when the available training data are insufficient to learn accurate models, is a significant challenge. This dissertation explores the development of discriminative models for robust image classification that exploit underlying signal structure, via probabilistic graphical models and sparse signal representations. Probabilistic graphical models are widely used in many applications to approximate high-dimensional data in a reduced complexity set-up. Learning graphical structures to approximate probability distributions is an area of active research. Recent work has focused on learning graphs in a discriminative manner with the goal of minimizing classification error. In the first part of the dissertation, we develop a discriminative learning framework that exploits the complementary yet correlated information offered by multiple representations (or projections) of a given signal/image. Specifically, we propose a discriminative tree-based scheme for feature fusion by explicitly learning the conditional correlations among such multiple projections in an iterative manner. Experiments reveal the robustness of the resulting graphical model classifier to training insufficiency.


Distributed Multi-Task Learning with Shared Representation

arXiv.org Machine Learning

We study the problem of distributed multi-task learning with shared representation, where each machine aims to learn a separate, but related, task in an unknown shared low-dimensional subspaces, i.e. when the predictor matrix has low rank. We consider a setting where each task is handled by a different machine, with samples for the task available locally on the machine, and study communication-efficient methods for exploiting the shared structure.


A knowledge representation meta-model for rule-based modelling of signalling networks

arXiv.org Artificial Intelligence

The study of cellular signalling pathways and their deregulation in disease states, such as cancer, is a large and extremely complex task. Indeed, these systems involve many parts and processes but are studied piecewise and their literatures and data are consequently fragmented, distributed and sometimes--at least apparently--inconsistent. This makes it extremely difficult to build significant explanatory models with the result that effects in these systems that are brought about by many interacting factors are poorly understood. The rule-based approach to modelling has shown some promise for the representation of the highly combinatorial systems typically found in signalling where many of the proteins are composed of multiple binding domains, capable of simultaneous interactions, and/or peptide motifs controlled by post-translational modifications. However, the rule-based approach requires highly detailed information about the precise conditions for each and every interaction which is rarely available from any one single source. Rather, these conditions must be painstakingly inferred and curated, by hand, from information contained in many papers--each of which contains only part of the story. In this paper, we introduce a graph-based meta-model, attuned to the representation of cellular signalling networks, which aims to ease this massive cognitive burden on the rule-based curation process. This meta-model is a generalization of that used by Kappa and BNGL which allows for the flexible representation of knowledge at various levels of granularity. In particular, it allows us to deal with information which has either too little, or too much, detail with respect to the strict rule-based meta-model. Our approach provides a basis for the gradual aggregation of fragmented biological knowledge extracted from the literature into an instance of the meta-model from which we can define an automated translation into executable Kappa programs.


Network Unfolding Map by Edge Dynamics Modeling

arXiv.org Artificial Intelligence

The emergence of collective dynamics in neural networks is a mechanism of the animal and human brain for information processing. In this paper, we develop a computational technique of distributed processing elements, which are called particles. We observe the collective dynamics of particles in a complex network for transductive inference on semi-supervised learning problems. Three actions govern the particles' dynamics: walking, absorption, and generation. Labeled vertices generate new particles that compete against rival particles for edge domination. Active particles randomly walk in the network until they are absorbed by either a rival vertex or an edge currently dominated by rival particles. The result from the model simulation consists of sets of edges sorted by the label dominance. Each set tends to form a connected subnetwork to represent a data class. Although the intrinsic dynamics of the model is a stochastic one, we prove there exists a deterministic version with largely reduced computational complexity; specifically, with subquadratic growth. Furthermore, the edge domination process corresponds to an unfolding map. Intuitively, edges "stretch" and "shrink" according to edge dynamics. Consequently, such effect summarizes the relevant relationships between vertices and uncovered data classes. The proposed model captures important details of connectivity patterns over the edge dynamics evolution, which contrasts with previous approaches focused on vertex dynamics. Computer simulations reveal that our model can identify nonlinear features in both real and artificial data, including boundaries between distinct classes and the overlapping structure of data.


A Roadmap towards Machine Intelligence

arXiv.org Artificial Intelligence

A machine capable of performing complex tasks without requiring laborious programming would be tremendously useful in almost any human endeavor, from performing menial jobs for us to helping the advancement of basic and applied research. Given the current availability of powerful hardware and large amounts of machine-readable data, as well as the widespread interest in sophisticated machine learning methods, the times should be ripe for the development of intelligent machines. Still, since "solving AI" seems too complex a task to be pursued all at once, in the last decades the computational community has preferred to focus on solving relatively narrow empirical problems that are important for specific applications, but do not address the overarching goal of developing general-purpose intelligent machines. In this article, we propose an alternative approach: we first define the general characteristics we think intelligent machines should possess, and then we present a concrete roadmap to develop them in realistic, small steps, that are however incrementally structured in such a way that, jointly, they should lead us close to the ultimate goal of implementing a powerful AI. The article is organized as follows.


Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

arXiv.org Machine Learning

We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP), contains regions of very high-density of solutions. Despite being sub-dominant, these regions can be found by optimizing a local entropy measure. Building on these results, we construct a fast solver that relies exclusively on a local entropy estimate, and can be applied to general CSPs. We describe its performance not only for the Perceptron Learning Problem but also for the random $K$-Satisfiabilty Problem (another prototypical CSP with a radically different structure), and show numerically that a simple zero-temperature Metropolis search in the smooth local entropy landscape can reach sub-dominant clusters of optimal solutions in a small number of steps, while standard Simulated Annealing either requires extremely long cooling procedures or just fails. We also discuss how the EdMC can heuristically be made even more efficient for the cases we studied.


Unifying distillation and privileged information

arXiv.org Machine Learning

Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, extend it to unsupervised, semisupervised and multitask learning scenarios, and illustrate its efficacy on a variety of numerical simulations on both synthetic and real-world data.


Mismatch in the Classification of Linear Subspaces: Sufficient Conditions for Reliable Classification

arXiv.org Machine Learning

This paper considers the classification of linear subspaces with mismatched classifiers. In particular, we assume a model where one observes signals in the presence of isotropic Gaussian noise and the distribution of the signals conditioned on a given class is Gaussian with a zero mean and a low-rank covariance matrix. We also assume that the classifier knows only a mismatched version of the parameters of input distribution in lieu of the true parameters. By constructing an asymptotic low-noise expansion of an upper bound to the error probability of such a mismatched classifier, we provide sufficient conditions for reliable classification in the low-noise regime that are able to sharply predict the absence of a classification error floor. Such conditions are a function of the geometry of the true signal distribution, the geometry of the mismatched signal distributions as well as the interplay between such geometries, namely, the principal angles and the overlap between the true and the mismatched signal subspaces. Numerical results demonstrate that our conditions for reliable classification can sharply predict the behavior of a mismatched classifier both with synthetic data and in a motion segmentation and a hand-written digit classification applications.


Online Low-Rank Subspace Learning from Incomplete Data: A Bayesian View

arXiv.org Machine Learning

Extracting the underlying low-dimensional space where high-dimensional signals often reside has long been at the center of numerous algorithms in the signal processing and machine learning literature during the past few decades. At the same time, working with incomplete (partly observed) large scale datasets has recently been commonplace for diverse reasons. This so called {\it big data era} we are currently living calls for devising online subspace learning algorithms that can suitably handle incomplete data. Their envisaged objective is to {\it recursively} estimate the unknown subspace by processing streaming data sequentially, thus reducing computational complexity, while obviating the need for storing the whole dataset in memory. In this paper, an online variational Bayes subspace learning algorithm from partial observations is presented. To account for the unawareness of the true rank of the subspace, commonly met in practice, low-rankness is explicitly imposed on the sought subspace data matrix by exploiting sparse Bayesian learning principles. Moreover, sparsity, {\it simultaneously} to low-rankness, is favored on the subspace matrix by the sophisticated hierarchical Bayesian scheme that is adopted. In doing so, the proposed algorithm becomes adept in dealing with applications whereby the underlying subspace may be also sparse, as, e.g., in sparse dictionary learning problems. As shown, the new subspace tracking scheme outperforms its state-of-the-art counterparts in terms of estimation accuracy, in a variety of experiments conducted on simulated and real data.