Goto

Collaborating Authors

 Genre


Discussion: Latent variable graphical model selection via convex optimization

arXiv.org Machine Learning

It is my pleasure to congratulate the authors for an innovative and inspiring piece of work. Chandrasekaran, Parrilo and Willsky (hereafter CPW) have come up with a novel approach, combining ideas from convex optimization and algebraic geometry, to the longstanding problem of Gaussian graphical model selection with latent variables. Their method is intuitive and simple to implement, based on solving a convex log-determinant program with suitable choices of regularization. In addition, they establish a number of attractive theoretical guarantees that hold under highdimensional scaling, meaning that the graph size p and sample size n are allowed to grow simultaneously.



Discussion: Latent variable graphical model selection via convex optimization

arXiv.org Machine Learning

By Ming Yuan Georgia Institute of Technology I want to start by congratulating Professors Chandrasekaran, Parrilo and Willsky for this fine piece of work. Their paper, hereafter referred to as CPW, addresses one of the biggest practical challenges of Gaussian graphical models--how to make inferences for a graphical model in the presence of missing variables. The difficulty comes from the fact that the validity of conditional independence relationships implied by a graphical model relies critically on the assumption that all conditional variables are observed, which of course can be unrealistic. As CPW shows, this is not as hopeless as it might appear to be. They characterize conditions under which a conditional graphical model can be identified, and offer a penalized likelihood method to reconstruct it.


A Framework for Evaluating Approximation Methods for Gaussian Process Regression

arXiv.org Machine Learning

Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n^2) space and O(n^3) time for a dataset of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.


Soft (Gaussian CDE) regression models and loss functions

arXiv.org Machine Learning

Regression, unlike classification, has lacked a comprehensive and effective approach to deal with cost-sensitive problems by the reuse (and not a re-training) of general regression models. In this paper, a wide variety of cost-sensitive problems in regression (such as bids, asymmetric losses and rejection rules) can be solved effectively by a lightweight but powerful approach, consisting of: (1) the conversion of any traditional one-parameter crisp regression model into a two-parameter soft regression model, seen as a normal conditional density estimator, by the use of newly-introduced enrichment methods; and (2) the reframing of an enriched soft regression model to new contexts by an instance-dependent optimisation of the expected loss derived from the conditional normal distribution.


Efficient Point-to-Subspace Query in $\ell^1$: Theory and Applications in Computer Vision

arXiv.org Machine Learning

Motivated by vision tasks such as robust face and object recognition, we consider the following general problem: given a collection of low-dimensional linear subspaces in a high-dimensional ambient (image) space and a query point (image), efficiently determine the nearest subspace to the query in $\ell^1$ distance. We show in theory that Cauchy random embedding of the objects into significantly-lower-dimensional spaces helps preserve the identity of the nearest subspace with constant probability. This offers the possibility of efficiently selecting several candidates for accurate search. We sketch preliminary experiments on robust face and digit recognition to corroborate our theory.


POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

arXiv.org Artificial Intelligence

Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. The novel algorithmic framework POWERPLAY (2011) continually searches the space of possible pairs of new tasks and modifications of the current problem solver, until it finds a more powerful problem solver that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. Wow-effects are achieved by continually making previously learned skills more efficient such that they require less time and space. New skills may (partially) re-use previously learned skills. POWERPLAY's search orders candidate pairs of tasks and solver modifications by their conditional computational (time & space) complexity, given the stored experience so far. The new task and its corresponding task-solving skill are those first found and validated. The computational costs of validating new tasks need not grow with task repertoire size. POWERPLAY's ongoing search for novelty keeps breaking the generalization abilities of its present solver. This is related to Goedel's sequence of increasingly powerful formal theories based on adding formerly unprovable statements to the axioms without affecting previously provable theorems. The continually increasing repertoire of problem solving procedures can be exploited by a parallel search for solutions to additional externally posed tasks. POWERPLAY may be viewed as a greedy but practical implementation of basic principles of creativity. A first experimental analysis can be found in separate papers [53,54].


Learning curves for multi-task Gaussian process regression

arXiv.org Machine Learning

We study the average case performance of multi-task Gaussian process (GP) regression as captured in the learning curve, i.e. the average Bayes error for a chosen task versus the total number of examples $n$ for all tasks. For GP covariances that are the product of an input-dependent covariance function and a free-form inter-task covariance matrix, we show that accurate approximations for the learning curve can be obtained for an arbitrary number of tasks $T$. We use these to study the asymptotic learning behaviour for large $n$. Surprisingly, multi-task learning can be asymptotically essentially useless, in the sense that examples from other tasks help only when the degree of inter-task correlation, $\rho$, is near its maximal value $\rho=1$. This effect is most extreme for learning of smooth target functions as described by e.g. squared exponential kernels. We also demonstrate that when learning many tasks, the learning curves separate into an initial phase, where the Bayes error on each task is reduced down to a plateau value by "collective learning" even though most tasks have not seen examples, and a final decay that occurs once the number of examples is proportional to the number of tasks.


Iterative Hard Thresholding Methods for $l_0$ Regularized Convex Cone Programming

arXiv.org Machine Learning

In this paper we consider $l_0$ regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving $l_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then propose a method for solving $l_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an $\epsilon$-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.


A Multiscale Framework for Challenging Discrete Optimization

arXiv.org Machine Learning

Current state-of-the-art discrete optimization methods struggle behind when it comes to challenging contrast-enhancing discrete energies (i.e., favoring different labels for neighboring variables). This work suggests a multiscale approach for these challenging problems. Deriving an algebraic representation allows us to coarsen any pair-wise energy using any interpolation in a principled algebraic manner. Furthermore, we propose an energy-aware interpolation operator that efficiently exposes the multiscale landscape of the energy yielding an effective coarse-to-fine optimization scheme. Results on challenging contrast-enhancing energies show significant improvement over state-of-the-art methods.