Goto

Collaborating Authors

 Recht, Ben


Transduction with Matrix Completion: Three Birds with One Stone

Neural Information Processing Systems

We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.


Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation

arXiv.org Machine Learning

Recent SVD-free matrix factorization formulations have enabled rank minimization for systems with millions of rows and columns, paving the way for matrix completion in extremely large-scale applications, such as seismic data interpolation. In this paper, we consider matrix completion formulations designed to hit a target data-fitting error level provided by the user, and propose an algorithm called LR-BPDN that is able to exploit factorized formulations to solve the corresponding optimization problem. Since practitioners typically have strong prior knowledge about target error level, this innovation makes it easy to apply the algorithm in practice, leaving only the factor rank to be determined. Within the established framework, we propose two extensions that are highly relevant to solving practical challenges of data interpolation. First, we propose a weighted extension that allows known subspace information to improve the results of matrix completion formulations. We show how this weighting can be used in the context of frequency continuation, an essential aspect to seismic data interpolation. Second, we propose matrix completion formulations that are robust to large measurement errors in the available data. We illustrate the advantages of LR-BPDN on the collaborative filtering problem using the MovieLens 1M, 10M, and Netflix 100M datasets. Then, we use the new method, along with its robust and subspace re-weighted extensions, to obtain high-quality reconstructions for large scale seismic interpolation problems with real data, even in the presence of data contamination.


Query Complexity of Derivative-Free Optimization

Neural Information Processing Systems

Derivative Free Optimization (DFO) is attractive when the objective function's derivatives are not available and evaluations are costly. Moreover, if the function evaluations are noisy, then approximating gradients by finite differences is difficult. This paper gives quantitative lower bounds on the performance of DFO with noisy function evaluations, exposing a fundamental and unavoidable gap between optimization performance based on noisy evaluations versus noisy gradients. This challenges the conventional wisdom that the method of finite differences is comparable to a stochastic gradient. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it only uses Boolean-valued function comparisons, rather than evaluations. This makes the algorithm useful in an even wider range of applications, including optimization based on paired comparisons from human subjects, for example. Remarkably, we show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.


Factoring nonnegative matrices with linear programs

Neural Information Processing Systems

This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X = CX and some linear constraints. The matrix C selects features, which are then used to compute a low-rank NMF of X. A theoretical analysis demonstrates that this approach has the same type of guarantees as the recent NMF algorithm of Arora et al.~(2012). In contrast with this earlier work, the proposed method has (1) better noise tolerance, (2) extends to more general noise models, and (3) leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation of the new algorithm can factor a multi-Gigabyte matrix in a matter of minutes.


Practical Large-Scale Optimization for Max-norm Regularization

Neural Information Processing Systems

The max-norm was proposed as a convex matrix regularizer by Srebro et al (2004) and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro (2003) to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas.


Transduction with Matrix Completion: Three Birds with One Stone

Neural Information Processing Systems

We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.


Unsupervised Regression with Applications to Nonlinear System Identification

Neural Information Processing Systems

We derive a cost functional for estimating the relationship between highdimensional observations and the low-dimensional process that generated them with no input-output examples. Limiting our search to invertible observation functions confers numerous benefits, including a compact representation and no suboptimal local minima. Our approximation algorithms for optimizing this cost functional are fast and give diagnostic bounds on the quality of their solution. Our method can be viewed as a manifold learning algorithm that utilizes a prior on the low-dimensional manifold coordinates. The benefits of taking advantage of such priors in manifold learning and searching for the inverse observation functions in system identification are demonstrated empirically by learning to track moving targets from raw measurements in a sensor network setting and in an RFID tracking experiment.