An Inexact Proximal Path-Following Algorithm for Constrained Convex Minimization
Tran-Dinh, Quoc, Kyrillidis, Anastasios, Cevher, Volkan
Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets. In this paper, we address an important instance of this broad class where we assume that the nonsmooth objective is equipped with a tractable proximity operator and that the convex constraint set affords a self-concordant barrier. We provide a new joint treatment of proximal and self-concordant barrier concepts and illustrate that such problems can be solved efficiently, without the need to lift the problem dimensions as is done in the disciplined convex optimization approach. We propose an inexact path-following algorithmic framework and theoretically characterize its worst-case analytical complexity when the proximal subproblems are solved inexactly. To show the merits of our framework, we apply its instances to both synthetic and real-world applications, where it shows advantages over standard interior-point methods. As a by-product, we describe how our framework can obtain points on the Pareto frontier of regularized problems with self-concordant objectives in a tuning-free fashion.
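For orientation, the problem template covered by this framework can be written (in our notation, which is not taken from the abstract) as
$$\min_{x \in \Omega} \; f(x), \qquad f \ \text{nonsmooth convex with a tractable } \mathrm{prox}_f, \qquad \Omega \ \text{equipped with a self-concordant barrier } \varphi,$$
and a path-following scheme inexactly tracks the central path
$$x^\star(t) \in \arg\min_{x} \; t\, f(x) + \varphi(x), \qquad t \uparrow \infty,$$
where each subproblem is itself a composite minimization handled through the proximity operator of $f$ and solved only up to a controlled accuracy.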
A variational approach to stable principal component pursuit
Aravkin, Aleksandr, Becker, Stephen, Cevher, Volkan, Olsen, Peder
We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the classical SPCP formulations in scalability and practical parameter selection.
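For context, a classical SPCP formulation (shown here only for orientation; it is not the variational formulation proposed in the paper) decomposes a noisy observation $Y$ as
$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad \|L + S - Y\|_F \le \varepsilon,$$
where the nuclear norm $\|L\|_*$ promotes a low-rank component and the $\ell_1$ norm promotes a sparse component. The variational formulation rearranges how the data-fidelity and regularization terms enter the problem, which is what enables the more practical parameter selection mentioned above.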
Scalable sparse covariance estimation via self-concordance
Kyrillidis, Anastasios, Mahabadi, Rabeeh Karimi, Tran-Dinh, Quoc, Cevher, Volkan
We consider the class of convex minimization problems composed of a self-concordant function (such as the $\log\det$ metric), a convex data fidelity term $h(\cdot)$, and a regularizing -- possibly non-smooth -- function $g(\cdot)$. This type of problem has recently attracted a great deal of interest, mainly due to its prevalence in prominent applications. Under this \emph{locally} Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes while accounting for the possible presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.
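To make the setting concrete, here is a sketch of the type of update analyzed (in our notation): writing the smooth part as $f(\cdot) + h(\cdot)$ and the nonsmooth part as $g(\cdot)$, an inexact proximal Newton scheme approximately solves at each iteration
$$x_{k+1} \approx \arg\min_{x} \; \Big\{ \langle \nabla (f+h)(x_k), x - x_k \rangle + \tfrac{1}{2} \|x - x_k\|_{H_k}^2 + g(x) \Big\}, \qquad H_k = \nabla^2 (f+h)(x_k),$$
where the approximation sign reflects that the subproblem, and possibly the gradient and Hessian themselves, are evaluated only inexactly.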
Composite Self-Concordant Minimization
Tran-Dinh, Quoc, Kyrillidis, Anastasios, Cevher, Volkan
We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.
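One way to make the analytic step-size idea concrete (a sketch consistent with damped Newton steps for self-concordant functions; the paper's exact procedures may differ): given the proximal Newton direction $d_k$ and its local norm $\lambda_k = (d_k^T \nabla^2 f(x_k)\, d_k)^{1/2}$, take
$$x_{k+1} = x_k + \alpha_k d_k, \qquad \alpha_k = \frac{1}{1 + \lambda_k},$$
which guarantees decrease without a global Lipschitz-gradient constant and without a backtracking line search.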
High-Dimensional Gaussian Process Bandits
Djolonga, Josip, Krause, Andreas, Cevher, Volkan
Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducing Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper-Confidence-Bound sampling to optimize the function. We carefully calibrate the exploration-exploitation tradeoff by allocating the sampling budget between subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
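A minimal caricature of the two-phase idea in Python (illustrative only: SI-BO uses low-rank matrix recovery for the subspace phase and Gaussian Process Upper-Confidence-Bound sampling for the optimization phase; here a finite-difference/SVD estimate and plain random search stand in for both):

import numpy as np

rng = np.random.default_rng(0)
d, k, noise = 20, 2, 0.01
A = np.linalg.qr(rng.standard_normal((d, k)))[0].T        # unknown k x d matrix with orthonormal rows

def f(x):
    # function varying only along the row space of A, observed with noise
    return -np.sum((A @ x) ** 2) + noise * rng.standard_normal()

# Phase 1: crude subspace estimate from finite-difference gradients at random points
# (stand-in for the low-rank matrix recovery step of SI-BO).
eps, G = 1e-2, []
for _ in range(50):
    x = rng.standard_normal(d) / np.sqrt(d)
    G.append(np.array([(f(x + eps * e) - f(x)) / eps for e in np.eye(d)]))
U = np.linalg.svd(np.array(G), full_matrices=False)[2][:k]  # estimated k x d subspace basis

# Phase 2: search inside the estimated subspace (stand-in for GP-UCB sampling).
best = max(f(U.T @ z) for z in rng.standard_normal((200, k)))
print("best value found:", best)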
A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions
Tran-Dinh, Quoc, Kyrillidis, Anastasios, Cevher, Volkan
We propose an algorithmic framework for convex minimization problems of a composite function with two terms: a self-concordant function and a possibly nonsmooth regularization term. Our method is a new proximal Newton algorithm that features a local quadratic convergence rate. As a specific instance of our framework, we consider sparse inverse covariance matrix estimation in graph learning problems. Via a careful dual formulation and a novel analytic step-size selection procedure, our approach for graph learning avoids Cholesky decompositions and matrix inversions in its iterations, making it attractive for parallel and distributed implementations.
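The graph-learning instance referred to is sparse inverse covariance (precision matrix) estimation, which in a standard form reads
$$\min_{\Theta \succ 0} \; -\log\det \Theta + \mathrm{trace}(\widehat{\Sigma}\,\Theta) + \rho \|\mathrm{vec}(\Theta)\|_1,$$
where $-\log\det\Theta + \mathrm{trace}(\widehat{\Sigma}\Theta)$ is the self-concordant term built from the sample covariance $\widehat{\Sigma}$ and the $\ell_1$ term is the nonsmooth regularizer, whose proximity operator reduces to entrywise soft-thresholding; the dual formulation mentioned above is what allows the iterations to avoid Cholesky decompositions and matrix inversions.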
Active Learning of Multi-Index Function Models
Tyagi, Hemant, Cevher, Volkan
We consider the problem of actively learning \textit{multi-index} functions of the form $f(\mathbf{x}) = g(\mathbf{A}\mathbf{x}) = \sum_{i=1}^k g_i(\mathbf{a}_i^T\mathbf{x})$ from point evaluations of $f$. We assume that the function $f$ is defined on an $\ell_2$-ball in $\mathbb{R}^d$, $g$ is twice continuously differentiable almost everywhere, and $\mathbf{A} \in \mathbb{R}^{k \times d}$ is a rank-$k$ matrix, where $k \ll d$. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function $f$ along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.
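The reason low-rank matrix recovery enters is worth spelling out: by the chain rule,
$$\nabla f(\mathbf{x}) = \mathbf{A}^T \nabla g(\mathbf{A}\mathbf{x}),$$
so every gradient of $f$ lies in the $k$-dimensional row space of $\mathbf{A}$. A matrix whose columns are (estimates of) gradients at sampled points therefore has rank at most $k$, and recovering it from a small number of point evaluations is precisely a low-rank matrix recovery problem.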
Learning with Compressible Priors
Cevher, Volkan
We describe probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in compressible signals. A signal is compressible when the sorted magnitudes of its coefficients exhibit a power-law decay, so that the signal can be well-approximated by a sparse signal. Since compressible signals live close to sparse signals, their intrinsic information can be stably embedded via simple non-adaptive linear projections into a much lower dimensional space whose dimension grows logarithmically with the ambient signal dimension. By using order statistics, we show that N-sample iid realizations of the generalized Pareto, Student’s t, log-normal, Fréchet, and log-logistic distributions are compressible, i.e., they have a constant expected decay rate, which is independent of N. In contrast, we show that the generalized Gaussian distribution with shape parameter q is compressible only in restricted cases, since the expected decay rate of its N-sample iid realizations decreases with N as 1/[q log(N/q)]. We use compressible priors as a scaffold to build new iterative sparse signal recovery algorithms based on Bayesian inference arguments. We show how the tuning of these algorithms explicitly depends on the parameters of the compressible prior of the signal, and how to learn the parameters of the signal’s compressible prior on the fly during recovery.
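As a small illustrative check (not the order-statistics analysis of the paper, and with numpy's Pareto sampler standing in for the generalized Pareto prior), one can compare the empirical decay exponents of sorted magnitudes for a heavy-tailed prior and for a Gaussian:

import numpy as np

rng = np.random.default_rng(0)
N = 10000

heavy = rng.pareto(a=2.0, size=N)        # heavy-tailed (Pareto-type) realizations
gauss = rng.standard_normal(N)           # generalized Gaussian realizations with q = 2

for name, x in [("Pareto-type", heavy), ("Gaussian", gauss)]:
    s = np.sort(np.abs(x))[::-1]         # sorted magnitudes, largest first
    ranks = np.arange(1, N + 1)
    # fit log s_k ~ -r log k over an intermediate range of ranks to estimate the decay rate r
    r = -np.polyfit(np.log(ranks[10:1000]), np.log(s[10:1000]), 1)[0]
    print(name, "empirical decay exponent:", round(r, 2))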
Sparse Signal Recovery Using Markov Random Fields
Cevher, Volkan, Duarte, Marco F., Hegde, Chinmay, Baraniuk, Richard
Compressive Sensing (CS) combines sampling and compression into a single sub-Nyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based reconstruction algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
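As a concrete example of the kind of prior involved (our sketch; the paper's exact model may differ), an Ising-type MRF on the support pattern $s \in \{-1, +1\}^N$,
$$p(s) \;\propto\; \exp\Big( \sum_{(i,j) \in \mathcal{E}} \lambda_{ij}\, s_i s_j + \sum_{i} \lambda_i\, s_i \Big),$$
assigns higher probability to supports whose active coefficients ($s_i = +1$) are clustered on the neighborhood graph $\mathcal{E}$, and this clustering structure is what LaMP exploits to reduce the number of measurements needed for stable recovery.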