Collaborating Authors

Aybat, Necdet Serhat


A primal-dual method for conic constrained distributed optimization problems

Neural Information Processing Systems

We consider cooperative multi-agent consensus optimization problems over an undirected network of agents, where only those agents connected by an edge can directly communicate. The objective is to minimize the sum of agent-specific composite convex functions over agent-specific private conic constraint sets; hence, the optimal consensus decision should lie in the intersection of these private sets. We provide convergence rates in sub-optimality, infeasibility and consensus violation; examine the effect of underlying network topology on the convergence rates of the proposed decentralized algorithms; and show how to extend these methods to handle time-varying communication networks.
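A minimal sketch of the saddle-point mechanics behind such methods, assuming quadratic agent objectives f_i(x) = 0.5*||x - a_i||^2, a four-agent cycle graph, and the nonnegative orthant as every agent's private cone; these are illustrative choices, and the Arrow-Hurwicz-style update below is a generic primal-dual scheme, not the paper's algorithm. Dual variables on the edges penalize disagreement between neighbors while each agent takes a projected gradient step on its own objective.

```python
import numpy as np

# Generic primal-dual updates for conic-constrained consensus optimization
# (an illustrative sketch, not the paper's method). Each agent i minimizes
# f_i(x) = 0.5*||x - a_i||^2 over its private cone K_i = R^d_+, and edge-wise
# dual variables enforce the consensus constraints x_i = x_j.

rng = np.random.default_rng(0)
n_agents, dim = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]        # undirected cycle graph
a = rng.normal(size=(n_agents, dim))            # agent-specific data

x = np.zeros((n_agents, dim))                   # one primal iterate per agent
lam = {e: np.zeros(dim) for e in edges}         # one dual variable per edge
alpha, beta = 0.1, 0.1                          # small constant step sizes

for _ in range(2000):
    # dual ascent on the consensus constraints x_i - x_j = 0
    for (i, j) in edges:
        lam[(i, j)] += beta * (x[i] - x[j])
    # primal descent on f_i plus the incidence-signed dual terms,
    # followed by projection onto the private cone K_i
    for i in range(n_agents):
        grad = x[i] - a[i]
        for (p, q) in edges:
            if p == i:
                grad += lam[(p, q)]
            elif q == i:
                grad -= lam[(p, q)]
        x[i] = np.maximum(x[i] - alpha * grad, 0.0)

print("consensus spread:", np.ptp(x, axis=0))   # near zero at convergence
print("consensus point :", x.mean(axis=0))      # approx. max(mean(a), 0)
```

For this toy problem, the agents' iterates should approach the projection of the average of the a_i onto the common cone.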


Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions

arXiv.org Machine Learning

We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal value; hence, we define the robustness of an algorithm to noise as the ratio of the asymptotic expected suboptimality of the iterate sequence to the input noise power. For this robustness measure, we provide exact expressions for the quadratic case using tools from robust control theory, and tight upper bounds for the smooth strongly convex case using Lyapunov functions certified through matrix inequalities. We use these characterizations within an optimization problem that selects the parameters of each algorithm to achieve a particular trade-off between rate and robustness. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different from that previously reported in the deterministic gradient noise setting. We also establish connections between the robustness of an algorithm and how quickly it can converge back to the optimal solution if it is perturbed away from the optimal point by deterministic noise. Our framework also leads to practical algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise.
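The robustness measure can be probed numerically on a quadratic. The sketch below runs GD and Nesterov's AG with additive white noise on the gradient and averages the tail of f(x_k) as a proxy for the asymptotic expected suboptimality; the textbook step size 1/L and momentum choice are illustrative defaults, not the optimized parameters the paper derives.

```python
import numpy as np

# Empirical estimate of the asymptotic expected suboptimality of GD and AG
# under additive white gradient noise, on a diagonal quadratic. Step size 1/L
# and the standard strongly convex momentum are textbook defaults, not the
# paper's rate/robustness-optimized parameters.

rng = np.random.default_rng(1)
dim = 20
eigs = np.linspace(1.0, 100.0, dim)             # spectrum: mu = 1, L = 100
f = lambda x: 0.5 * np.sum(eigs * x**2)         # minimizer x* = 0, f* = 0
grad = lambda x: eigs * x
L, mu, sigma = eigs.max(), eigs.min(), 0.1      # smoothness, strong convexity, noise
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))  # AG momentum

def asymptotic_suboptimality(method, iters=20000, burn_in=10000):
    x = rng.normal(size=dim)
    x_prev = x.copy()
    tail = []
    for k in range(iters):
        if method == "GD":
            g = grad(x) + sigma * rng.normal(size=dim)   # noisy oracle
            x = x - (1.0 / L) * g
        else:                                            # AG
            y = x + beta * (x - x_prev)
            g = grad(y) + sigma * rng.normal(size=dim)   # noisy oracle
            x_prev = x
            x = y - (1.0 / L) * g
        if k >= burn_in:
            tail.append(f(x))
    return np.mean(tail)            # proxy for lim_k E[f(x_k) - f*]

print("GD noise floor:", asymptotic_suboptimality("GD"))
print("AG noise floor:", asymptotic_suboptimality("AG"))
```

With these default parameters AG typically exhibits a higher noise floor than GD; the paper's parameter-selection framework is what allows AG to retain acceleration while controlling this floor.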


A Universally Optimal Multistage Accelerated Stochastic Gradient Method

arXiv.org Machine Learning

We study the problem of minimizing a strongly convex and smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate in both the deterministic and the stochastic case and operates without knowledge of the noise characteristics. The algorithm consists of stages that run a stochastic version of Nesterov's accelerated algorithm with a specific restart rule and with parameters selected to achieve the fastest reduction of the bias and variance terms in the convergence-rate bounds.
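A hedged sketch of the multistage idea: run stochastic Nesterov acceleration in stages, resetting the momentum at each restart and shrinking the step size so that early stages reduce the bias quickly while later stages average out the noise. The geometric stage lengths and step-decay factors below are placeholder choices, not the paper's optimized schedule.

```python
import numpy as np

# Multistage accelerated stochastic gradient, sketched with placeholder
# schedules: stage s runs 100 * 2^s iterations with step (1/L) / 4^s, and the
# momentum state is reset at every restart. These constants are illustrative
# assumptions, not the schedule analyzed in the paper.

rng = np.random.default_rng(2)
dim = 20
eigs = np.linspace(1.0, 100.0, dim)             # f(x) = 0.5 x^T diag(eigs) x
grad = lambda x: eigs * x
L, mu, sigma = eigs.max(), eigs.min(), 0.5
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

x = rng.normal(size=dim)
for stage in range(6):
    step = (1.0 / L) / (4 ** stage)             # shrink the step each stage
    n_iters = 100 * (2 ** stage)                # lengthen the stages
    y = x.copy()                                # restart: reset momentum
    for _ in range(n_iters):
        g = grad(y) + sigma * rng.normal(size=dim)   # stochastic gradient
        x_new = y - step * g
        y = x_new + beta * (x_new - x)
        x = x_new
    print(f"stage {stage}: f(x) = {0.5 * np.sum(eigs * x**2):.3e}")
```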


Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants

arXiv.org Machine Learning

Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bioinformatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants, such as sparse PCA and stable PCA, can be formulated as optimization problems with exploitable special structures. Many specialized efficient optimization methods have been proposed to solve robust PCA and related problems. In this paper, we review existing optimization methods for solving convex and nonconvex relaxations and variants of robust PCA, discuss their advantages and disadvantages, and elaborate on their convergence behaviors. We also provide some insights into possible future research directions, including new algorithmic frameworks suitable for implementation in multi-processor settings to handle large-scale problems.
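As a concrete instance of the convex relaxations reviewed here, the sketch below solves principal component pursuit, min ||L||_* + lam*||S||_1 subject to L + S = M, with a plain ADMM loop; the penalty rho, the fixed iteration count, and the classical lam = 1/sqrt(max(m, n)) are illustrative defaults.

```python
import numpy as np

# Principal component pursuit: min ||L||_* + lam * ||S||_1  s.t.  L + S = M,
# solved with a plain ADMM loop. Penalty rho, the fixed iteration count, and
# lam = 1 / sqrt(max(m, n)) are illustrative defaults for this sketch.

def svt(X, tau):
    """Singular value thresholding: prox operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Soft thresholding: prox operator of the elementwise l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, lam=None, rho=1.0, iters=200):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    L, S, Y = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / rho, 1.0 / rho)      # nuclear-norm prox step
        S = shrink(M - L + Y / rho, lam / rho)   # l1 prox step
        Y += rho * (M - L - S)                   # dual ascent on L + S = M
    return L, S

# Synthetic test: rank-5 matrix plus 5% sparse gross corruption.
rng = np.random.default_rng(3)
L0 = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 50))
S0 = (rng.random((50, 50)) < 0.05) * rng.normal(scale=10.0, size=(50, 50))
L_hat, S_hat = robust_pca(L0 + S0)
print("low-rank recovery error:", np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```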


Generalized Sparse Precision Matrix Selection for Fitting Multivariate Gaussian Random Fields to Large Data Sets

arXiv.org Machine Learning

We present a new method for estimating multivariate, second-order stationary Gaussian Random Field (GRF) models based on the Sparse Precision matrix Selection (SPS) algorithm, proposed by Davanloo et al. (2015) for estimating scalar GRF models. Theoretical convergence rates for the estimated between-response covariance matrix and for the estimated parameters of the underlying spatial correlation function are established. Numerical tests using simulated and real datasets validate our theoretical findings. Data segmentation is used to handle large data sets.


Sparse Precision Matrix Selection for Fitting Gaussian Random Field Models to Large Data Sets

arXiv.org Machine Learning

Iterative methods for fitting a Gaussian Random Field (GRF) model to spatial data via maximum likelihood (ML) require $\mathcal{O}(n^3)$ floating point operations per iteration, where $n$ denotes the number of data locations. For large data sets, the $\mathcal{O}(n^3)$ cost per iteration, together with the non-convexity of the ML problem, renders traditional ML methods inefficient for GRF fitting. The problem is further aggravated for anisotropic GRFs, where the number of covariance function parameters increases with the dimension of the process domain. In this paper, we propose a new two-step GRF estimation procedure for second-order stationary processes. First, a \emph{convex} likelihood problem regularized with a weighted $\ell_1$-norm, utilizing the available distance information between observation locations, is solved with the Alternating Direction Method of Multipliers to fit a sparse \emph{precision} (inverse covariance) matrix to the observed data. Second, the parameters of the GRF spatial covariance function are estimated by solving a least squares problem. Theoretical error bounds for the proposed estimator are provided; moreover, convergence of the estimator is shown as the number of samples per location increases. The proposed method is numerically compared with state-of-the-art methods for large $n$. Data segmentation schemes are implemented to handle large data sets.
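A compact sketch of the two-step idea: step one runs ADMM for a weighted-$\ell_1$-regularized sparse precision estimate (the X-update has the standard closed form via an eigendecomposition), and step two fits a parametric covariance by least squares. The exponential covariance model, the distance-proportional weights, and the grid search below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Two-step sketch: (1) ADMM for min_X -logdet(X) + tr(S X) + ||W * X||_1 with
# elementwise weights W, (2) least-squares fit of a covariance parameter.
# The exponential model and distance-based weights are illustrative choices.

def sparse_precision_admm(S, W, rho=1.0, iters=200):
    p = S.shape[0]
    Z, U = np.eye(p), np.zeros((p, p))
    for _ in range(iters):
        # X-update: closed form via eigendecomposition of rho*(Z - U) - S
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        x = (d + np.sqrt(d**2 + 4 * rho)) / (2 * rho)   # solves rho*x - 1/x = d
        X = (Q * x) @ Q.T
        # Z-update: weighted elementwise soft thresholding
        Z = np.sign(X + U) * np.maximum(np.abs(X + U) - W / rho, 0.0)
        U += X - Z                                       # dual update
    return X          # positive definite iterate; Z carries the zero pattern

# Synthetic data on a 1-D grid with exponential covariance, range theta = 2.
rng = np.random.default_rng(4)
locs = np.linspace(0.0, 10.0, 30)
D = np.abs(locs[:, None] - locs[None, :])       # pairwise distances
Sigma = np.exp(-D / 2.0)
Y = rng.multivariate_normal(np.zeros(30), Sigma, size=500)
S_emp = np.cov(Y, rowvar=False)

# Step 1: distance-weighted l1 penalty (distant pairs penalized more heavily).
W = 0.05 * D
np.fill_diagonal(W, 0.0)                        # never penalize the diagonal
Theta = sparse_precision_admm(S_emp, W)

# Step 2: least-squares fit of the range parameter to the implied covariance.
Sigma_hat = np.linalg.inv(Theta)
thetas = np.linspace(0.5, 5.0, 100)
errs = [np.sum((Sigma_hat - np.exp(-D / t))**2) for t in thetas]
print("estimated range parameter:", thetas[int(np.argmin(errs))])
```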