Optimization
A statistical model for tensor PCA
Montanari, Andrea, Richard, Emile
We consider the Principal Component Analysis problem for large tensors of arbitrary order $k$ under a single-spike (or rank-one plus noise) model. On the one hand, we use information theory, and recent results in probability theory, to establish necessary and sufficient conditions under which the principal component can be estimated using unbounded computational resources. It turns out that this is possible as soon as the signal-to-noise ratio $\beta$ becomes larger than $C\sqrt{k\log k}$ (and in particular $\beta$ can remain bounded as the problem dimensions increase). On the other hand, we analyze several polynomial-time estimation algorithms, based on tensor unfolding, power iteration and message passing ideas from graphical models. We show that, unless the signal-to-noise ratio diverges in the system dimensions, none of these approaches succeeds. This is possibly related to a fundamental limitation of computationally tractable estimators for this problem. We discuss various initializations for tensor power iteration, and show that a tractable initialization based on the spectrum of the matricized tensor outperforms significantly baseline methods, statistically and computationally. Finally, we consider the case in which additional side information is available about the unknown signal. We characterize the amount of side information that allows the iterative algorithms to converge to a good estimate.
TRACCS: A Framework for Trajectory-Aware Coordinated Urban Crowd-Sourcing
Chen, Cen (Singapore Management University) | Cheng, Shih-Fen (Singapore Management University) | Gunawan, Aldy (Singapore Management University) | Misra, Archan (Singapore Management University) | Dasgupta, Koustuv (Xerox Research Centre India) | Chander, Deepthi (Xerox Research Centre India)
We investigate the problem of large-scale mobile crowd-tasking, where a large pool of citizen crowd-workers are used to perform a variety of location-specific urban logistics tasks. Current approaches to such mobile crowd-tasking are very decentralized: a crowd-tasking platform usually provides each worker a set of available tasks close to the worker's current location; each worker then independently chooses which tasks she wants to accept and perform. In contrast, we propose TRACCS, a more coordinated task assignment approach, where the crowd-tasking platform assigns a sequence of tasks to each worker, taking into account their expected location trajectory over a wider time horizon, as opposed to just instantaneous location. We formulate such task assignment as an optimization problem, that seeks to maximize the total payoff from all assigned tasks, subject to a maximum bound on the detour (from the expected path) that a worker will experience to complete her assigned tasks. We develop credible computationally-efficient heuristics to address this optimization problem (whose exact solution requires solving a complex integer linear program), and show, via simulations with realistic topologies and commuting patterns, that a specific heuristic (called Greedy-ILS) increases the fraction of assigned tasks by more than 20%, and reduces the average detour overhead by more than 60%, compared to the current decentralized approach.
Partition-wise Linear Models
Oiwa, Hidekazu, Fujimaki, Ryohei
Region-specific linear models are widely used in practical applications because of their non-linear but highly interpretable model representations. One of the key challenges in their use is non-convexity in simultaneous optimization of regions and region-specific models. This paper proposes novel convex region-specific linear models, which we refer to as partition-wise linear models. Our key ideas are 1) assigning linear models not to regions but to partitions (region-specifiers) and representing region-specific linear models by linear combinations of partition-specific models, and 2) optimizing regions via partition selection from a large number of given partition candidates by means of convex structured regularizations. In addition to providing initialization-free globally-optimal solutions, our convex formulation makes it possible to derive a generalization bound and to use such advanced optimization techniques as proximal methods and decomposition of the proximal maps for sparsity-inducing regularizations. Experimental results demonstrate that our partition-wise linear models perform better than or are at least competitive with state-of-the-art region-specific or locally linear models.
Robust sketching for multiple square-root LASSO problems
Pham, Vu, Ghaoui, Laurent El, Fernandez, Arturo
In many practical applications, learning tasks arise not in isolation, but as multiple instances of similar problems. A typical instance is when the same problem has to be solved, but with many different values of a regularization parameter. Cross-validation also involves a set of learning problems where the different "design matrices" are very close to each other, all being a low-rank perturbation of the same data matrix. Other examples of such multiple instances arise in sparse inverse covariance estimation with the LASSO (Friedman et al. (2008)), or in robust subspace clustering (Soltanolkotabi et al. (2014)). In such applications, it makes sense to spend processing time on the common part of the problems, in order to compress it in certain way, and speed up the overall computation. In this paper we propose an approach to multiple-instance square root LASSO based on "robust sketching", where the data matrix of an optimization problem is approximated by a sketch, that is, a simpler matrix that preserves some property of interest, and on which computations can be performed much faster 1
Screening Rules for Overlapping Group Lasso
Recently, to solve large-scale lasso and group lasso problems, screening rules have been developed, the goal of which is to reduce the problem size by efficiently discarding zero coefficients using simple rules independently of the others. However, screening for overlapping group lasso remains an open challenge because the overlaps between groups make it infeasible to test each group independently. In this paper, we develop screening rules for overlapping group lasso. To address the challenge arising from groups with overlaps, we take into account overlapping groups only if they are inclusive of the group being tested, and then we derive screening rules, adopting the dual polytope projection approach. This strategy allows us to screen each group independently of each other. In our experiments, we demonstrate the efficiency of our screening rules on various datasets.
A General Stochastic Algorithmic Framework for Minimizing Expensive Black Box Objective Functions Based on Surrogate Models and Sensitivity Analysis
Wang, Yilun, Shoemaker, Christine A.
We are focusing on bound constrained global optimization problems, whose objective functions are computationally expensive black-box functions and have multiple local minima. The recently popular Metric Stochastic Response Surface (MSRS) algorithm proposed by \cite{Regis2007SRBF} based on adaptive or sequential learning based on response surfaces is revisited and further extended for better performance in case of higher dimensional problems. Specifically, we propose a new way to generate the candidate points which the next function evaluation point is picked from according to the metric criteria, based on a new definition of distance, and prove the global convergence of the corresponding. Correspondingly, a more adaptive implementation of MSRS, named "SO-SA", is presented. "SO-SA" is is more likely to perturb those most sensitive coordinates when generating the candidate points, instead of perturbing all coordinates simultaneously. Numerical experiments on both synthetic problems and real problems demonstrate the advantages of our new algorithm, compared with many state of the art alternatives.}
Low-Rank Modeling and Its Applications in Image Analysis
Zhou, Xiaowei, Yang, Can, Zhao, Hongyu, Yu, Weichuan
Low-rank modeling generally refers to a class of methods that solve problems by representing variables of interest as low-rank matrices. It has achieved great success in various fields including computer vision, data mining, signal processing and bioinformatics. Recently, much progress has been made in theories, algorithms and applications of low-rank modeling, such as exact low-rank matrix recovery via convex programming and matrix completion applied to collaborative filtering. These advances have brought more and more attentions to this topic. In this paper, we review the recent advance of low-rank modeling, the state-of-the-art algorithms, and related applications in image analysis. We first give an overview to the concept of low-rank modeling and challenging problems in this area. Then, we summarize the models and algorithms for low-rank matrix recovery and illustrate their advantages and limitations with numerical experiments. Next, we introduce a few applications of low-rank modeling in the context of image analysis. Finally, we conclude this paper with some discussions.
Variational Reformulation of Bayesian Inverse Problems
Tsilifis, Panagiotis, Bilionis, Ilias, Katsounaros, Ioannis, Zabaras, Nicholas
The classical approach to inverse problems is based on the optimization of a misfit function. Despite its computational appeal, such an approach suffers from many shortcomings, e.g., non-uniqueness of solutions, modeling prior knowledge, etc. The Bayesian formalism to inverse problems avoids most of the difficulties encountered by the optimization approach, albeit at an increased computational cost. In this work, we use information theoretic arguments to cast the Bayesian inference problem in terms of an optimization problem. The resulting scheme combines the theoretical soundness of fully Bayesian inference with the computational efficiency of a simple optimization.
Generalized Conditional Gradient for Sparse Estimation
Yu, Yaoliang, Zhang, Xinhua, Schuurmans, Dale
Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis, however it also creates significant challenges for efficient algorithm design. In this paper we investigate the generalized conditional gradient (GCG) algorithm for solving structured sparse optimization problems---demonstrating that, with some enhancements, it can provide a more efficient alternative to current state of the art approaches. After providing a comprehensive overview of the convergence properties of GCG, we develop efficient methods for evaluating polar operators, a subroutine that is required in each GCG iteration. In particular, we show how the polar operator can be efficiently evaluated in two important scenarios: dictionary learning and structured sparse estimation. A further improvement is achieved by interleaving GCG with fixed-rank local subspace optimization. A series of experiments on matrix completion, multi-class classification, multi-view dictionary learning and overlapping group lasso shows that the proposed method can significantly reduce the training cost of current alternatives.
Convex Optimization in Julia
Udell, Madeleine, Mohan, Karanveer, Zeng, David, Hong, Jenny, Diamond, Steven, Boyd, Stephen
This paper describes Convex, a convex optimization modeling framework in Julia. Convex translates problems from a user-friendly functional language into an abstract syntax tree describing the problem. This concise representation of the global structure of the problem allows Convex to infer whether the problem complies with the rules of disciplined convex programming (DCP), and to pass the problem to a suitable solver. These operations are carried out in Julia using multiple dispatch, which dramatically reduces the time required to verify DCP compliance and to parse a problem into conic form. Convex then automatically chooses an appropriate backend solver to solve the conic form problem.