Optimization
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Razaviyayn, Meisam, Hong, Mingyi, Luo, Zhi-Quan, Pang, Jong-Shi
Consider the problem of minimizing the sum of a smooth (possibly non-convex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With the recent advances in the developments of the multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and non-convex objective functions. The numerical experiments suggest that for a special case of Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.
Learning Mixtures of Submodular Functions for Image Collection Summarization
Tschiatschek, Sebastian, Iyer, Rishabh K., Wei, Haochen, Bilmes, Jeff A.
We address the problem of image collection summarization by learning mixtures of submodular functions. We argue that submodularity is very natural to this problem, and we show that a number of previously used scoring functions are submodular โ a property not explicitly mentioned in these publications. We provide classes of submodular functions capturing the necessary properties of summaries, namely coverage, likelihood, and diversity. To learn mixtures of these submodular functions as scoring functions, we formulate summarization as a supervised learning problem using large-margin structured prediction. Furthermore, we introduce a novel evaluation metric, which we call V-ROUGE, for automatic summary scoring. While a similar metric called ROUGE has been successfully applied to document summarization [14], no such metric was known for quantifying the quality of image collection summaries. We provide a new dataset consisting of 14 real-world image collections along with many human-generated ground truth summaries collected using mechanical turk. We also extensively compare our method with previously explored methods for this problem and show that our learning approach outperforms all competitors on this new dataset. This paper provides, to our knowledge, the first systematic approach for quantifying the problem of image collection summarization, along with a new dataset of image collections and human summaries.
Efficient Structured Matrix Rank Minimization
Yu, Adams Wei, Ma, Wanli, Yu, Yaoliang, Carbonell, Jaime, Sra, Suvrit
We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD; nor (b) resort to augmented Lagrangian techniques; nor (c) solve linear systems per iteration. Instead, we formulate the problem differently so that it is amenable to a generalized conditional gradient method, which results in a practical improvement with low per iteration computational cost. Numerical results show that our approach significantly outperforms state-of-the-art competitors in terms of running time, while effectively recovering low rank solutions in stochastic system realization and spectral compressed sensing problems.
Submodular Attribute Selection for Action Recognition in Video
Zheng, Jingjing, Jiang, Zhuolin, Chellappa, Rama, Phillips, Jonathon P.
In real-world action recognition problems, low-level features cannot adequately characterize the rich spatial-temporal structures in action videos. In this work, we encode actions based on attributes that describes actions as high-level concepts: \textit{e.g.}, jump forward and motion in the air. We base our analysis on two types of action attributes. One type of action attributes is generated by humans. The second type is data-driven attributes, which is learned from data using dictionary learning methods. Attribute-based representation may exhibit high variance due to noisy and redundant attributes. We propose a discriminative and compact attribute-based representation by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented and guaranteed to be at least (1-1/e)-approximation to the optimum. Experimental results on the Olympic Sports and UCF101 datasets demonstrate that the proposed attribute-based representation can significantly boost the performance of action recognition algorithms and outperform most recently proposed recognition approaches.
Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials
Wang, Shenlong, Schwing, Alex, Urtasun, Raquel
In this paper, we prove that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials. Motivated by this property, we exploit the concave-convex procedure to perform inference on continuous Markov random fields with polynomial potentials. In particular, we show that the concave-convex decomposition of polynomials can be expressed as a sum-of-squares optimization, which can be efficiently solved via semidefinite programming. We demonstrate the effectiveness of our approach in the context of 3D reconstruction, shape from shading and image denoising, and show that our approach significantly outperforms existing approaches in terms of efficiency as well as the quality of the retrieved solution.
Predictive Entropy Search for Efficient Global Optimization of Black-box Functions
Hernรกndez-Lobato, Josรฉ Miguel, Hoffman, Matthew W., Ghahramani, Zoubin
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
Stochastic Network Design in Bidirected Trees
wu, xiaojian, Sheldon, Daniel R., Zilberstein, Shlomo
We investigate the problem of stochastic network design in bidirected trees. In this problem, an underlying phenomenon (e.g., a behavior, rumor, or disease) starts at multiple sources in a tree and spreads in both directions along its edges. Actions can be taken to increase the probability of propagation on edges, and the goal is to maximize the total amount of spread away from all sources. Our main result is a rounded dynamic programming approach that leads to a fully polynomial-time approximation scheme (FPTAS), that is, an algorithm that can find (1 ษ)-optimal solutions for any problem instance in time polynomial in the input size and 1/ษ. Our algorithm outperforms competing approaches on a motivating problem from computational sustainability to remove barriers in river networks to restore the health of aquatic ecosystems.
Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology
Lemonnier, Remi, Scaman, Kevin, Vayatis, Nicolas
In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM). We relate these bounds to the spectral radius of a particular matrix and show that the behavior is sub-critical when this spectral radius is lower than 1. More specifically, we point out that, in general networks, the sub-critical regime behaves in O(sqrt(n)) where n is the size of the network, and that this upper bound is met for star-shaped networks. We apply our results to epidemiology and percolation on arbitrary networks, and derive a bound for the critical value beyond which a giant connected component arises. Finally, we show empirically the tightness of our bounds for a large family of networks.
Projective dictionary pair learning for pattern classification
Gu, Shuhang, Zhang, Lei, Zuo, Wangmeng, Feng, Xiangchu
Discriminative dictionary learning (DL) has been widely studied in various pattern classification problems. Most of the existing DL methods aim to learn a synthesis dictionary to represent the input signal while enforcing the representation coefficients and/or representation residual to be discriminative. However, the $\ell_0$ or $\ell_1$-norm sparsity constraint on the representation coefficients adopted in many DL methods makes the training and testing phases time consuming. We propose a new discriminative DL framework, namely projective dictionary pair learning (DPL), which learns a synthesis dictionary and an analysis dictionary jointly to achieve the goal of signal representation and discrimination. Compared with conventional DL methods, the proposed DPL method can not only greatly reduce the time complexity in the training and testing phases, but also lead to very competitive accuracies in a variety of visual classification tasks.
Constrained convex minimization via model-based excessive gap
Tran-Dinh, Quoc, Cevher, Volkan
We introduce a model-based excessive gap technique to analyze first-order primal- dual methods for constrained convex minimization. As a result, we construct first- order primal-dual methods with optimal convergence rates on the primal objec- tive residual and the primal feasibility gap of their iterates separately. Through a dual smoothing and prox-center selection strategy, our framework subsumes the augmented Lagrangian, alternating direction, and dual fast-gradient methods as special cases, where our rates apply.