Optimization
Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery
Ermon, Stefano (Stanford University) | Bras, Ronan Le (Cornell University) | Suram, Santosh K. (California Institute of Technology) | Gregoire, John M. (California Institute of Technology) | Gomes, Carla P. (Cornell University) | Selman, Bart (Cornell University) | Dover, Robert B. van (Cornell University)
Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g. for fuel and solar cells, we introduce CombiFD, a framework for factor based pattern decomposition that allows the incorporation of a-priori knowledge as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains.
Multi-Document Summarization Based on Two-Level Sparse Representation Model
Liu, He (Peking University) | Yu, Hongliang (Peking University) | Deng, Zhi-Hong (Peking University)
Multi-document summarization is of great value to many real world applications since it can help people get the main ideas within a short time.In this paper, we tackle the problem of extracting summary sentences from multi-document sets by applying sparse coding techniques and present a novel framework to this challenging problem. Based on the data reconstruction and sentence denoising assumption, we present a two-level sparse representation model to depict the process of multi-document summarization. Three requisite properties is proposed to form an ideal reconstructable summary: Coverage, Sparsity and Diversity. We then formalize the task of multi-document summarization as an optimization problem according to the above properties, and use simulated annealing algorithm to solve it.Extensive experiments on summarization benchmark data sets DUC2006 and DUC2007 show that our proposed model is effective and outperforms the state-of-the-art algorithms.
An EBMC-Based Approach to Selecting Types for Entity Filtering
Ding, Jiwei (Nanjing University) | Ding, Wentao (Nanjing University) | Hu, Wei (Nanjing University) | Qu, Yuzhong (Nanjing University)
The quantity of entities in the Linked Data is increasing rapidly. For entity search and browsing systems, filtering is very useful for users to find entities that they are interested in. Type is a kind of widely-used facet and can be easily obtained from knowledge bases, which enables to create filters by selecting at most K types of an entity collection. However, existing approaches often fail to select high-quality type filters due to complex overlap between types. In this paper, we propose a novel type selection approach based upon Budgeted Maximum Coverage (BMC), which can achieve integral optimization for the coverage quality of type filters. Furthermore, we define a new optimization problem called Extended Budgeted Maximum Coverage (EBMC) and propose an EBMC-based approach, which enhances the BMC-based approach by incorporating the relevance between entities and types, so as to create sensible type filters. Our experimental results show that the EBMC-based approach performs best comparing with several representative approaches.
An Entropy Search Portfolio for Bayesian Optimization
Shahriari, Bobak, Wang, Ziyu, Hoffman, Matthew W., Bouchard-Côté, Alexandre, de Freitas, Nando
Bayesian optimization is a sample-efficient method for black-box global optimization. How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
Group-Sparse Model Selection: Hardness and Relaxations
Baldassarre, Luca, Bhan, Nirav, Cevher, Volkan, Kyrillidis, Anastasios, Satpathi, Siddhartha
Group-based sparsity models are proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing. The main promise of these models is the recovery of "interpretable" signals through the identification of their constituent groups. In this paper, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues. In particular, we show that the group-model selection problem is equivalent to the well-known NP-hard weighted maximum coverage problem (WMC). Leveraging a graph-based understanding of group models, we describe group structures which enable correct model selection in polynomial time via dynamic programming. Furthermore, group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. We also present a generalization of the group-model that allows for within group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier of group-sparse approximations for two tractable models, among which the tree sparsity model, and illustrate selection and computation trade-offs between our framework and the existing convex relaxations.
Dependence space of matroids and its application to attribute reduction
Attribute reduction is a basic issue in knowledge representation and data mining. Rough sets provide a theoretical foundation for the issue. Matroids generalized from matrices have been widely used in many fields, particularly greedy algorithm design, which plays an important role in attribute reduction. Therefore, it is meaningful to combine matroids with rough sets to solve the optimization problems. In this paper, we introduce an existing algebraic structure called dependence space to study the reduction problem in terms of matroids. First, a dependence space of matroids is constructed. Second, the characterizations for the space such as consistent sets and reducts are studied through matroids. Finally, we investigate matroids by the means of the space and present two expressions for their bases. In a word, this paper provides new approaches to study attribute reduction.
A totally unimodular view of structured sparsity
Halabi, Marwa El, Cevher, Volkan
This paper describes a simple framework for structured sparse recovery based on convex optimization. We show that many structured sparsity models can be naturally represented by linear matrix inequalities on the support of the unknown parameters, where the constraint matrix has a totally unimodular (TU) structure. For such structured models, tight convex relaxations can be obtained in polynomial time via linear programming. Our modeling framework unifies the prevalent structured sparsity norms in the literature, introduces new interesting ones, and renders their tightness and tractability arguments transparent.
A Primal-Dual Algorithmic Framework for Constrained Convex Minimization
Tran-Dinh, Quoc, Cevher, Volkan
We present a primal-dual algorithmic framework to obtain approximate solutions to a prototypical constrained convex optimization problem, and rigorously characterize how common structural assumptions affect the numerical efficiency. Our main analysis technique provides a fresh perspective on Nesterov's excessive gap technique in a structured fashion and unifies it with smoothing and primal-dual methods. For instance, through the choices of a dual smoothing strategy and a center point, our framework subsumes decomposition algorithms, augmented Lagrangian as well as the alternating direction method-of-multipliers methods as its special cases, and provides optimal convergence rates on the primal objective residual as well as the primal feasibility gap of the iterates for all.
Self-Dictionary Sparse Regression for Hyperspectral Unmixing: Greedy Pursuit and Pure Pixel Search are Related
Fu, Xiao, Ma, Wing-Kin, Chan, Tsung-Han, Bioucas-Dias, José M.
This paper considers a recently emerged hyperspectral unmixing formulation based on sparse regression of a self-dictionary multiple measurement vector (SD-MMV) model, wherein the measured hyperspectral pixels are used as the dictionary. Operating under the pure pixel assumption, this SD-MMV formalism is special in that it allows simultaneous identification of the endmember spectral signatures and the number of endmembers. Previous SD-MMV studies mainly focus on convex relaxations. In this study, we explore the alternative of greedy pursuit, which generally provides efficient and simple algorithms. In particular, we design a greedy SD-MMV algorithm using simultaneous orthogonal matching pursuit. Intriguingly, the proposed greedy algorithm is shown to be closely related to some existing pure pixel search algorithms, especially, the successive projection algorithm (SPA). Thus, a link between SD-MMV and pure pixel search is revealed. We then perform exact recovery analyses, and prove that the proposed greedy algorithm is robust to noise---including its identification of the (unknown) number of endmembers---under a sufficiently low noise level. The identification performance of the proposed greedy algorithm is demonstrated through both synthetic and real-data experiments.
An Ant Colony Optimization Algorithm for Partitioning Graphs with Supply and Demand
Jovanovic, Raka, Tuba, Milan, Voss, Stefan
In this paper we focus on finding high quality solutions for the problem of maximum partitioning of graphs with supply and demand (MPGSD). There is a growing interest for the MPGSD due to its close connection to problems appearing in the field of electrical distribution systems, especially for the optimization of self-adequacy of interconnected microgrids. We propose an ant colony optimization algorithm for the problem. With the goal of further improving the algorithm we combine it with a previously developed correction procedure. In our computational experiments we evaluate the performance of the proposed algorithm on both trees and general graphs. The tests show that the method manages to find optimal solutions in more than 50% of the problem instances, and has an average relative error of less than 0.5% when compared to known optimal solutions. Keywords: Ant Colony Optimization, Microgrid, Graph Partitioning, Demand Vertex, Supply Vertex, Combinatorial Optimization 1. Introduction In recent years the research in the field of smart grids has had a significant increase in exploring the concept of interconnected microgrids [1].