Goto

Collaborating Authors

 Optimization


Learning Greedy Policies for the Easy-First Framework

AAAI Conferences

Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. We formulate greedy policy learning in the Easy-first approach as a novel non-convex optimization problem and solve it via an efficient Majorization Minimizatoin (MM) algorithm. Results on within-document coreference and cross-document joint entity and event coreference tasks demonstrate that the proposed approach achieves statistically significant performance improvement over existing training regimes for Easy-first and is less susceptible to overfitting.


A Nonconvex Relaxation Approach for Rank Minimization Problems

AAAI Conferences

Recently, solving rank minimization problems by leveraging nonconvex relaxations has received significant attention. Some theoretical analyses demonstrate that it can provide a better approximation of original problems than convex relaxations. However, designing an effective algorithm to solve nonconvex optimization problems remains a big challenge. In this paper, we propose an Iterative Shrinkage-Thresholding and Reweighted Algorithm (ISTRA) to solve rank minimization problems using the nonconvex weighted nuclear norm as a low rank regularizer. We prove theoretically that under certain assumptions our method achieves a high-quality local optimal solution efficiently. Experimental results on synthetic and real data show that the proposed ISTRA algorithm outperforms state-of-the-art methods in both accuracy and efficiency.


An SVD and Derivative Kernel Approach to Learning from Geometric Data

AAAI Conferences

Motivated by problems such as molecular energy prediction, we derive an (improper) kernel between geometric inputs, that is able to capture the relevant rotational and translation invariances in geometric data. Since many physical simulations based upon geometric data produce derivatives of the output quantity with respect to the input positions, we derive an approach that incorporates derivative information into our kernel learning. We further show how to exploit the low rank structure of the resulting kernel matrices to speed up learning. Finally, we evaluated the method in the context of molecular energy prediction, showing good performance for modeling previously unseen molecular configurations. Integrating the approach into a Bayesian optimization, we show substantial improvement over the state of the art in molecular energy optimization.


Exploring Social Context for Topic Identification in Short and Noisy Texts

AAAI Conferences

With the pervasion of social media, topic identification in short texts attracts increasing attention in  recent years. However, in nature the texts of social media are short and noisy, and the structures are sparse and dynamic, resulting in difficulty to identify topic categories exactly from online social media. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from the perspective of social sciences. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and conduct feature selection to tackle short and noisy texts in social media, which result in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.


Lazier Than Lazy Greedy

AAAI Conferences

Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a (1 − 1/e − ε) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not evaluate the whole fraction of data points even once and still achieves indistinguishable results compared to lazy greedy.


BDD-Constrained Search: A Unified Approach to Constrained Shortest Path Problems

AAAI Conferences

Dynamic programming (DP) is a fundamental tool used to obtain exact, optimal solutions for many combinatorial optimization problems. Among these problems, important ones including the knapsack problems and the computation of edit distances between string pairs can be solved with a kind of DP that corresponds to solving the shortest path problem on a directed acyclic graph (DAG). These problems can be solved efficiently with DP, however, in practical situations, we want to solve the customized problems made by adding logical constraints to the original problems. Developing an algorithm specifically for each combination of a problem and a constraint set is unrealistic. The proposed method, BDD-Constrained Search (BCS), exploits a Binary Decision Diagram (BDD) that represents the logical constraints in combination with the DAG that represents the problem. The BCS runs DP on the DAG while using the BDD to check the equivalence and the validity of intermediate solutions to efficiently solve the problem. The important feature of BCS is that it can be applied to problems with various types of logical constraints in a unified way once we represent the constraints as a BDD. We give a theoretical analysis on the time complexity of BCS and also conduct experiments to compare its performance to that of a state-of-the-art integer linear programming solver.


A Theoretical Analysis of Optimization by Gaussian Continuation

AAAI Conferences

Optimization via continuation method is a widely used approach for solving nonconvex minimization problems. While this method generally does not provide a global minimum, empirically it often achieves a superior local minimum compared to alternative approaches such as gradient descent. However, theoretical analysis of this method is largely unavailable. Here, we provide a theoretical analysis that provides a bound on the endpoint solution of the continuation method. The derived bound depends on a problem specific characteristic that we refer to as optimization complexity. We show that this characteristic can be analytically computed when the objective function is expressed in some suitable basis functions. Our analysis combines elements of scale-space theory, regularization and differential equations.


On Unconstrained Quasi-Submodular Function Optimization

AAAI Conferences

With the extensive application of submodularity, its generalizations are constantly being proposed. However, most of them are tailored for special problems. In this paper, we focus on quasi-submodularity, a universal generalization, which satisfies weaker properties than submodularity but still enjoys favorable performance in optimization. Similar to the diminishing return property of submodularity, we first define a corresponding property called the single sub-crossing , then we propose two algorithms for unconstrained quasi-submodular function minimization and maximization, respectively. The proposed algorithms return the reduced lattices in O(n) iterations, and guarantee the objective function values are strictly monotonically increased or decreased after each iteration. Moreover, any local and global optima are definitely contained in the reduced lattices. Experimental results verify the effectiveness and efficiency of the proposed algorithms on lattice reduction.


Value-Directed Compression of Large-Scale Assignment Problems

AAAI Conferences

Data-driven analytics — in areas ranging from consumer marketing to public policy — often allow behavior prediction at the level of individuals rather than population segments , offering the opportunity to improve decisions that impact large populations. Modeling such (generalized) assignment problems as linear programs, we propose a general value-directed compression technique for solving such problems at scale. We dynamically segment the population into cells using a form of column generation, constructing groups of individuals who can provably be treated identically in the optimal solution. This compression allows problems, unsolvable using standard LP techniques, to be solved effectively. Indeed, once a compressed LP is constructed, problems can solved in milliseconds. We provide a theoretical analysis of themethods, outline the distributed implementation of the requisite data processing, and show how a single compressed LP can be used to solve multiple variants of the original LP near-optimally in real-time (e.g., tosupport scenario analysis). We also show how the method can be leveraged in integer programming models.  Experimental results on marketing contact optimization and political legislature problems validate the performance of our technique.


Solving Distributed Constraint Optimization Problems Using Logic Programming

AAAI Conferences

This paper explores the use of answer set programming (ASP) in solving distributed constraint optimization problems (DCOPs). It makes the following contributions: (i)~It shows how one can formulate DCOPs as logic programs; (ii)~It introduces ASP-DPOP, the first DCOP algorithm that is based on logic programming; (iii)~It experimentally shows that ASP-DPOP can be up to two orders of magnitude faster than DPOP (its imperative-programming counterpart) as well as solve some problems that DPOP fails to solve due to memory limitations; and (iv)~It demonstrates the applicability of ASP in the wide array of multi-agent problems currently modeled as DCOPs.