Goto

Collaborating Authors

 Optimization


Natural Analysts in Adaptive Data Analysis

arXiv.org Machine Learning

Adaptive data analysis is frequently criticized for its pessimistic generalization guarantees. The source of these pessimistic bounds is a model that permits arbitrary, possibly adversarial analysts that optimally use information to bias results. While being a central issue in the field, still lacking are notions of natural analysts that allow for more optimistic bounds faithful to the reality that typical analysts aren't adversarial. In this work, we propose notions of natural analysts that smoothly interpolate between the optimal non-adaptive bounds and the best-known adaptive generalization bounds. To accomplish this, we model the analyst's knowledge as evolving according to the rules of an unknown dynamical system that takes in revealed information and outputs new statistical queries to the data. This allows us to restrict the analyst through different natural control-theoretic notions. One such notion corresponds to a recency bias, formalizing an inability to arbitrarily use distant information. Another complementary notion formalizes an anchoring bias, a tendency to weight initial information more strongly. Both notions come with quantitative parameters that smoothly interpolate between the non-adaptive case and the fully adaptive case, allowing for a rich spectrum of intermediate analysts that are neither non-adaptive nor adversarial. Natural not only from a cognitive perspective, we show that our notions also capture standard optimization methods, like gradient descent in various settings. This gives a new interpretation to the fact that gradient descent tends to overfit much less than its adaptive nature might suggest.


Splitting Methods for Convex Bi-Clustering and Co-Clustering

arXiv.org Machine Learning

Co-Clustering, the problem of simultaneously identifying clusters across multiple aspects of a data set, is a natural generalization of clustering to higher-order structured data. Recent convex formulations of bi-clustering and tensor co-clustering, which shrink estimated centroids together using a convex fusion penalty, allow for global optimality guarantees and precise theoretical analysis, but their computational properties have been less well studied. In this note, we present three efficient operator-splitting methods for the convex co-clustering problem: a standard two-block ADMM, a Generalized ADMM which avoids an expensive tensor Sylvester equation in the primal update, and a three-block ADMM based on the operator splitting scheme of Davis and Yin. Theoretical complexity analysis suggests, and experimental evidence confirms, that the Generalized ADMM is far more efficient for large problems.


Learning to Project in Multi-Objective Binary Linear Programming

arXiv.org Machine Learning

In this paper, we investigate the possibility of improving the performance of multi-objective optimization solution approaches using machine learning techniques. Specifically, we focus on multi-objective binary linear programs and employ one of the most effective and recently developed criterion space search algorithms, the so-called KSA, during our study. This algorithm computes all nondominated points of a problem with p objectives by searching on a projected criterion space, i.e., a (p-1)-dimensional criterion apace. We present an effective and fast learning approach to identify on which projected space the KSA should work. We also present several generic features/variables that can be used in machine learning techniques for identifying the best projected space. Finally, we present an effective bi-objective optimization based heuristic for selecting the best subset of the features to overcome the issue of overfitting in learning. Through an extensive computational study over 2000 instances of tri-objective Knapsack and Assignment problems, we demonstrate that an improvement of up to 12% in time can be achieved by the proposed learning method compared to a random selection of the projected space.


Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning

arXiv.org Machine Learning

The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier.


Scalable Computation of High-Order Optimization Queries

Communications of the ACM

Constrained optimization problems are at the heart of significant applications in a broad range of domains, including finance, transportation, manufacturing, and healthcare. Modeling and solving these problems has relied on application-specific solutions, which are often complex, error-prone, and do not generalize. Our goal is to create a domain-independent, declarative approach, supported and powered by the system where the data relevant to these problems typically resides: the database. We present a complete system that supports package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets, allowing the declarative specification and efficient evaluation of a significant class of constrained optimization problems--integer linear programs (ILP)--within a database. Traditional database queries follow a simple model: they define constraints, in the form of selection predicates, that each tuple in the result must satisfy.


Stochastic Conditional Gradient Method for Composite Convex Minimization

arXiv.org Machine Learning

In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are vastly applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy to implement and tractable algorithm. We prove $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batchsize, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications including optimization of stochastic SDPs.


Rank-one Convexification for Sparse Regression

arXiv.org Machine Learning

Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact model of sparse regression with an $\ell_0$ constraint restricting the support of the estimators is a challenging non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression. These relaxations are based on the ideal (convex-hull) formulations for rank-one quadratic terms with indicator variables. The new relaxations can be formulated as semidefinite optimization problems in an extended space and are stronger and more general than the state-of-the-art formulations, including the perspective reformulation and formulations with the reverse Huber penalty and the minimax concave penalty functions. Furthermore, the proposed rank-one strengthening can be interpreted as a non-separable, non-convex sparsity-inducing regularizer, which dynamically adjusts its penalty according to the shape of the error function. In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4\% optimality gap) for non-convex $\ell_0$-problems. Moreover, the resulting estimators also outperform alternative convex approaches from a statistical viewpoint, achieving high prediction accuracy and good interpretability.


New Approximations for Coalitional Manipulation in Scoring Rules

Journal of Artificial Intelligence Research

We study the problem of coalitional manipulation---where k manipulators try to manipulate an election on m candidates---for any scoring rule, with focus on the Borda protocol. We do so in both the weighted and unweighted settings. For these problems, recent approximation approaches have tried to minimize k, the number of manipulators needed to make some preferred candidate p win (thus assuming that the number of manipulators is not limited in advance). In contrast, we focus on minimizing the score margin of p which is the difference between the maximum score of a candidate and the score of p.ย We provide algorithms that approximate the optimum score margin, which are applicable to any scoring rule. For the specific case of the Borda protocol in the unweighted setting, our algorithm provides a superior approximation factor for lower values of k.Our methods are novel and adapt techniques from multiprocessor scheduling by carefully rounding an exponentially-large configuration linear program that is solved by using the ellipsoid method with an efficient separation oracle. We believe that such methods could be beneficial in other social choice settings as well.


Trust Region-Guided Proximal Policy Optimization

arXiv.org Machine Learning

Model-free reinforcement learning relies heavily on a safe yet exploratory policy search. Proximal policy optimization (PPO) is a prominent algorithm to address the safe search problem, by exploiting a heuristic clipping mechanism motivated by a theoretically-justified "trust region" guidance. However, we found that the clipping mechanism of PPO could lead to a lack of exploration issue. Based on this finding, we improve the original PPO with an adaptive clipping mechanism guided by a "trust region" criterion. Our method, termed as Trust Region-Guided PPO (TRPPO), improves PPO with more exploration and better sample efficiency, while maintains the safe search property and design simplicity of PPO. On several benchmark tasks, TRPPO significantly outperforms the original PPO and is competitive with several state-of-the-art methods.


Knowledge Refinement via Rule Selection

arXiv.org Artificial Intelligence

In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.