Optimization
Towards Better Response Times and Higher-Quality Queries in Interactive Knowledge Base Debugging
Many AI applications rely on knowledge encoded in a logical knowledge base (KB). The most essential benefit of such logical KBs is the opportunity to perform automatic reasoning, which, however, requires a KB to meet some minimal quality criteria such as consistency. Without adequate tool assistance, the task of resolving such violated quality criteria in a KB can be extremely hard, especially when the problematic KB is large and complex. To this end, interactive KB debuggers have been introduced, which ask a user queries about whether certain statements must or must not hold in the intended domain. The given answers help to gradually restrict the search space for KB repairs. Existing interactive debuggers often rely on a pool-based strategy for query computation: a pool of query candidates is precomputed, from which the best candidate according to some query quality criterion is selected to be shown to the user. This often leads to the generation of many unnecessary query candidates and thus to a high number of expensive calls to logical reasoning services. We tackle this issue by an in-depth mathematical analysis of diverse real-valued active learning query selection measures in order to determine qualitative criteria that make a query favorable. These criteria are the key to devising efficient heuristic query search methods. The proposed methods enable, for the first time, a completely reasoner-free query generation for interactive KB debugging while at the same time guaranteeing optimality conditions of the generated query, e.g. minimal cardinality or best understandability for the user, that existing methods cannot realize. Further, we study different relations between active learning measures. The obtained picture gives a hint about which measures are more favorable in which situation, or which measures always lead to the same outcomes, based on given types of queries.
The ALAMO approach to machine learning
Wilson, Zachary T., Sahinidis, Nikolaos V.
ALAMO is a computational methodology for learning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit non-linear transformations of the independent variables. Linear combinations of these non-linear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined as additional data are obtained adaptively through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO's constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.
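The idea of a low-complexity linear model over explicit non-linear transformations can be illustrated with a small sketch. This is not ALAMO's implementation: the candidate basis, the exhaustive subset search, the BIC-style score, and the names `candidate_basis` and `fit_sparse_linear_model` are all assumptions chosen for illustration.

```python
import itertools
import numpy as np

def candidate_basis(x):
    """Hypothetical basis: explicit non-linear transforms of one input (assumes x > 0)."""
    return {
        "1": np.ones_like(x),
        "x": x,
        "x^2": x**2,
        "exp(x)": np.exp(x),
        "log(x)": np.log(x),
    }

def fit_sparse_linear_model(x, y, max_terms=2):
    """Pick the small subset of basis functions whose least-squares fit scores best."""
    basis = candidate_basis(x)
    names = list(basis)
    n = len(y)
    best = None
    for k in range(1, max_terms + 1):
        for subset in itertools.combinations(names, k):
            # The model is linear in its coefficients, non-linear in x.
            A = np.column_stack([basis[name] for name in subset])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((A @ coef - y) ** 2))
            # BIC-style score: goodness of fit plus a penalty per term.
            score = n * np.log(rss / n + 1e-12) + k * np.log(n)
            if best is None or score < best[0]:
                best = (score, subset, coef)
    _, subset, coef = best
    return dict(zip(subset, coef))
```

For data generated as `y = 2*x**2 + 3*np.log(x)`, the search selects the terms `x^2` and `log(x)` and recovers their coefficients. Exhaustive enumeration is only practical for tiny basis sets.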
Optimization of Tree Ensembles
Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
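To make the problem statement concrete, the sketch below maximizes a tiny ensemble by brute force. It is not the paper's mixed-integer formulation: it only uses the fact that an ensemble's prediction is piecewise constant between split thresholds, so one representative point per cell suffices. The nested-tuple tree encoding and the function names are assumptions for illustration.

```python
import itertools

# A tree is either a leaf value (float) or a tuple
# (feature, threshold, left, right): go left when x[feature] <= threshold.

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def split_points(tree, acc):
    """Collect every threshold used for each feature."""
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        acc.setdefault(feature, set()).add(threshold)
        split_points(left, acc)
        split_points(right, acc)
    return acc

def maximize_ensemble(trees, n_features):
    thresholds = {}
    for tree in trees:
        split_points(tree, thresholds)
    # One representative per interval: each threshold itself covers the
    # interval ending at it, plus one point beyond the largest threshold.
    candidates = []
    for f in range(n_features):
        ts = sorted(thresholds.get(f, []))
        candidates.append(ts + [ts[-1] + 1.0] if ts else [0.0])
    best_x, best_val = None, float("-inf")
    for x in itertools.product(*candidates):
        val = sum(predict_tree(t, x) for t in trees) / len(trees)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

The number of cells grows exponentially with the number of features and splits, which is why enumeration like this does not scale to large ensembles.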
High Dimensional Inference with Random Maximum A-Posteriori Perturbations
Hazan, Tamir, Orabona, Francesco, Sarwate, Anand D., Maji, Subhransu, Jaakkola, Tommi
This paper presents a new approach, called perturb-max, for high-dimensional statistical inference that is based on applying random perturbations followed by optimization. This framework injects randomness into maximum a-posteriori (MAP) predictors by randomly perturbing the potential function for the input. A classic result from extreme value statistics asserts that perturb-max operations generate unbiased samples from the Gibbs distribution using high-dimensional perturbations. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. However, when the perturbations are of low dimension, sampling the perturb-max prediction is as efficient as MAP optimization. This paper shows that the expected value of perturb-max inference with low-dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution. Furthermore, the expected value of the maximal perturbations is a natural bound on the entropy of such perturb-max models. A measure concentration result for perturb-max values shows that the deviation of their sampled average from its expectation decays exponentially in the number of samples, allowing effective approximation of the expectation.
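The classic extreme-value result mentioned above is often called the Gumbel-max trick: adding independent Gumbel noise to the potential of every configuration and taking the argmax yields an exact sample from the Gibbs distribution. A minimal sketch for a small discrete space follows; the function names are illustrative, and the paper's low-dimensional sequential scheme is not shown.

```python
import numpy as np

def gibbs_probs(theta):
    """Gibbs distribution p(i) proportional to exp(theta[i])."""
    z = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return z / z.sum()

def perturb_max_sample(theta, rng, n_samples):
    """Draw samples by perturbing every potential with Gumbel noise, then maximizing."""
    gumbel = rng.gumbel(size=(n_samples, theta.size))
    return np.argmax(theta + gumbel, axis=1)

theta = np.array([0.0, 1.0, 2.0])
rng = np.random.default_rng(0)
samples = perturb_max_sample(theta, rng, 200_000)
empirical = np.bincount(samples, minlength=theta.size) / samples.size
# empirical should be close to gibbs_probs(theta)
```

This draws one noise variable per configuration, which is exactly the cost that becomes prohibitive when the configuration space is exponentially large.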
Algorithms for stochastic optimization with expectation constraints
This paper considers the problem of minimizing an expectation function over a closed convex set, coupled with an expectation constraint on either decision variables or problem parameters. We first present a new stochastic approximation (SA) type algorithm, namely the cooperative SA (CSA), to handle problems with the expectation constraint on decision variables. We show that this algorithm exhibits the optimal ${\cal O}(1/\sqrt{N})$ rate of convergence, in terms of both optimality gap and constraint violation, when the objective and constraint functions are generally convex, where $N$ denotes the number of iterations. Moreover, we show that this rate of convergence can be improved to ${\cal O}(1/N)$ if the objective and constraint functions are strongly convex. We then present a variant of CSA, namely the cooperative stochastic parameter approximation (CSPA) algorithm, to deal with the situation when the expectation constraint is defined over problem parameters, and show that it exhibits a similar optimal rate of convergence to CSA. It is worth noting that CSA and CSPA are primal methods which do not require iterations in the dual space and/or estimation of the size of the dual variables. To the best of our knowledge, this is the first time that such optimal SA methods for solving expectation constrained stochastic optimization are presented in the literature.
Distributed Convolutional Sparse Coding
Moreau, Thomas, Oudre, Laurent, Vayatis, Nicolas
We consider the problem of building shift-invariant representations for long signals in the context of distributed processing. We propose an asynchronous algorithm based on coordinate descent called DICOD to efficiently solve the $\ell_1$-minimization problems involved in convolutional sparse coding. This algorithm leverages the weak temporal dependency of the convolution to reduce the interprocess communication to a few local messages. We prove that this algorithm converges to the optimal solution and that it scales with superlinear speedup, up to a certain limit. These properties are illustrated with numerical experiments and our algorithm is compared to the state-of-the-art methods used for convolutional sparse coding.
The Price of Anarchy in Auctions
Roughgarden, Tim, Syrgkanis, Vasilis, Tardos, Eva
This survey outlines a general and modular theory for proving approximation guarantees for equilibria of auctions in complex settings. This theory complements traditional economic techniques, which generally focus on exact and optimal solutions and are accordingly limited to relatively stylized settings. We highlight three user-friendly analytical tools: smoothness-type inequalities, which immediately yield approximation guarantees for many auction formats of interest in the special case of complete information and deterministic strategies; extension theorems, which extend such guarantees to randomized strategies, no-regret learning outcomes, and incomplete-information settings; and composition theorems, which extend such guarantees from simpler to more complex auctions.
Learning Data Manifolds with a Cutting Plane Method
Chung, SueYeon, Cohen, Uri, Sompolinsky, Haim, Lee, Daniel D.
We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_{CP} based upon a cutting-plane approach that efficiently solves a quadratic semi-infinite programming problem to find the maximum margin solution. We provide a proof of convergence as well as a polynomial bound on the number of iterations required for a desired tolerance in the objective function. The efficiency and performance of M_{CP} are demonstrated in high-dimensional simulations and on image manifolds generated from the ImageNet dataset. Our results indicate that M_{CP} is able to rapidly learn good classifiers and shows superior generalization performance compared with conventional maximum margin methods using data augmentation methods.
[R] Fast way to find argmax of a radial basis function? (optimization problem). • r/MachineLearning
I'm working on a continuous state and action Q-learning algorithm, using radial basis nets to store the value function. It works OK, but I'm not completely happy with the way I'm doing the argmax to get the action. Basically, I have a radial basis function and I have to find its maximum. The function looks like this, more or less. Things I've tried so far: Not that fast either, and prone to getting stuck in a local maximum.
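One common heuristic for this kind of problem is multi-start gradient ascent seeded at the centers with positive weight. The sketch below is only an illustration, not something from the post: it assumes a Gaussian RBF sum f(a) = sum_i w_i exp(-||a - c_i||^2 / (2 sigma^2)) with a shared width `sigma`, and the function names are made up. It can still miss the global maximum.

```python
import numpy as np

def rbf_value(a, centers, weights, sigma):
    d2 = np.sum((centers - a) ** 2, axis=1)
    return float(weights @ np.exp(-d2 / (2 * sigma**2)))

def rbf_grad(a, centers, weights, sigma):
    diff = centers - a
    d2 = np.sum(diff**2, axis=1)
    phi = weights * np.exp(-d2 / (2 * sigma**2))
    # The gradient of each Gaussian bump points toward its center.
    return (phi[:, None] * diff).sum(axis=0) / sigma**2

def argmax_rbf(centers, weights, sigma, steps=200):
    # Conservative step size based on a bound on the curvature.
    lr = sigma**2 / max(np.abs(weights).sum(), 1e-12)
    # Start from every positively weighted center (or all centers if none).
    starts = centers[weights > 0] if np.any(weights > 0) else centers
    best_a, best_v = None, -np.inf
    for a in starts.astype(float):
        a = a.copy()
        for _ in range(steps):
            a = a + lr * rbf_grad(a, centers, weights, sigma)
        v = rbf_value(a, centers, weights, sigma)
        if v > best_v:
            best_a, best_v = a, v
    return best_a, best_v
```

The starts are independent, so they can be batched or run in parallel. With few centers the cost per query is a small number of gradient evaluations per start.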
Implicit Regularization in Matrix Factorization
Gunasekar, Suriya, Woodworth, Blake, Bhojanapalli, Srinadh, Neyshabur, Behnam, Srebro, Nathan
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.