Goto

Collaborating Authors

 Optimization


PARyOpt: A software for Parallel Asynchronous Remote Bayesian Optimization

arXiv.org Machine Learning

PARyOpt is a python based implementation of the Bayesian optimization routine designed for remote and asynchronous function evaluations. Bayesian optimization is especially attractive for computational optimization due to its low cost function footprint as well as the ability to account for uncertainties in data. A key challenge to efficiently deploy any optimization strategy on distributed computing systems is the synchronization step, where data from multiple function calls is assimilated to identify the next campaign of function calls. Bayesian optimization provides an elegant approach to overcome this issue via asynchronous updates. We formulate, develop and implement a parallel, asynchronous variant of Bayesian optimization. The framework is robust and resilient to external failures. We show how such asynchronous evaluations help reduce the total optimization wall clock time for a suite of test problems. Additionally, we show how the software design of the framework allows easy extension to response surface reconstruction (Kriging), providing a high performance software for autonomous exploration. The software is available on PyPI, with examples and documentation.


A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters

arXiv.org Machine Learning

In this paper, we consider the problem of computing a Wasserstein barycenter for a set of discrete probability distributions with finite supports, which finds many applications in different areas such as statistics, machine learning and image processing. When the support points of the barycenter are pre-specified, this problem can be modeled as a linear programming (LP), while the problem size can be extremely large. To handle this large-scale LP, in this paper, we derive its dual problem, which is conceivably more tractable and can be reformulated as a well-structured convex problem with 3 kinds of block variables and a coupling linear equality constraint. We then adapt a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) to solve the resulting dual problem and analyze its global convergence as well as its global linear convergence rate. We also show how all the subproblems involved can be solved exactly and efficiently. This makes our method suitable for computing a Wasserstein barycenter on a large dataset. In addition, our sGS-ADMM can be used as a subroutine in an alternating minimization method to compute a barycenter when its support points are not pre-specified. Numerical results on synthetic datasets and image datasets demonstrate that our method is more efficient for solving large-scale problems, comparing with two existing representative methods and the commercial software Gurobi.


Compact Optimization Algorithms with Re-sampled Inheritance

arXiv.org Artificial Intelligence

Compact optimization algorithms are a class of Estimation of Distribution Algorithms (EDAs) characterized by extremely limited memory requirements (hence they are called "compact"). As all EDAs, compact algorithms build and update a probabilistic model of the distribution of solutions within the search space, as opposed to population-based algorithms that instead make use of an explicit population of solutions. In addition to that, to keep their memory consumption low, compact algorithms purposely employ simple probabilistic models that can be described with a small number of parameters. Despite their simplicity, compact algorithms have shown good performances on a broad range of benchmark functions and real-world problems. However, compact algorithms also come with some drawbacks, i.e. they tend to premature convergence and show poorer performance on non-separable problems. To overcome these limitations, here we investigate a possible memetic computing approach obtained by combining compact algorithms with a non-disruptive restart mechanism taken from the literature, named Re-Sampled Inheritance (RI). The resulting compact algorithms with RI are then tested on the CEC 2014 benchmark functions. The numerical results show on the one hand that the use of RI consistently enhances the performances of compact algorithms, still keeping a limited usage of memory. On the other hand, our experiments show that among the tested algorithms, the best performance is obtained by compact Differential Evolution with RI.


Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals

arXiv.org Machine Learning

We show that many machine learning goals, such as improved fairness metrics, can be expressed as constraints on the model's predictions, which we call rate constraints. We study the problem of training non-convex models subject to these rate constraints (or any non-convex and non-differentiable constraints). In the non-convex setting, the standard approach of Lagrange multipliers may fail. Furthermore, if the constraints are non-differentiable, then one cannot optimize the Lagrangian with gradient-based methods. To solve these issues, we introduce the proxy-Lagrangian formulation. This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution to the constrained optimization problem. We then give a procedure which shrinks the randomized solution down to one that is a mixture of at most $m+1$ deterministic solutions, given $m$ constraints. This culminates in algorithms that can solve non-convex constrained optimization problems with possibly non-differentiable and non-convex constraints with theoretical guarantees. We provide extensive experimental results enforcing a wide range of policy goals including different fairness metrics, and other goals on accuracy, coverage, recall, and churn.


Smooth Structured Prediction Using Quantum and Classical Gibbs Samplers

arXiv.org Machine Learning

We introduce a quantum algorithm for solving structured-prediction problems with a runtime that scales with the square root of the size of the label space, but scales in $\widetilde O\left(\frac{1}{\epsilon^5}\right)$ with respect to the precision of the solution. In doing so, we analyze a stochastic gradient algorithm for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order $\widetilde O(\epsilon)$. Our algorithm uses quantum Gibbs sampling at temperature $O (\epsilon)$ as a subroutine. Numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.


Solving Non-identifiable Latent Feature Models

arXiv.org Machine Learning

Latent feature models (LFM)s are widely employed for extracting latent structures of data. While offering high, parameter estimation is difficult with LFMs because of the combinational nature of latent features, and non-identifiability is a particularly difficult problem when parameter estimation is not unique and there exists equivalent solutions. In this paper, a necessary and sufficient condition for non-identifiability is shown. The condition is significantly related to dependency of features, and this implies that non-identifiability may often occur in real-world applications. A novel method for parameter estimation that solves the non-identifiability problem is also proposed. This method can be combined as a post-process with existing methods and can find an appropriate solution by hopping efficiently through equivalent solutions. We have evaluated the effectiveness of the method on both synthetic and real-world datasets.


Abstraction Learning

arXiv.org Artificial Intelligence

There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior researches in artificial intelligence either specify abstraction by human experts, or take abstraction as a qualitative explanation for the model. This paper aims to learn abstraction directly. We tackle three main challenges: representation, objective function, and learning algorithm. Specifically, we propose a partition structure that contains pre-allocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem. This complete framework is named ONE (Optimization via Network Evolution). In our experiments on MNIST, ONE shows elementary human-like intelligence, including low energy consumption, knowledge sharing, and lifelong learning.


Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning

arXiv.org Machine Learning

For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can dynamically change over time based on their relative scores with recommendation products. In order to maintain user retention and long-term revenue, there are two types of constraints that need to be met in exposure: query-level and day-level constraints. We model this problem as constrained markov decision process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning to decouple the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate the policy training process. Experimental results show that our method can improve the advertising revenue while satisfying different levels of constraints under the real-world datasets. Besides, the proposal of constrained hindsight experience replay mechanism can significantly improve the training speed and the stability of policy performance.


Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In the last decade, decision diagrams (DDs) have brought a new perspective on obtaining upper and lower bounds that can be significantly better than classical bounding mechanisms, such as linear relaxations. It is well known that the quality of the bound achieved through this flexible bounding method is highly reliant on the ordering of variables chosen for building the diagram, and finding an ordering that optimizes standard metrics, or even improving one, is an NP-hard problem. In this paper, we propose an innovative and generic approach based on deep reinforcement learning for obtaining an ordering for tightening the bounds obtained with relaxed and restricted DDs. We apply the approach to both the Maximum Independent Set Problem and the Maximum Cut Problem. Experimental results on synthetic instances show that the deep reinforcement learning approach, by achieving tighter objective function bounds, generally outperforms ordering methods commonly used in the literature when the distribution of instances is known. To the best knowledge of the authors, this is the first paper to apply machine learning to directly improve relaxation bounds obtained by general-purpose bounding mechanisms for combinatorial optimization problems.


A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

arXiv.org Machine Learning

Risk management plays a central role in sequential decision-making problems, common in fields such as portfolio management [Lai et al., 2011], autonomous driving [Maurer et al., 2016], and healthcare [Parker, 2009]. A common risk-measure is the variance of the expected sum of rewards/costs and the mean-variance tradeoff function [Sobel, 1982; Mannor and Tsitsiklis, 2011] is one of the most widely used objective functions in risk-sensitive decision-making. Other risk-sensitive objectives have also been studied, for example, Borkar [2002] studied exponential utility functions, Tamar et al. [2012] experimented with the Sharpe Ratio measurement, Chow et al. [2018] studied value at risk (VaR) and mean-VaR optimization, Chow and Ghavamzadeh [2014], Tamar et al. [2015b], and Chow et al. [2018] investigated conditional value at risk (CVaR) and mean-CVaR optimization in a static setting, and Tamar et al. [2015a] investigated coherent risk for both linear and nonlinear system dynamics. Compared with other widely used performance measurements, such as the Sharpe Ratio and CVaR, the mean-variance measurement has explicit interpretability and computational advantages [Markowitz et al., 2000; Li and Ng, 2000]. For example, the Sharpe Ratio tends to lead to solutions with less mean return [Tamar et al., 2012].