

Optimization


Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization

arXiv.org Machine Learning

In this paper, we consider derivative-free optimization problems in which the objective function is smooth but is computed with some amount of noise, function evaluations are expensive, and no derivative information is available. We are motivated by policy optimization problems in reinforcement learning that have recently become popular [Choromanski et al. 2018; Fazel et al. 2018; Salimans et al. 2016] and that can be formulated as derivative-free optimization problems with the aforementioned characteristics. In each of these works, some approximation of the gradient is constructed and a (stochastic) gradient method is applied. In [Salimans et al. 2016] the gradient information is aggregated along Gaussian directions, while in [Choromanski et al. 2018] it is computed along orthogonal directions. We provide a convergence rate analysis for a first-order line search method, similar to the ones used in the literature, and derive the conditions on the gradient approximations that ensure this convergence. We then demonstrate, via a rigorous analysis of the variance and numerical comparisons on reinforcement learning tasks, that the Gaussian sampling method used in [Salimans et al. 2016] is significantly inferior to the orthogonal sampling used in [Choromanski et al. 2018], as well as to more general interpolation methods.
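To make the two gradient approximations concrete, here is a minimal sketch (not the authors' code) of both estimators under assumptions of our own: forward differences with a fixed step `sigma`, the same budget of n directional evaluations for each method, and a random orthonormal basis obtained from the QR factor of a Gaussian matrix. The function names are hypothetical. The Gaussian estimator averages u uᵀ∇f(x), which is correct only in expectation; the interpolation estimator solves Qᵀg = d exactly, so its error comes only from the finite-difference bias and noise.

```python
import numpy as np


def gaussian_smoothing_grad(f, x, sigma=1e-4, num_samples=None, rng=None):
    """Gaussian-smoothing estimate of grad f(x) from forward differences
    along random directions u ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    m = n if num_samples is None else num_samples
    fx = f(x)
    g = np.zeros(n)
    for _ in range(m):
        u = rng.standard_normal(n)
        # (f(x + sigma*u) - f(x)) / sigma approximates u^T grad f(x).
        g += (f(x + sigma * u) - fx) / sigma * u
    # E[u u^T] = I, so this average is unbiased (up to finite-difference bias)
    # but has high variance for small m.
    return g / m


def orthogonal_interpolation_grad(f, x, sigma=1e-4, rng=None):
    """Linear-interpolation estimate of grad f(x) from forward differences
    along n random orthonormal directions."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # The Q factor of a square Gaussian matrix has orthonormal columns q_1..q_n.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    fx = f(x)
    # d_i approximates q_i^T grad f(x).
    d = np.array([(f(x + sigma * q[:, i]) - fx) / sigma for i in range(n)])
    # Interpolation: solve Q^T g = d; Q is orthogonal, so g = Q d.
    return q @ d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(20)
    f = lambda z: 0.5 * z @ z  # true gradient is x
    for name, g in [("gaussian", gaussian_smoothing_grad(f, x, rng=rng)),
                    ("orthogonal", orthogonal_interpolation_grad(f, x, rng=rng))]:
        print(name, "relative error:", np.linalg.norm(g - x) / np.linalg.norm(x))
```

On a quadratic, the interpolation estimate is accurate to roughly `sigma`, while the Gaussian estimate with the same number of evaluations typically has a relative error of order one, which is the variance gap the abstract refers to.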


Automated Machine Learning with Monte-Carlo Tree Search (Extended Version)

arXiv.org Machine Learning

The AutoML task consists of selecting the proper algorithm from a machine learning portfolio, together with its hyperparameter values, in order to deliver the best performance on the dataset at hand. Mosaic, a Monte-Carlo tree search (MCTS) based approach, is presented to handle this hybrid structural and parametric, expensive black-box optimization problem. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) Mosaic's warm-start initialization; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark using the Scikit-learn portfolio, with statistically significant gains over Auto-Sklearn, winner of previous international AutoML challenges.


Multi-layer Residual Sparsifying Transform Learning for Image Reconstruction

arXiv.org Machine Learning

Signal models based on sparsity, low rank, and other properties have been exploited for image reconstruction from limited and corrupted data in medical imaging and other computational imaging applications. In particular, sparsifying transform models have shown promise in various applications and offer numerous advantages, such as efficient sparse coding and learning. This work investigates pre-learning a multi-layer extension of the transform model for image reconstruction, wherein the transform-domain or filtering residuals of the image are further sparsified over the layers. The residuals from multiple layers are jointly minimized during learning and in the regularizer for reconstruction. The proposed block coordinate descent optimization algorithms involve highly efficient updates. Preliminary numerical experiments demonstrate the usefulness of a two-layer model over previous related schemes for CT image reconstruction from low-dose measurements.
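The layered residual idea can be illustrated with a minimal sketch, assuming an ℓ0-penalized cost of the form Σ_l ‖W_l r_{l-1} − z_l‖² + γ_l²‖z_l‖₀ with r_0 = x. This is our assumption about the model's form, and the sketch performs only a greedy layer-by-layer sparse coding pass for fixed transforms; it is not the paper's joint block coordinate descent, and the function names are hypothetical. Hard thresholding at γ_l is the exact minimizer of each layer's term taken alone.

```python
import numpy as np


def hard_threshold(a, gamma):
    """Entrywise minimizer of (a - z)^2 + gamma^2 * [z != 0]: keep a where |a| > gamma."""
    return np.where(np.abs(a) > gamma, a, 0.0)


def multilayer_sparse_code(x, transforms, gammas):
    """Greedily sparse-code x through a stack of transforms, passing each
    layer's transform-domain residual to the next layer.

    Returns the per-layer codes, the final residual, and the total cost
    sum_l ||W_l r_{l-1} - z_l||^2 + gamma_l^2 * ||z_l||_0 with r_0 = x.
    """
    residual = np.asarray(x, dtype=float)
    codes, cost = [], 0.0
    for W, gamma in zip(transforms, gammas):
        a = W @ residual              # transform (filter) the current residual
        z = hard_threshold(a, gamma)  # optimal for this layer's term alone
        cost += np.sum((a - z) ** 2) + gamma**2 * np.count_nonzero(z)
        codes.append(z)
        residual = a - z              # what this layer did not sparsify
    return codes, residual, cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 16
    # Hypothetical unitary transforms (random orthogonal matrices), not learned ones.
    Ws = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(2)]
    codes, r, cost = multilayer_sparse_code(rng.standard_normal(n), Ws, [1.0, 0.5])
    print("nonzeros per layer:", [np.count_nonzero(z) for z in codes], "cost:", cost)
```

In a learned model, the transforms would be updated between such sparse coding passes; with identity transforms, the second layer simply picks up the mid-sized entries that the first layer's threshold discarded.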


Average-case Analysis of the Assignment Problem with Independent Preferences

arXiv.org Artificial Intelligence

The fundamental assignment problem seeks welfare-maximizing mechanisms for allocating indivisible items to agents when self-interested agents report their private preferences over the items. The mainstream mechanism \textit{Random Priority} is asymptotically the best mechanism for this purpose when its welfare is compared to the optimal social welfare using the canonical \textit{worst-case approximation ratio}. Despite its popularity, the efficiency loss measured by the worst-case ratio is not bounded by a constant. Recently, [Deng, Gao, Zhang 2017] showed that when the agents' preferences are drawn from a uniform distribution, its \textit{average-case approximation ratio} is upper bounded by 3.718. They left open the question of whether a constant ratio holds for general scenarios. In this paper, we answer this question affirmatively by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This bound also improves on the upper bound of 3.718 in [Deng, Gao, Zhang 2017] for the uniform distribution (for values uniform on $[0,1]$, $\mu = 1/2$, giving a bound of 2). Moreover, under mild conditions, the ratio has a \textit{constant} bound for any independent random values. En route to these results, we develop powerful tools showing that in most instances the efficiency loss is small.
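The quantities in this abstract can be explored with a small Monte Carlo sketch. It assumes that the average-case ratio is the ratio of expectations E[OPT] / E[RP], that there are as many items as agents, and that values are i.i.d. uniform on [0, 1]; all three are our assumptions rather than the paper's exact setup, and the helper names are hypothetical. Random Priority is simulated directly (a uniformly random order, each agent taking its favorite remaining item), and the optimal welfare is found by brute force, so only small n is practical.

```python
import itertools

import numpy as np


def random_priority_welfare(values, rng):
    """Welfare of Random Priority: agents arrive in a uniformly random order
    and each takes its most-valued remaining item."""
    n_agents, n_items = values.shape
    if n_items < n_agents:
        raise ValueError("need at least as many items as agents")
    remaining = set(range(n_items))
    welfare = 0.0
    for agent in rng.permutation(n_agents):
        best = max(remaining, key=lambda j: values[agent, j])
        welfare += values[agent, best]
        remaining.remove(best)
    return welfare


def optimal_welfare(values):
    """Maximum social welfare over all one-to-one assignments (brute force, small n only)."""
    n = values.shape[0]
    return max(sum(values[i, p[i]] for i in range(n))
               for p in itertools.permutations(range(values.shape[1]), n))


def average_case_ratio(n=5, trials=500, seed=0):
    """Monte Carlo estimate of E[OPT] / E[RP] with i.i.d. Uniform[0, 1] values."""
    rng = np.random.default_rng(seed)
    opt_total = rp_total = 0.0
    for _ in range(trials):
        v = rng.uniform(0.0, 1.0, size=(n, n))
        opt_total += optimal_welfare(v)
        rp_total += random_priority_welfare(v, rng)
    return opt_total / rp_total


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, average_case_ratio(n=n))
```

Because OPT is at least the Random Priority welfare on every instance, the estimate is always at least 1; for uniform values, the abstract's bound says it should not exceed 1/μ = 2.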


End to end learning and optimization on graphs

arXiv.org Machine Learning

Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as facility location, maxcut, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. We propose an approach to integrate a differentiable proxy for common graph optimization problems into training of machine learning models for tasks such as link prediction. This allows the model to focus specifically on the downstream task that its predictions will be used for. Experimental results show that our end-to-end system obtains better performance on example optimization tasks than can be obtained by combining state-of-the-art link prediction methods with expert-designed graph optimization algorithms.
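The abstract does not specify which differentiable proxy is used, so the following is only an illustration of the general idea with a proxy of our own choosing: relaxed modularity for community detection, Q(R) = tr(RᵀBR) / (2m) with B = A − kkᵀ/(2m), where each row of R is a soft community membership. It reduces to ordinary modularity when R is one-hot, and its gradient 2BR / (2m) is available in closed form. All function names are hypothetical, and the sketch only optimizes memberships for a fixed adjacency matrix; in an end-to-end system, A would come from a link-prediction model and gradients would also flow back into that model, which is not modeled here.

```python
import numpy as np


def modularity_matrix(A):
    """B = A - k k^T / (2m) for a symmetric adjacency matrix A."""
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()


def relaxed_modularity(A, R):
    """Soft modularity tr(R^T B R) / (2m); row i of R is node i's membership
    vector. Equals ordinary modularity when every row of R is one-hot."""
    B = modularity_matrix(A)
    return np.trace(R.T @ B @ R) / A.sum()  # A.sum() = 2m


def relaxed_modularity_grad(A, R):
    """Gradient of relaxed_modularity with respect to R (B is symmetric)."""
    B = modularity_matrix(A)
    return 2.0 * B @ R / A.sum()


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def soft_cluster(A, k, steps=200, lr=1.0, seed=0):
    """Gradient ascent on relaxed modularity over softmax-parameterized memberships."""
    rng = np.random.default_rng(seed)
    logits = 0.1 * rng.standard_normal((A.shape[0], k))
    for _ in range(steps):
        R = softmax(logits)
        G = relaxed_modularity_grad(A, R)
        # Chain rule through the row-wise softmax: dQ/dlogits = R * (G - rowsum(R * G)).
        logits += lr * R * (G - (R * G).sum(axis=1, keepdims=True))
    return softmax(logits)


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1.0
    R = soft_cluster(A, k=2)
    print("hard labels:", R.argmax(axis=1), "relaxed Q:", relaxed_modularity(A, R))
```

Because the proxy is a smooth function of both R and A, it can sit at the end of a training pipeline as a loss, which is the kind of coupling between prediction and downstream optimization that the abstract describes.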


GENO -- GENeric Optimization for Classical Machine Learning

arXiv.org Machine Learning

Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is whether it is always necessary to implement a new solver, or whether a single algorithm suffices for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because the algorithm cannot exploit model-specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization), which combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most classical machine learning problems. We show on a wide variety of classical as well as recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical approaches that combine a modeling language with a solver.


Multi-objective Bayesian Optimization using Pareto-frontier Entropy

arXiv.org Machine Learning

We propose Pareto-frontier entropy search (PFES) for multi-objective Bayesian optimization (MBO). Unlike existing entropy search methods for MBO, which consider entropy in the input space, we define the entropy of the Pareto frontier in the output space. Using a Pareto frontier sampled from the current model, PFES provides a simple formula for directly evaluating the entropy. Besides the usual MBO setting, in which all objectives are observed simultaneously, we also consider the "decoupled" setting, in which the objective functions can be observed separately. PFES can easily derive an acquisition function for the decoupled setting through the entropy of the marginal density of each output variable. In both settings, conditioning on the sampled Pareto frontier induces dependence among the objectives in the entropy evaluation. PFES can incorporate this dependence into the acquisition function, whereas existing information-based MBO methods employ an independent Gaussian approximation. Our numerical experiments show the effectiveness of PFES on synthetic functions and real-world datasets from materials science.


Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

arXiv.org Machine Learning

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, the zero-sum LQ game has the property that the stationary point of the objective with respect to the feedback control policies constitutes the NE of the game. Building on this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both global sublinear and local linear convergence rates. Simulation results are then provided to validate the proposed algorithms. To the best of our knowledge, this work appears to be the first to investigate the optimization landscape of LQ games and to provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.


Testing that a Local Optimum of the Likelihood is Globally Optimum using Reparameterized Embeddings

arXiv.org Machine Learning

Many mathematical imaging problems are posed as non-convex optimization problems. When numerically tractable global optimization procedures are not available, one is often interested in testing ex post facto whether or not a locally convergent algorithm has found the globally optimal solution. If the problem has a statistical maximum likelihood formulation, a local test of global optimality can be constructed. In this paper, we develop an improved test, based on a global maximum validation function proposed by Biernacki, under the assumption that the statistical distribution is in the generalized location family, a condition often satisfied in imaging problems. In addition, a new reparameterization and embedding procedure is presented that exploits knowledge about the forward operator to improve the global maximum validation function. Finally, the reparameterized embedding technique is applied to a physically-motivated joint-inverse problem arising in camera blur estimation. The advantages of the proposed global optimum testing techniques are numerically demonstrated in terms of increased detection accuracy and reduced computation.


Optimized Score Transformation for Fair Classification

arXiv.org Machine Learning

Recent years have seen a surge of interest in the problem of fair classification, which is concerned with disparities in classification output or performance when conditioned on a protected attribute such as race, gender, or ethnicity. Many measures of fairness have been introduced [1-14], and fairness-enhancing interventions have been proposed to mitigate these disparities [15]. Roughly categorized, these interventions either (i) change the data used to train a classifier (pre-processing) [16-20], (ii) change a classifier's output (post-processing) [4, 21-24], or (iii) directly change a classification model to ensure fairness (in-processing) [5, 25-32]. This paper places more emphasis on probabilistic classification, in which the outputs of interest are predicted probabilities of belonging to one of the classes, often referred to as scores, as opposed to binary predictions. Scores are desirable because they indicate confidence in predictions. We propose an optimization formulation for transforming scores to satisfy fairness constraints while minimizing the loss in utility. The formulation accommodates any fairness criteria that can be expressed as linear inequalities involving conditional means of scores, including variants of statistical parity (SP) [1] and equalized odds (EO) [4, 5]. We derive a closed-form expression for the optimal transformed scores and a convex dual optimization problem for the Lagrange multipliers that parametrize the transformation.