Goto

Collaborating Authors

 Optimization


CorrAttack: Black-box Adversarial Attack with Structured Search

arXiv.org Machine Learning

We present a new method for score-based adversarial attack, where the attacker queries the loss-oracle of the target model. Our method employs a parameterized search space with a structure that captures the relationship of the gradient of the loss function. We show that searching over the structured space can be approximated by a time-varying contextual bandits problem, where the attacker takes feature of the associated arm to make modifications of the input, and receives an immediate reward as the reduction of the loss function. The time-varying contextual bandits problem can then be solved by a Bayesian optimization procedure, which can take advantage of the features of the structured action space. The experiments on ImageNet and the Google Cloud Vision API demonstrate that the proposed method achieves the state of the art success rates and query efficiencies for both undefended and defended models. Although deep learning has many applications, it is known that neural networks are vulnerable to adversarial examples, which are small perturbations of inputs that can fool neural networks into making wrong predictions (Szegedy et al., 2014). While adversarial noise can easily be found when the neural models are known (referred to as white-box attack) (Kurakin et al., 2016). However, in real world scenarios models are often unknown, this situation is referred to as black-box attack.


Open Set Domain Adaptation using Optimal Transport

arXiv.org Machine Learning

We present a 2-step optimal transport approach that performs a mapping from a source distribution to a target distribution. Here, the target has the particularity to present new classes not present in the source domain. The first step of the approach aims at rejecting the samples issued from these new classes using an optimal transport plan. The second step solves the target (class ratio) shift still as an optimal transport problem. We develop a dual approach to solve the optimization problem involved at each step and we prove that our results outperform recent state-of-the-art performances. We further apply the approach to the setting where the source and target distributions present both a label-shift and an increasing covariate (features) shift to show its robustness.


A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints

arXiv.org Machine Learning

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly-monotone games and show how to design tailored quadratic constraints to get tight upper bounds of convergence rates. Using this framework, we recover the existing bound for the gradient method~(GD), derive sharper bounds for the proximal point method~(PPM) and optimistic gradient method~(OG), and provide \emph{for the first time} a global convergence rate for the negative momentum method~(NM) with an iteration complexity $\bigo(\kappa^{1.5})$, which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.


Multiobjectivization of Local Search: Single-Objective Optimization Benefits From Multi-Objective Gradient Descent

arXiv.org Artificial Intelligence

Optimization is essentially everywhere and most real-world problems are of nonlinear and multimodal nature, i.e., there may exist multiple local optima that become traps for local search [23]. That is, classical local search based on gradient descent will get stuck in local optima unless restart mechanisms or search space exploration methods prevent premature convergence. Much effort has been put into this issue. Early attempts tried to make local search more flexible, e.g., by adding search points or spanning simplex structures, to discover patterns in search space and allow non-derivative descent to the optimum [20]. However, local search cannot solve these problems in general. Thus, later approaches [1] combine originally one-dimensional global search mechanisms like the STEP global search [30] and a local interpolation technique proposed by Brent [3] for the multivariate case. Others combine established stochastic global search mechanisms based on clustering [24] with newer elements of global optimizers [29] to gain quality improvements of solutions and to avoid finding only local optima [22].


BOSS: Bayesian Optimization over String Spaces

arXiv.org Artificial Intelligence

This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar.


From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

arXiv.org Machine Learning

Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We can show that after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1 + epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent have superior clustering quality than agglomerative heuristics or other gradient based algorithms. Finally, we highlight the flexibility of HypHC using end-to-end training in a downstream classification task.


First-order Optimization for Superquantile-based Supervised Learning

arXiv.org Machine Learning

Classical supervised learning via empirical risk (or negative log-likelihood) minimization hinges upon the assumption that the testing distribution coincides with the training distribution. This assumption can be challenged in modern applications of machine learning in which learning machines may operate at prediction time with testing data whose distribution departs from the one of the training data. We revisit the superquantile regression method by proposing a first-order optimization algorithm to minimize a superquantile-based learning objective. The proposed algorithm is based on smoothing the superquantile function by infimal convolution. Promising numerical results illustrate the interest of the approach towards safer supervised learning.


Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

arXiv.org Machine Learning

Importantly, the update and storage of the diagonal approximation of Hessian is as efficient as adaptive first-order optimization methods with linear complexity for both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Nonconvex stochastic optimization is of core practical importance in many fields of machine learning, in particular for training deep neural networks (DNNs). First-order gradient-based optimization algorithms, conceptually attractive due to their linear efficiency on both the time and memory complexity, have led to tremendous progress and impressive successes. However, one disadvantage of SGD is that the gradients in different directions are scaled uniformly, resulting in limited convergence speed and sensitive choice of the learning rate, and thus has spawned a lot of recent interest in accelerating SGD from the algorithmic and practical perspectives. Recently, many adaptive first-order optimization methods have been proposed to achieve rapid training progress with element-wise scaled learning rates, and we can only mention a few here due to space limits. In their pioneering work, Duchi et al. (2011) proposed AdaGrad, which scales the gradient by the square root of the accumulative square gradients from the first iteration.


Meta-Heuristic Solutions to a Student Grouping Optimization Problem faced in Higher Education Institutions

arXiv.org Artificial Intelligence

Combinatorial problems which have been proven to be NP-hard are faced in Higher Education Institutions and researches have extensively investigated some of the well-known combinatorial problems such as the timetabling and student project allocation problems. However, NP-hard problems faced in Higher Education Institutions are not only confined to these categories of combinatorial problems. The majority of NP-hard problems faced in institutions involve grouping students and/or resources, albeit with each problem having its own unique set of constraints. Thus, it can be argued that techniques to solve NP-hard problems in Higher Education Institutions can be transferred across the different problem categories. As no method is guaranteed to outperform all others in all problems, it is necessary to investigate heuristic techniques for solving lesser-known problems in order to guide stakeholders or software developers to the most appropriate algorithm for each unique class of NP-hard problems faced in Higher Education Institutions. To this end, this study described an optimization problem faced in a real university that involved grouping students for the presentation of semester results. Ordering based heuristics, genetic algorithm and the ant colony optimization algorithm implemented in Python programming language were used to find feasible solutions to this problem, with the ant colony optimization algorithm performing better or equal in 75% of the test instances and the genetic algorithm producing better or equal results in 38% of the test instances.


A Direct-Indirect Hybridization Approach to Control-Limited DDP

arXiv.org Artificial Intelligence

Optimal control is a widely used tool for synthesizing motions and controls for user-defined tasks under physical constraints. A common approach is to formulate it using direct multiple-shooting and then to use off-the-shelf nonlinear programming solvers that can easily handle arbitrary constraints on the controls and states. However, these methods are not fast enough for many robotics applications such as real-time humanoid motor control. Exploiting the sparse structure of optimal control problem, such as in Differential DynamicProgramming (DDP), has proven to significantly boost the computational efficiency, and recent works have been focused on handling arbitrary constraints. Despite that, DDP has been associated with poor numerical convergence, particularly when considering long time horizons. One of the main reasons is due to system instabilities and poor warm-starting (only controls). This paper presents control-limited Feasibility-driven DDP (Box-FDDP), a solver that incorporates a direct-indirect hybridization of the control-limited DDP algorithm. Concretely, the forward and backward passes handle feasibility and control limits. We showcase the impact and importance of our method on a set of challenging optimal control problems against the Box-DDP and squashing-function approach.