Goto

Collaborating Authors

 Optimization


Swarming for Faster Convergence in Stochastic Optimization

arXiv.org Machine Learning

We study a distributed framework for stochastic optimizati on which is inspired by models of collective motion found in nature (e.g., swarming) with mild communication requirements. Specifically, we analyze a scheme in which each one of N 1 independent threads, implements in a distributed and unsynchronized fashion, a stochastic grad ient-descent algorithm which is perturbed by a swarming potential. Assuming the overhead caused by syn chronization is not negligible, we show the swarming-based approach exhibits better performance t han a centralized algorithm (based upon the average of N observations) in terms of (real-time) convergence speed. W e also derive an error bound that is monotone decreasing in network size and connec tivity. We characterize the scheme's finite-time performances for both convex and non-convex obj ective functions.


Smoothed analysis of the low-rank approach for smooth semidefinite programs

arXiv.org Machine Learning

We consider semidefinite programs (SDPs) of size n with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix Y of size $n$ by $k$ such that $X = YY^*$ is the SDP variable. The advantages of such formulation are twofold: the dimension of the optimization variable is reduced and positive semidefiniteness is naturally enforced. However, the problem in Y is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided k is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with k scaling like the square root of the number of constraints (up to log factors). We particularize our results to an SDP relaxation of phase retrieval.


Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants

arXiv.org Machine Learning

Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bio-informatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants such as sparse PCA and stable PCA can be formulated as optimization problems with exploitable special structures. Many specialized efficient optimization methods have been proposed to solve robust PCA and related problems. In this paper we review existing optimization methods for solving convex and nonconvex relaxations/variants of robust PCA, discuss their advantages and disadvantages, and elaborate on their convergence behaviors. We also provide some insights for possible future research directions including new algorithmic frameworks that might be suitable for implementing on multi-processor setting to handle large-scale problems.


Explainable Deterministic MDPs

arXiv.org Artificial Intelligence

We present a method for a certain class of Markov Decision Processes (MDPs) that can relate the optimal policy back to one or more reward sources in the environment. For a given initial state, without fully computing the value function, q-value function, or the optimal policy the algorithm can determine which rewards will and will not be collected, whether a given reward will be collected only once or continuously, and which local maximum within the value function the initial state will ultimately lead to. We demonstrate that the method can be used to map the state space to identify regions that are dominated by one reward source and can fully analyze the state space to explain all actions. We provide a mathematical framework to show how all of this is possible without first computing the optimal policy or value function.


Provable defenses against adversarial examples via the convex outer adversarial polytope

arXiv.org Artificial Intelligence

We propose a method to learn deep ReLUbased classifiers that are provably robust against normbounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded l


Locating the boundaries of Pareto fronts: A Many-Objective Evolutionary Algorithm Based on Corner Solution Search

arXiv.org Artificial Intelligence

In this paper, an evolutionary many-objective optimization algorithm based on corner solution search (MaOEA-CS) was proposed. MaOEA-CS implicitly contains two phases: the exploitative search for the most important boundary optimal solutions - corner solutions, at the first phase, and the use of angle-based selection [1] with the explorative search for the extension of PF approximation at the second phase. Due to its high efficiency and robustness to the shapes of PFs, it has won the CEC'2017 Competition on Evolutionary Many-Objective Optimization. In addition, MaOEA-CS has also been applied on two real-world engineering optimization problems with very irregular PFs. The experimental results show that MaOEA-CS outperforms other six state-of-the-art compared algorithms, which indicates it has the ability to handle real-world complex optimization problems with irregular PFs.


Causal Interventions for Fairness

arXiv.org Machine Learning

Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already suffered from discrimination and lost opportunities due to factors out of their control. In the present work we focus instead on interventions such as a new public policy, and in particular, how to maximize their positive effects while improving the fairness of the overall system. We use causal methods to model the effects of interventions, allowing for potential interference--each individual's outcome may depend on who else receives the intervention. We demonstrate this with an example of allocating a budget of teaching resources using a dataset of schools in New York City.


Efficient Differentiable Programming in a Functional Array-Processing Language

arXiv.org Machine Learning

EPFL, Switzerland We present a system for the automatic differentiation of a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both sourceto-source automatic differentiation and global optimizations such as loop transformations. Thanks to this feature, we demonstrate how for some real-world machine learning and computer vision benchmarks, the system outperforms the state-of-the-art automatic differentiation tools. This investigation led him to see the importance of functional arguments and recursive functions in the field of symbolic computation. From Norvig [38, p248]. 1 INTRODUCTION Functional programming (FP) and automatic differentiation (AD) have been natural partners for sixty years, and major functional languages all have elegant automatic differentiation packages [6, 17, 29]. With the increasing importance of numerical engineering disciplines such as machine learning, speech processing, and computer vision, there has never been a greater need for systems which mitigate the tedious and error-prone process of manual coding of derivatives. However the popular packages (TensorFlow, CNTK) all implement clunky (E)DSLs in procedural languages such as Python and C . One reason is that the FP packages are slower than their imperative counterparts, by many orders of magnitude [48], because modern applications depend heavily on array processing, with vectors, matrices, and tensors as the canonical datatypes. In contrast, AD for FP has generally handled only scalar workloads efficiently [29]. Our key contribution in this paper is to take a recently introduced F# subset designed for efficient compilation of array-processing workloads, and to augment it with vector AD primitives, yielding a functional AD tool that is competitive with the best C/C and Fortran tools on many benchmarks, and considerably faster on others.


Pattern Search Multidimensional Scaling

arXiv.org Machine Learning

We present a novel view of nonlinear manifold learning using derivative-free optimization techniques. Specifically, we propose an extension of the classical multi-dimensional scaling (MDS) method, where instead of performing gradient descent, we sample and evaluate possible "moves" in a sphere of fixed radius for each point in the embedded space. A fixed-point convergence guarantee can be shown by formulating the proposed algorithm as an instance of General Pattern Search (GPS) framework. Evaluation on both clean and noisy synthetic datasets shows that pattern search MDS can accurately infer the intrinsic geometry of manifolds embedded in high-dimensional spaces. Additionally, experiments on real data, even under noisy conditions, demonstrate that the proposed pattern search MDS yields state-of-the-art results.


Stochastic Zeroth-order Optimization via Variance Reduction method

arXiv.org Machine Learning

Derivative-free optimization has become an important technique used in machine learning for optimizing black-box models. To conduct updates without explicitly computing gradient, most current approaches iteratively sample a random search direction from Gaussian distribution and compute the estimated gradient along that direction. However, due to the variance in the search direction, the convergence rates and query complexities of existing methods suffer from a factor of $d$, where $d$ is the problem dimension. In this paper, we introduce a novel Stochastic Zeroth-order method with Variance Reduction under Gaussian smoothing (SZVR-G) and establish the complexity for optimizing non-convex problems. With variance reduction on both sample space and search space, the complexity of our algorithm is sublinear to $d$ and is strictly better than current approaches, in both smooth and non-smooth cases. Moreover, we extend the proposed method to the mini-batch version. Our experimental results demonstrate the superior performance of the proposed method over existing derivative-free optimization techniques. Furthermore, we successfully apply our method to conduct a universal black-box attack to deep neural networks and present some interesting results.