Goto

Collaborating Authors

 Optimization


A Mean-Field Theory for Kernel Alignment with Random Features in Generative Adversarial Networks

arXiv.org Machine Learning

We propose a novel supervised learning method to optimize the kernel in maximum mean discrepancy generative adversarial networks (MMD GANs). Specifically, we characterize a distributionally robust optimization problem to compute a good distribution for the random feature model of Rahimi and Recht to approximate a good kernel function. Due to the fact that the distributional optimization is infinite dimensional, we consider a Monte-Carlo sample average approximation (SAA) to obtain a more tractable finite dimensional optimization problem. We subsequently leverage a particle stochastic gradient descent (SGD) method to solve finite dimensional optimization problems. Based on a mean-field analysis, we then prove that the empirical distribution of the interactive particles system at each iteration of the SGD follows the path of the gradient descent flow on the Wasserstein manifold. We also establish the non-asymptotic consistency of the finite sample estimator. Our empirical evaluation on synthetic data-set as well as MNIST and CIFAR-10 benchmark data-sets indicates that our proposed MMD GAN model with kernel learning indeed attains higher inception scores well as Fr\`{e}chet inception distances and generates better images compared to the generative moment matching network (GMMN) and MMD GAN with untrained kernels.


Online Non-Monotone DR-submodular Maximization

arXiv.org Machine Learning

In this paper, we study problems at the interface of two important fields: \emph{submodular optimization} and \emph{online learning}. Submodular functions play a vital role in modelling cost functions that naturally arise in many areas of discrete optimization. These functions have been studied under various models of computation. Independently, submodularity has been considered in continuous domains. In fact, many problems arising in machine learning and statistics have been modelled using continuous DR-submodular functions. In this work, we are study the problem of maximizing \textit{non-monotone} continuous DR-submodular functions within the framework of online learning. We provide three main results. First, we present an online algorithm (in full-information setting) that achieves an approximation guarantee (depending on the search space) for the problem of maximizing non-monotone continuous DR-submodular functions over a \emph{general} convex domain. To best of our knowledge, no prior approximation algorithm in full-information setting was known for the non-monotone continuous DR submodular functions even for the \emph{down-closed} convex domain. Second, we show that the online stochastic mirror ascent algorithm (in full information setting) achieves an improved approximation ratio of $(1/4)$ for maximizing the non-monotone continuous DR-submodular functions over a \emph{down-closed} convex domain. At last, we extend our second result to the bandit setting where we present the first approximation guarantee of $(1/4)$. To best of our knowledge, no approximation algorithm for non-monotone submodular maximization was known in the bandit setting.


Structured Graph Learning Via Laplacian Spectral Constraints

arXiv.org Machine Learning

Learning a graph with a specific structure is essential for interpretability and identification of the relationships among data. It is well known that structured graph learning from observed samples is an NP-hard combinatorial problem. In this paper, we first show that for a set of important graph families it is possible to convert the structural constraints of structure into eigenvalue constraints of the graph Laplacian matrix. Then we introduce a unified graph learning framework, lying at the integration of the spectral properties of the Laplacian matrix with Gaussian graphical modeling that is capable of learning structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable for large-scale semi-supervised and unsupervised graph-based learning tasks. Extensive numerical experiments with both synthetic and real data sets demonstrate the effectiveness of the proposed methods. An R package containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology.


Sign-OPT: A Query-Efficient Hard-label Adversarial Attack

arXiv.org Machine Learning

We study the most practical problem setup for evaluating adversarial robustness of a machine learning system with limited access: the hard-label black-box attack setting for generating adversarial examples, where limited model queries are allowed and only the decision is provided to a queried data input. Several algorithms have been proposed for this problem but they typically require huge amount (>20,000) of queries for attacking one example. Among them, one of the state-of-the-art approaches (Cheng et al., 2019) showed that hard-label attack can be modeled as an optimization problem where the objective function can be evaluated by binary search with additional model queries, thereby a zeroth order optimization algorithm can be applied. In this paper, we adopt the same optimization formulation but propose to directly estimate the sign of gradient at any direction instead of the gradient itself, which enjoys the benefit of single query. Using this single query oracle for retrieving sign of directional derivative, we develop a novel query-efficient Sign-OPT approach for hard-label black-box attack. We provide a convergence analysis of the new algorithm and conduct experiments on several models on MNIST, CIFAR-10 and ImageNet. We find that Sign-OPT attack consistently requires 5X to 10X fewer queries when compared to the current state-of-the-art approaches, and usually converges to an adversarial example with smaller perturbation.


Conditions for Unnecessary Logical Constraints in Kernel Machines

arXiv.org Artificial Intelligence

A main property of support vector machines consists in the fact that only a small portion of the training data is significant to determine the maximum margin separating hyperplane in the feature space, the so called support vectors . In a similar way, in the general scheme of learning from constraints, where possibly several constraints are considered, some of them may turn out to be unnecessary with respect to the learning optimization, even if they are active for a given optimal solution. In this paper we extend the definition of support vector to support constraint and we provide some criteria to determine which constraints can be removed from the learning problem still yielding the same optimal solutions. In particular, we discuss the case of logical constraints expressed by null Lukasiewicz logic, where both inferential and algebraic arguments can be considered. Some theoretical results that characterize the concept of unnecessary constraint are proved and explained by means of examples.


Machine Learning Optimization Algorithms & Portfolio Allocation

arXiv.org Machine Learning

Portfolio optimization emerged with the seminal paper of Markowitz (1952). The original mean-variance framework is appealing because it is very efficient from a computational point of view. However, it also has one well-established failing since it can lead to portfolios that are not optimal from a financial point of view. Nevertheless, very few models have succeeded in providing a real alternative solution to the Markowitz model. The main reason lies in the fact that most academic portfolio optimization models are intractable in real life although they present solid theoretical properties. By intractable we mean that they can be implemented for an investment universe with a small number of assets using a lot of computational resources and skills, but they are unable to manage a universe with dozens or hundreds of assets. However, the emergence and the rapid development of robo-advisors means that we need to rethink portfolio optimization and go beyond the traditional mean-variance optimization approach. Another industry has faced similar issues concerning large-scale optimization problems. Machine learning has long been associated with linear and logistic regression models. Again, the reason was the inability of optimization algorithms to solve high-dimensional industrial problems. Nevertheless, the end of the 1990s marked an important turning point with the development and the rediscovery of several methods that have since produced impressive results. The goal of this paper is to show how portfolio allocation can benefit from the development of these large-scale optimization algorithms. Not all of these algorithms are useful in our case, but four of them are essential when solving complex portfolio optimization problems. These four algorithms are the coordinate descent, the alternating direction method of multipliers, the proximal gradient method and the Dykstra's algorithm.


Trimmed Constrained Mixed Effects Models: Formulations and Algorithms

arXiv.org Machine Learning

Mixed effects (ME) models inform a vast array of problems in the physical and social sciences, and are pervasive in meta-analysis. We consider ME models where the random effects component is linear. We then develop an efficient approach for a broad problem class that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the associated marginal likelihood. We illustrate the efficacy of the approach on a range of applications for meta-analysis of global health data. Constraints and priors are used to impose monotonicity, convexity and other characteristics on dose-response relationships, while nonlinear observations enable new epidemiological analyses in place of approximations. Robust extensions ensure that spurious studies do not drive our understanding of between-study heterogeneity. The software accompanying this paper is disseminated as an open-source Python package called limeTR.


Scalable Kernel Learning via the Discriminant Information

arXiv.org Machine Learning

For commonly used kernels such as Gaussian, the gradient computations mainly consist of matrix products and linear system solutions, thus they can be sped up significantly with GPU-accelerated linear system solvers. For instance, our imple - mentation took less than 80 miliseconds to compute DI/KDI gradients on an nVidia P100 GPU with feature dimensionalities up to 2000 and batch sizes up to 4000 using Gaussian kernels on the 3 datasets considered. In common learning methodologies, where a linear predictor is trained in conjunction with a parametric non-line ar mapping, the overall objective is to minimize a loss functio n averaged over the entire training sample, i.e., to minimize the expected loss over a single empirical distribution. Since D I directly measures the loss of the best linear predictor on a batch, however, stochastic gradient methods have a differe nt interpretation when utilizing this objective. Since each m ini-batch represents a different empirical distribution, DI ba sed training instead aims to find a feature mapping that adapts to various empirical distributions, which can reduce overfitt ing analogous to how bagging can improve generalization [ 27 ].


A Time-Dependent TSP Formulation for the Design of an Active Debris Removal Mission using Simulated Annealing

arXiv.org Artificial Intelligence

This paper proposes a formulation of the Active Debris Removal (ADR) Mission Design problem as a modified Time-Dependent Traveling Salesman Problem (TDTSP). The TDTSP is a well-known combinatorial optimization problem, whose solution is the cheapest mono-cyclic tour connecting a number of non-stationary cities in a map. The problem is tackled with an optimization procedure based on Simulated Annealing, that efficiently exploits a natural encoding and a careful choice of mutation operators. The developed algorithm is used to simultaneously optimize the targets sequence and the rendezvous epochs of an impulsive ADR mission. Numerical results are presented for sets comprising up to 20 targets. INTRODUCTION The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem, whose solution is the cheapest tour which allows a salesman to visit, only once, a number of cities in a map; the cost of each city-to-city transfer is, typically, the traveled distance or the fuel consumption. Active Debris Removal (ADR) missions can be seen as peculiar instances of the TDTSP, where an active (chaser) spacecraft is asked to visit, that is, to perform a rendezvous, with a certain number of targets (space debris), making the best use of the on-board propellant. Such kind of missions are increasing in popularity among space agencies all over the world, as the sustainability of the extra-atmospheric environment is becoming compromised by the huge amount of "space garbage" now orbiting Earth. A cost-competitive space program would involve the removal of several dozens of small debris with each single mission; such a complex scenario could became feasible only with the best possible use of the propellant on-board of the chaser spacecraft. As a consequence, a well-designed ADR mission would require the optimization of a multi-target rendezvous trajectory. A number of authors dealt with long term or time-free ADR missions aimed at removing a small number of debris from Sun synchronous orbits (at a rate of three to ten per year). These missions heavily rely on J 2 orbital perturbation for the alignment of the orbital planes of consecutive targets before starting the rendezvous maneuver, in order to reduce the mission cost.


Faster saddle-point optimization for solving large-scale Markov decision processes

arXiv.org Machine Learning

We consider the problem of computing optimal policies in average-r eward Markov decision processes. This classical problem can be formulated as a linear program dire ctly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the n umber of states. T o address this issue, recent work has considered a linearly relaxed version of the res ulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization pro blem by characterizing the conditions necessary for convergence to the optimal policy, and designing a n optimization algorithm enjoying fast convergence rates that are independent of the size of the state s pace.