Goto

Collaborating Authors

 Optimization


Local Asymptotics for Stochastic Optimization: Optimality, Constraint Identification, and Dual Averaging

arXiv.org Machine Learning

We study local complexity measures for stochastic convex optimization problems, providing a local minimax theory analogous to that of H\'{a}jek and Le Cam for classical statistical problems, and giving efficient procedures based on Nesterov's dual averaging that (often) adaptively achieve optimal convergence guarantees. Our results provide function-specific lower bounds and convergence results that make precise a correspondence between statistical difficulty and the geometric notion of tilt-stability from optimization. We show how variants of dual averaging---a stochastic gradient-based procedure---guarantee finite time identification of constraints in optimization problems, while stochastic gradient procedures provably fail. Additionally, we highlight a gap between optimization problems with linear and nonlinear constraints: standard stochastic-gradient-based procedures are suboptimal even for the simplest nonlinear constraints.


On Tensor Train Rank Minimization: Statistical Efficiency and Scalable Algorithm

arXiv.org Machine Learning

Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, in which the time complexity is as efficient as the space complexity is. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higher-order tensor.


Reexamining Low Rank Matrix Factorization for Trace Norm Regularization

arXiv.org Machine Learning

Trace norm regularization is a widely used approach for learning low rank matrices. A standard optimization strategy is based on formulating the problem as one of low rank matrix factorization which, however, leads to a non-convex problem. In practice this approach works well, and it is often computationally faster than standard convex solvers such as proximal gradient methods. Nevertheless, it is not guaranteed to converge to a global optimum, and the optimization can be trapped at poor stationary points. In this paper we show that it is possible to characterize all critical points of the non-convex problem. This allows us to provide an efficient criterion to determine whether a critical point is also a global minimizer. Our analysis suggests an iterative meta-algorithm that dynamically expands the parameter space and allows the optimization to escape any non-global critical point, thereby converging to a global minimizer. The algorithm can be applied to problems such as matrix completion or multitask learning, and our analysis holds for any random initialization of the factor matrices. Finally, we confirm the good performance of the algorithm on synthetic and real datasets.


Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination

arXiv.org Machine Learning

The MAP-Elites algorithm produces a set of high-performing solutions that vary according to features defined by the user. This technique has the potential to be a powerful tool for design space exploration, but is limited by the need for numerous evaluations. The Surrogate-Assisted Illumination algorithm (SAIL), introduced here, integrates approximative models and intelligent sampling of the objective function to minimize the number of evaluations required by MAP-Elites. The ability of SAIL to efficiently produce both accurate models and diverse high performing solutions is illustrated on a 2D airfoil design problem. The search space is divided into bins, each holding a design with a different combination of features. In each bin SAIL produces a better performing solution than MAP-Elites, and requires several orders of magnitude fewer evaluations. The CMA-ES algorithm was used to produce an optimal design in each bin: with the same number of evaluations required by CMA-ES to find a near-optimal solution in a single bin, SAIL finds solutions of similar quality in every bin.


Efficient Regret Minimization in Non-Convex Games

arXiv.org Machine Learning

We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and generalizes offline guarantees for convergence to an approximate local optimum. We give gradient-based methods that achieve optimal regret, which in turn guarantee convergence to equilibrium in this framework.


Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

arXiv.org Machine Learning

Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.


Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization

arXiv.org Machine Learning

We consider the minimization of submodular functions subject to ordering constraints. We show that this optimization problem can be cast as a convex optimization problem on a space of uni-dimensional measures, with ordering constraints corresponding to first-order stochastic dominance. We propose new discretization schemes that lead to simple and efficient algorithms based on zero-th, first, or higher order oracles; these algorithms also lead to improvements without isotonic constraints. Finally, our experiments show that non-convex loss functions can be much more robust to outliers for isotonic regression, while still leading to an efficient optimization problem.


Adaptive Simulation-based Training of AI Decision-makers using Bayesian Optimization

arXiv.org Machine Learning

This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective function evaluated on simulated combat engagements. Gaussian process Bayesian optimization (GPBO) techniques are developed to automatically learn global Gaussian Process (GP) surrogate models, which provide statistical performance predictions in both explored and unexplored areas of the parameter space. This allows a learning engine to sample full-combat simulations at parameter values that are most likely to optimize performance and also provide highly informative data points for improving future predictions. However, standard GPBO methods do not provide a reliable surrogate model for the highly volatile objective functions found in aerial combat, and thus do not reliably identify global maxima. These issues are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point Sampling (HRMS) techniques. Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.


Developing quantum algorithms for optimization problems

#artificialintelligence

For example, they can factor large numbers exponentially faster than classical computers, which would allow them to break codes in the most commonly used cryptography system. There are other potential applications for quantum computers, too, such as solving complicated chemistry problems involving the mechanics of molecules. But exactly what types of applications will be best for quantum computers, which still may be a decade or more away from becoming a reality, is still an open question. In a new Caltech study, accepted by the Institute of Electrical and Electronics Engineers (IEEE) 2017 Symposium on Foundations of Computer Science, researchers have demonstrated that quantum computing could be useful for speeding up the solutions to "semidefinite programs," a widely used class of optimization problems. These programs include so-called linear programs, which are used, for example, when a company wants to minimize the risk of its investment portfolio or when an airline wants to efficiently assign crews to its flights.


A supermartingale approach to Gaussian process based sequential design of experiments

arXiv.org Machine Learning

Gaussian process (GP) models have become a well-established frameworkfor the adaptive design of costly experiments, and notably of computerexperiments. GP-based sequential designs have been found practicallyefficient for various objectives, such as global optimization(estimating the global maximum or maximizer(s) of a function),reliability analysis (estimating a probability of failure) or theestimation of level sets and excursion sets. In this paper, we dealwith convergence properties of an important class of sequential designapproaches, known as stepwise uncertainty reduction (SUR) strategies.Our approach relies on the key observation that the sequence ofresidual uncertainty measures, in SUR strategies, is generally asupermartingale with respect to the filtration generated by theobservations. We study the existence of SUR strategies and establishgeneric convergence results for a broad class thereof. We alsointroduce a special class of uncertainty measures defined in terms ofregular loss functions, which makes it easier to check that ourconvergence results apply in particular cases. Applications of thelatter include proofs of convergence for the two main SUR strategiesproposed by Bect, Ginsbourger, Li, Picheny and Vazquez (Stat. Comp.,2012). To the best of our knowledge, these are the first convergenceproofs for GP-based sequential design algorithms dedicated to theestimation of excursions sets and their measure. Coming to globaloptimization algorithms, we also show that the knowledge gradientstrategy can be cast in the SUR framework with an uncertaintyfunctional stemming from a regular loss, resulting in furtherconvergence results. We finally establish a new proof of convergencefor the expected improvement algorithm, which is the first proof forthis algorithm that applies to any GP with continuous sample paths.