Goto

Collaborating Authors

 Optimization


Distributed Composite Quantization

AAAI Conferences

Approximate nearest neighbor (ANN) search is a fundamental problem in computer vision, machine learning and information retrieval. Recently, quantization-based methods have drawn a lot of attention due to their superior accuracy and comparable efficiency compared with traditional hashing techniques. However, despite the prosperity of quantization techniques, they are all designed for the centralized setting, i.e., quantization is performed on the data on a single machine. This makes it difficult to scale these techniques to large-scale datasets. Built upon the Composite Quantization, we propose a novel quantization algorithm for data dis- tributed across different nodes of an arbitrary network. The proposed Distributed Composite Quantization (DCQ) decom-poses Composite Quantization into a set of decentralized sub-problems such that each node solves its own sub-problem on its local data, meanwhile is still able to attain consistent quantizers thanks to the consensus constraint. Since there is no exchange of training data across the nodes in the learning process, the communication cost of our method is low. Ex- tensive experiments on ANN search and image retrieval tasks validate that the proposed DCQ significantly improves Composite Quantization in both efficiency and scale, while still maintaining competitive accuracy.


Algorithms for Trip-Vehicle Assignment in Ride-Sharing

AAAI Conferences

We investigate the ride-sharing assignment problem from an algorithmic resource allocation point of view. Given a number of requests with source and destination locations, and a number of available car locations, the task is to assign cars to requests with two requests sharing one car. We formulate this as a combinatorial optimization problem, and show that it is NP-hard. We then design an approximation algorithm which guarantees to output a solution with at most 2.5 times the optimal cost. Experiments are conducted showing that our algorithm actually has a much better approximation ratio (around 1.2) on synthetically generated data.


Optimization Methods for Large-Scale Machine Learning

arXiv.org Machine Learning

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.


Handbook of Research on Predictive Modeling and Optimization Methods in Science and Engineering

@machinelearnbot

The disciplines of science and engineering rely heavily on the forecasting of prospective constraints for concepts that have not yet been proven to exist, especially in areas such as artificial intelligence. Obtaining quality solutions to the problems presented becomes increasingly difficult due to the number of steps required to sift through the possible solutions, and the ability to solve such problems relies on the recognition of patterns and the categorization of data into specific sets. Predictive modeling and optimization methods allow unknown events to be categorized based on statistics and classifiers input by researchers. The Handbook of Research on Predictive Modeling and Optimization Methods in Science and Engineering is a critical reference source that provides comprehensive information on the use of optimization techniques and predictive models to solve real-life engineering and science problems. Through discussions on techniques such as robust design optimization, water level prediction, and the prediction of human actions, this publication identifies solutions to developing problems and new solutions for existing problems, making this publication a valuable resource for engineers, researchers, graduate students, and other professionals.


The Need, Promise, and Reality of Quantum Computing

@machinelearnbot

Despite giving us the most spectacular wave of technological innovation in human history, there are certain computational problems that the digital revolution still can't seem to solve. Some of these problems could be holding back key scientific breakthroughs, and even the global economy. Although conventional computers have been doubling in power and processing speed nearly ever two years for decades, they still don't seem to be getting any closer to solving these persistent problems. Want to know why? Ask any computer scientist, and they'll probably give you the same answer: today's digital, conventional computers are built on a classical, and very limited, model of computing. In the long run, to efficiently solve the world's most persistent computing problems, we're going to have to turn to an entirely new and more capable animal: the quantum computer.


Scalable Meta-Learning for Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization has become a standard technique for hyperparameter optimization, including data-intensive models such as deep neural networks that may take days or weeks to train. We consider the setting where previous optimization runs are available, and we wish to use their results to warm-start a new optimization run. We develop an ensemble model that can incorporate the results of past optimization runs, while avoiding the poor scaling that comes with putting all results into a single Gaussian process model. The ensemble combines models from past runs according to estimates of their generalization performance on the current optimization. Results from a large collection of hyperparameter optimization benchmark problems and from optimization of a production computer vision platform at Facebook show that the ensemble can substantially reduce the time it takes to obtain near-optimal configurations, and is useful for warm-starting expensive searches or running quick re-optimizations.


ZOOpt: Toolbox for Derivative-Free Optimization

arXiv.org Machine Learning

Recent advances of derivative-free optimization allow efficient approximating the global optimal solutions of sophisticated functions, such as functions with many local optima, non-differentiable and non-continuous functions. This article describes the ZOOpt (https://github.com/eyounx/ZOOpt) toolbox that provides efficient derivative-free solvers and are designed easy to use. ZOOpt provides a Python package for single-thread optimization, and a light-weighted distributed version with the help of the Julia language for Python described functions. ZOOpt toolbox particularly focuses on optimization problems in machine learning, addressing high-dimensional, noisy, and large-scale problems. The toolbox is being maintained toward ready-to-use tool in real-world machine learning tasks.


Bayesian Optimization with Gradients

arXiv.org Artificial Intelligence

Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.


Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent

arXiv.org Machine Learning

Coherent uncertainty quantification is a key strength of Bayesian methods. But modern algorithms for approximate Bayesian posterior inference often sacrifice accurate posterior uncertainty estimation in the pursuit of scalability. This work shows that previous Bayesian coreset construction algorithms---which build a small, weighted subset of the data that approximates the full dataset---are no exception. We demonstrate that these algorithms scale the coreset log-likelihood suboptimally, resulting in underestimated posterior uncertainty. To address this shortcoming, we develop greedy iterative geodesic ascent (GIGA), a novel algorithm for Bayesian coreset construction that scales the coreset log-likelihood optimally. GIGA provides geometric decay in posterior approximation error as a function of coreset size, and maintains the fast running time of its predecessors. The paper concludes with validation of GIGA on both synthetic and real datasets, demonstrating that it reduces posterior approximation error by orders of magnitude compared with previous coreset constructions.


A Bridge Between Hyperparameter Optimization and Larning-to-learn

arXiv.org Machine Learning

We consider a class of a nested optimization problems involving inner and outer objectives. We observe that by taking into explicit account the optimization dynamics for the inner objective it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the specific setting, the variables of the outer objective take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We show that some recently proposed methods in the latter setting can be instantiated in our framework and tackled with the same gradient-based algorithms. Finally, we discuss possible design patterns for learning-to-learn and present encouraging preliminary experiments for few-shot learning.