 Optimization


Mirror descent in saddle-point problems: Going the extra (gradient) mile

arXiv.org Machine Learning

Owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around convex-concave problems; however, making theoretical inroads towards efficient GAN training crucially depends on moving beyond this classic framework. To make piecemeal progress along these lines, we analyze the widely used mirror descent (MD) method in a class of non-monotone problems, called coherent, whose solutions coincide with those of a naturally associated variational inequality. Our first result is that, under strict coherence (a condition satisfied by all strictly convex-concave problems), MD methods converge globally; however, they may fail to converge even in simple, bilinear models. To mitigate this deficiency, we add an "extra-gradient" step, which we show stabilizes MD methods by looking ahead and using a "future gradient". These theoretical results are subsequently validated by numerical experiments in GANs.


New Optimization Algorithm Exponentially Speeds Computation - IEEE Spectrum

#artificialintelligence

A new algorithm could dramatically slash the time it takes computers to recommend movies or route taxis. The new algorithm, developed by Harvard University researchers, solves optimization problems exponentially faster than previous algorithms by cutting the number of steps required. Surprisingly, this approach works "without sacrificing the quality of the resulting solution," says study senior author Yaron Singer at Harvard University. Optimization problems seek to find the best answer from all possible solutions, such as mapping the fastest route from point A to point B. Many algorithms designed to solve optimization problems have not changed since they were first described in the 1970s. Previous optimization algorithms generally worked in a step-by-step process, with the number of steps proportional to the amount of data analyzed.


Machine learning prowess on display

#artificialintelligence

More than 80 Amazon scientists and engineers will attend this year's International Conference on Machine Learning (ICML) in Stockholm, Sweden, with 11 papers co-authored by Amazonians being presented. "ICML is one of the leading outlets for machine learning research," says Neil Lawrence, director of machine learning for Amazon's Supply Chain Optimization Technologies program. "It's a great opportunity to find out what other researchers have been up to and share some of our own learnings." At ICML, members of Lawrence's team will present a paper titled "Structured Variationally Auto-encoded Optimization," which describes a machine-learning approach to optimization, or choosing the values for variables in some process that maximize a particular outcome. The first author on the paper is Xiaoyu Lu, a graduate student at the University of Oxford who worked on the project as an intern at Amazon last summer, then returned in January to do some follow-up work.


Modeling outcomes of soccer matches

arXiv.org Machine Learning

We compare various extensions of the Bradley-Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley-Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley-Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.
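As a rough illustration of how a Bradley-Terry model can produce win, draw, and loss probabilities, the sketch below uses Davidson's tie extension. This is an assumption made for illustration: the abstract does not say which extensions are compared, and the function name, the log-strength parametrization, and the example values are hypothetical.

```python
import math


def davidson_probabilities(log_strength_a, log_strength_b, draw_param):
    """Return (P(A wins), P(draw), P(B wins)) under Davidson's extension.

    The plain Bradley-Terry model is recovered when draw_param == 0.
    """
    pi_a = math.exp(log_strength_a)
    pi_b = math.exp(log_strength_b)
    tie_weight = draw_param * math.sqrt(pi_a * pi_b)
    total = pi_a + pi_b + tie_weight
    return pi_a / total, tie_weight / total, pi_b / total


# Hypothetical example: team A is somewhat stronger than team B.
p_win, p_draw, p_loss = davidson_probabilities(0.4, 0.0, draw_param=0.8)
print(f"win={p_win:.3f} draw={p_draw:.3f} loss={p_loss:.3f}")
```

Fitting such a model amounts to maximizing the sum of the log-probabilities of the observed outcomes over the strength and draw parameters, optionally with a penalty term, as the abstract describes.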


Quasi-Monte Carlo Variational Inference

arXiv.org Machine Learning

Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d. samples from a uniform probability distribution by a deterministic sequence of samples of length N. This sequence covers the underlying random variable space more evenly than i.i.d. draws, reducing the variance of the gradient estimator. With our novel approach, both the score function and the reparameterization gradient estimators lead to much faster convergence. We also propose a new algorithm for Monte Carlo objectives, where we operate with a constant learning rate and increase the number of QMC samples per iteration. We prove that this way, our algorithm can converge asymptotically at a faster rate than SGD. We furthermore provide theoretical guarantees on QMC for Monte Carlo objectives that go beyond MCVI, and support our findings by several experiments on large-scale data sets from various domains.
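The idea of replacing i.i.d. uniform draws with a more evenly spread deterministic sequence can be sketched on a toy integral. The example below is not the paper's method: it assumes a one-dimensional base-2 van der Corput sequence as the QMC sequence and the integrand u^2 on [0, 1) (true value 1/3) in place of a variational gradient; the seed and sample size are arbitrary.

```python
import random


def van_der_corput(index, base=2):
    """Return the index-th point of the van der Corput sequence in [0, 1)."""
    value, denominator = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denominator *= base
        value += digit / denominator
    return value


def mean_of_squares(points):
    """Sample-average estimate of the integral of u**2 over [0, 1)."""
    return sum(u * u for u in points) / len(points)


N = 1024
TRUE_VALUE = 1.0 / 3.0

qmc_points = [van_der_corput(i) for i in range(N)]
rng = random.Random(0)                      # fixed seed for repeatability
mc_points = [rng.random() for _ in range(N)]

qmc_error = abs(mean_of_squares(qmc_points) - TRUE_VALUE)
mc_error = abs(mean_of_squares(mc_points) - TRUE_VALUE)
print(f"QMC error: {qmc_error:.2e}")
print(f"MC error:  {mc_error:.2e}")
```

With N a power of two, the first N van der Corput points are exactly the grid {0, 1/N, ..., (N-1)/N} in a scrambled order, which is why the QMC estimate is so even; the Monte Carlo error depends on the seed and has a standard deviation of roughly 0.3 / sqrt(N) for this integrand.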


Dynamic Control of Explore/Exploit Trade-Off In Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization offers the possibility of optimizing black-box operations not accessible through traditional techniques. The success of Bayesian optimization methods such as Expected Improvement (EI) is significantly affected by the degree of trade-off between exploration and exploitation. Too much exploration can lead to inefficient optimization protocols, whilst too much exploitation leaves the protocol open to strong initial biases and a high chance of getting stuck in a local minimum. Typically, a constant margin is used to control this trade-off, which results in yet another hyper-parameter to be optimized. We propose contextual improvement as a simple yet effective heuristic to counter this, achieving a one-shot optimization strategy. Our proposed heuristic can be swiftly calculated and improves both the speed and robustness of discovery of optimal solutions. We demonstrate its effectiveness on both synthetic and real-world problems and explore the unaccounted-for uncertainty in the pre-determination of search hyperparameters controlling the explore-exploit trade-off.
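To make the role of the constant margin concrete, the sketch below evaluates the standard closed-form Expected Improvement for minimization under a Gaussian posterior, with a margin argument `xi`. It shows only the baseline that the abstract refers to, not the proposed contextual improvement heuristic; the function names and the two candidate points are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_value, xi=0.0):
    """EI for minimization: E[max(best_value - xi - f, 0)], f ~ N(mean, std^2)."""
    gap = best_value - mean - xi
    if std <= 0.0:
        return max(gap, 0.0)           # no uncertainty left at this point
    z = gap / std
    return gap * normal_cdf(z) + std * normal_pdf(z)


best_so_far = 1.0
exploit = {"mean": 0.9, "std": 0.05}   # slightly better mean, low uncertainty
explore = {"mean": 1.1, "std": 0.30}   # worse mean, high uncertainty

for xi in (0.0, 0.1):
    ei_exploit = expected_improvement(exploit["mean"], exploit["std"], best_so_far, xi)
    ei_explore = expected_improvement(explore["mean"], explore["std"], best_so_far, xi)
    choice = "exploit" if ei_exploit > ei_explore else "explore"
    print(f"xi={xi}: exploit EI={ei_exploit:.4f}, explore EI={ei_explore:.4f} -> {choice}")
```

In this example a larger margin moves the preferred candidate from the low-uncertainty point to the high-uncertainty one, which is the sensitivity to a hand-set constant that the abstract describes.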


A First Analysis of Kernels for Kriging-based Optimization in Hierarchical Search Spaces

arXiv.org Machine Learning

Many real-world optimization problems require significant resources for objective function evaluations. This is a challenge for evolutionary algorithms, as it limits the number of available evaluations. One solution is to use surrogate models, which replace the expensive objective. A particular issue in this context is the presence of hierarchical variables, which influence the objective function only if other variables satisfy some condition. We study how this kind of hierarchical structure can be integrated into the model-based optimization framework. We discuss an existing kernel and propose alternatives. An artificial test function is used to investigate how different kernels and assumptions affect model quality and search performance.


Bilevel Programming for Hyperparameter Optimization and Meta-Learning

arXiv.org Machine Learning

We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn.
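The idea of accounting for the inner optimization dynamics can be sketched on a scalar toy problem. Everything below is an illustrative assumption rather than the paper's setup: the inner objective is 0.5 * (w - a)^2 + 0.5 * lam * w^2 with regularization weight `lam` as the hyperparameter, the outer objective is 0.5 * (w - b)^2, and the hypergradient is obtained by differentiating each inner gradient step with respect to `lam` (forward mode).

```python
def unrolled_inner_run(lam, a=2.0, inner_step=0.5, inner_iters=50):
    """Run inner gradient descent and track d(w_t)/d(lam) along the way."""
    w, dw_dlam = 0.0, 0.0
    for _ in range(inner_iters):
        inner_grad = (w - a) + lam * w
        # Differentiate the update w <- w - inner_step * inner_grad w.r.t. lam,
        # using the value of w from before the update.
        dw_dlam = dw_dlam * (1.0 - inner_step * (1.0 + lam)) - inner_step * w
        w = w - inner_step * inner_grad
    return w, dw_dlam


def hypergradient(lam, b=1.0):
    """Chain rule: d(outer)/d(lam) = (w_T - b) * d(w_T)/d(lam)."""
    w_final, dw_dlam = unrolled_inner_run(lam)
    return (w_final - b) * dw_dlam


lam = 0.0
for _ in range(20):                    # outer loop: gradient descent on lam
    lam -= 0.5 * hypergradient(lam)

print(f"learned lam = {lam:.6f}")
```

For this toy problem the exact inner solution is a / (1 + lam), so with a = 2 and b = 1 the outer objective is minimized at lam = 1, which gives a simple check on the approximate hypergradient.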


Modeling Sparse Deviations for Compressed Sensing using Generative Models

arXiv.org Machine Learning

In compressed sensing, a small number of linear measurements can be used to reconstruct an unknown signal. Existing approaches leverage assumptions on the structure of these signals, such as sparsity or the availability of a generative model. A domain-specific generative model can provide a stronger prior and thus allow for recovery with far fewer measurements. However, unlike sparsity-based approaches, existing methods based on generative models guarantee exact recovery only over their support, which is typically only a small subset of the space on which the signals are defined. We propose Sparse-Gen, a framework that allows for sparse deviations from the support set, thereby achieving the best of both worlds by using a domain-specific prior and allowing reconstruction over the full space of signals. Theoretically, our framework provides a new class of signals that can be acquired using compressed sensing, reducing classic sparse vector recovery to a special case and avoiding the restrictive support due to a generative model prior. Empirically, we observe consistent improvements in reconstruction accuracy over competing approaches, especially in the more practical setting of transfer compressed sensing, where a generative model for a data-rich source domain aids sensing on a data-scarce target domain.


Travel Time Optimization With Machine Learning And Genetic Algorithm

#artificialintelligence

What is the relationship between machine learning and optimization? In particular, what happens when machine learning is used to solve optimization problems? Consider this: a UPS driver with 25 packages has about 15 trillion trillion possible routes to choose from (25! ≈ 1.55 × 10^25 orderings of the stops). And if each driver drives just one more mile each day than necessary, the company would be losing $30 million a year. While UPS would have all the data for their trucks and routes, there is no way they can evaluate that many routes for each driver with 25 packages.