Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

arXiv.org Machine Learning

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.
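
For readers unfamiliar with the compositional formulation, the general template from the stochastic compositional optimisation literature is the nested problem $\max_{x} f\big(\mathbb{E}_{w}[g(x, w)]\big)$, in which an outer function $f$ wraps an inner expectation. As a rough illustration (not the paper's own derivation), Monte Carlo Expected Improvement under a Gaussian posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$, $\mathrm{EI}(x) = \mathbb{E}_{z \sim \mathcal{N}(0,1)}\big[\max(\mu(x) + \sigma(x) z - f^{*},\, 0)\big]$ with incumbent value $f^{*}$, already exposes such an inner expectation over a reparametrised posterior sample; rewriting acquisition functions so that a non-trivial outer function acts on an inner expectation is what allows nested and variance-reduced compositional optimisers to be applied to the maximisation subroutine.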


Quality-Diversity Optimization: a novel branch of stochastic optimization

arXiv.org Machine Learning

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space, of which there can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if a niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and review the main topics currently under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including in deep learning, robotics, and reinforcement learning.
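
As a rough illustration of the "illumination" idea, the sketch below implements a minimal MAP-Elites-style loop, the canonical Quality-Diversity algorithm; the objective, behaviour descriptor, mutation scale, and parameter ranges are placeholder assumptions, not taken from the chapter.

import numpy as np

def fitness(x):
    # Placeholder objective (assumed): higher is better.
    return -float(np.sum(x ** 2))

def behaviour(x):
    # Placeholder behaviour descriptor (assumed): first two parameters mapped to [0, 1]^2.
    return np.clip((x[:2] + 5.0) / 10.0, 0.0, 1.0)

def map_elites(n_bins=10, n_iters=20000, dim=4, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}                                           # niche (grid cell) -> (fitness, solution)
    for _ in range(n_iters):
        if archive and rng.random() < 0.9:
            keys = list(archive)
            parent = archive[keys[rng.integers(len(keys))]][1]
            x = parent + 0.1 * rng.standard_normal(dim)    # mutate a random elite
        else:
            x = rng.uniform(-5.0, 5.0, dim)                # or sample a random solution
        cell = tuple((behaviour(x) * n_bins).astype(int).clip(0, n_bins - 1))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)                         # keep the best solution per niche
    return archive

archive = map_elites()
print(len(archive), "niches filled")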


Variational Quantum Algorithms

arXiv.org Machine Learning

Applications such as simulating large quantum systems or solving large-scale linear algebra problems are immensely challenging for classical computers due to their extremely high computational cost. Quantum computers promise to unlock these applications, although fault-tolerant quantum computers will likely not be available for several years. Currently available quantum devices have serious constraints, including limited qubit numbers and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which employ a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to be the best hope for obtaining quantum advantage. Nevertheless, challenges remain, including the trainability, accuracy, and efficiency of VQAs. In this review article we present an overview of the field of VQAs. Furthermore, we discuss strategies to overcome their challenges as well as the exciting prospects for using them as a means to obtain quantum advantage.
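
To make the VQA loop concrete, here is a minimal, hedged sketch of the idea on a single simulated qubit: a parametrized circuit $R_y(\theta)|0\rangle$ whose energy $\langle Z \rangle$ is minimized by a classical gradient loop using the parameter-shift rule. The ansatz, observable, learning rate, and step count are illustrative assumptions, not taken from the review.

import numpy as np

# Pauli-Z observable and |0> state for a single qubit.
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
ket0 = np.array([1.0, 0.0])

def ry(theta):
    # Single-qubit Ry rotation gate.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cost(theta):
    # Expectation value <psi(theta)| Z |psi(theta)> of the ansatz Ry(theta)|0>.
    psi = ry(theta) @ ket0
    return float(psi @ Z @ psi)

# Classical optimization loop; the gradient uses the parameter-shift rule.
theta, lr = 0.1, 0.2
for step in range(100):
    grad = 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))
    theta -= lr * grad
print(theta, cost(theta))   # theta -> pi, cost -> -1 (ground state of Z)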


At the Intersection of Deep Sequential Model Framework and State-space Model Framework: Study on Option Pricing

arXiv.org Machine Learning

Inference and forecasting problems for nonlinear dynamical systems arise in a variety of contexts. Reservoir computing and deep sequential models, on the one hand, have demonstrated efficient, robust, and superior performance in modeling simple and chaotic dynamical systems. However, their inherently deterministic nature limits their robustness to noisy systems, and their inability to provide uncertainty estimates is a further shortcoming of the framework. The traditional state-space model framework, on the other hand, is robust to noise and naturally quantifies uncertainty, making it a fitting complement to the reservoir computing and deep sequential model framework. We propose the unscented reservoir smoother (URS), a model that unifies deep sequential and state-space models to achieve the strengths of both frameworks. Evaluated on noisy option pricing datasets, URS delivers highly competitive forecasting accuracy, especially over longer horizons, together with uncertainty estimates. Further extensions and implications of URS towards a full integration of both frameworks are also discussed.


Asymptotic study of stochastic adaptive algorithm in non-convex landscape

arXiv.org Machine Learning

This paper studies some asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them Adagrad and RMSProp, which are involved in most blackbox deep learning algorithms. We adopt the point of view of non-convex landscape optimization, consider a one-time-scale parametrization, and cover the situations in which these algorithms are used with or without mini-batches. Viewing these methods as stochastic algorithms, we establish their almost sure convergence, when run with a decreasing step size, towards the set of critical points of the target function. With a mild extra assumption on the noise, we also obtain convergence towards the set of minimizers of the function. Along the way, we also obtain a "convergence rate" for the methods, in the vein of the works of \cite{GhadimiLan}.
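
To fix notation (a standard statement of these updates, included for context rather than quoted from the paper), both methods iterate $\theta_{t+1} = \theta_t - \frac{\gamma_t}{\sqrt{v_t + \epsilon}} \odot g_t$, where $g_t$ is a (possibly mini-batch) stochastic gradient of the target function at $\theta_t$, $\gamma_t$ is a decreasing step size such as $\gamma_t \propto t^{-\alpha}$, and the coordinate-wise second-moment estimate is $v_t = v_{t-1} + g_t^{2}$ for Adagrad and $v_t = \beta v_{t-1} + (1 - \beta) g_t^{2}$ for RMSProp.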


Mercer Features for Efficient Combinatorial Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is an efficient framework for solving black-box optimization problems with expensive function evaluations. This paper addresses the BO problem setting for combinatorial spaces (e.g., sequences and graphs) that occurs naturally in science and engineering applications. A prototypical example is molecular optimization guided by expensive experiments. The key challenge is to balance the complexity of statistical models and tractability of search to select combinatorial structures for evaluation. In this paper, we propose an efficient approach referred to as Mercer Features for Combinatorial Bayesian Optimization (MerCBO). The key idea behind MerCBO is to provide explicit feature maps for diffusion kernels over discrete objects by exploiting the structure of their combinatorial graph representation. These Mercer features, combined with Thompson sampling as the acquisition function, allow us to employ tractable solvers to find the next structures for evaluation. Experiments on diverse real-world benchmarks demonstrate that MerCBO performs similarly to or better than prior methods. The source code is available at https://github.com/aryandeshwal/MerCBO .
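
The tractability argument rests on a standard fact: with an explicit feature map, the Gaussian-process posterior reduces to Bayesian linear regression, so a Thompson sample of the acquisition function is a linear function of the features and can be handed to a combinatorial solver. The sketch below illustrates that generic recipe with a placeholder feature map and brute-force candidate enumeration; the actual diffusion-kernel Mercer features and solvers used by MerCBO are described in the paper and its repository.

import numpy as np

rng = np.random.default_rng(0)

def features(x):
    # Placeholder feature map for a binary structure x; MerCBO instead derives
    # explicit Mercer features of a diffusion kernel over the combinatorial graph.
    x = np.asarray(x, dtype=float)
    return np.concatenate([[1.0], 2 * x - 1])         # bias + signed coordinates

def thompson_step(X_obs, y_obs, candidates, noise=0.1, prior_var=1.0):
    # One Thompson-sampling acquisition step with an explicit feature map.
    Phi = np.array([features(x) for x in X_obs])
    d = Phi.shape[1]
    # Posterior of Bayesian linear regression: y = w^T phi(x) + eps.
    A = Phi.T @ Phi / noise**2 + np.eye(d) / prior_var
    cov = np.linalg.inv(A)
    mean = cov @ Phi.T @ np.asarray(y_obs) / noise**2
    w = rng.multivariate_normal(mean, cov)             # posterior sample of the weights
    # The sampled acquisition is linear in the features, so (in MerCBO) a tractable
    # combinatorial solver can maximize it; here we simply enumerate candidates.
    scores = [w @ features(x) for x in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with made-up observations and candidate structures.
print(thompson_step([[0, 1, 0], [1, 1, 1]], [0.2, 0.9], [[0, 0, 1], [1, 0, 1], [1, 1, 0]]))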


Optimizing Discrete Spaces via Expensive Evaluations: A Learning to Search Framework

arXiv.org Artificial Intelligence

We consider the problem of optimizing expensive black-box functions over discrete spaces (e.g., sets, sequences, graphs). The key challenge is to select a sequence of combinatorial structures to evaluate, in order to identify high-performing structures as quickly as possible. Our main contribution is to introduce and evaluate a new learning-to-search framework for this problem called L2S-DISCO. The key insight is to employ search procedures guided by control knowledge at each step to select the next structure, and to improve the control knowledge as new function evaluations are observed. We provide a concrete instantiation of L2S-DISCO for local search procedures and empirically evaluate it on diverse real-world benchmarks. Results show the efficacy of L2S-DISCO over state-of-the-art algorithms in solving complex optimization problems.
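
As a loose, simplified sketch of the learning-to-search idea (not the paper's actual instantiation), the snippet below runs greedy local search over binary structures, where the "control knowledge" is a ridge-regression scorer refit after every expensive evaluation; the search space, scorer, and objective are placeholder assumptions.

import numpy as np

rng = np.random.default_rng(0)

def neighbours(x):
    # Hamming-1 neighbourhood of a binary structure (placeholder search space).
    out = []
    for i in range(len(x)):
        y = x.copy(); y[i] = 1 - y[i]; out.append(y)
    return out

def fit_scorer(X, y, lam=1e-2):
    # Control knowledge: a ridge-regression scorer refit after every evaluation.
    X = np.asarray(X, dtype=float); y = np.asarray(y, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def l2s_style_search(f, dim=20, budget=50):
    x = rng.integers(0, 2, dim)
    X, y = [x], [f(x)]
    for _ in range(budget - 1):
        w = fit_scorer(X, y)                        # update control knowledge
        x = max(neighbours(x), key=lambda z: float(z @ w))   # step guided by the scorer
        X.append(x); y.append(f(x))                 # expensive evaluation
    best = int(np.argmax(y))
    return X[best], y[best]

# Example: maximize a simple separable objective over {0,1}^20.
best_x, best_y = l2s_style_search(lambda z: float(np.sum(z)))
print(best_y)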


Query-free Black-box Adversarial Attacks on Graphs

arXiv.org Artificial Intelligence

Many graph-based machine learning models are known to be vulnerable to adversarial attacks, where even limited perturbations of the input data can result in dramatic performance deterioration. Most existing works focus on moderate settings in which the attacker either knows the model structure and parameters (white-box) or is able to send queries to fetch model information. In this paper, we propose a query-free black-box adversarial attack on graphs, in which the attacker has no knowledge of the target model and no query access to it. With the mere observation of the graph topology, the proposed attack strategy flips a limited number of links to mislead the graph models. We prove that the impact of the flipped links on the target model can be quantified by spectral changes, and thus approximated using eigenvalue perturbation theory. Accordingly, we cast the proposed attack strategy as an optimization problem and adopt a greedy algorithm to select the links to be flipped. Owing to its simplicity and scalability, the proposed attack is not only generic across various graph-based models, but can also be easily extended to settings where different levels of knowledge are accessible. Extensive experiments demonstrate the effectiveness and efficiency of the proposed attack on various downstream tasks and several different graph-based learning models.
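
The spectral argument can be sketched with first-order eigenvalue perturbation: for a symmetric adjacency matrix $A$ with unit eigenvector $u$, flipping link $(i, j)$ changes the corresponding eigenvalue by approximately $u^\top \Delta A\, u = 2 s_{ij} u_i u_j$, where $s_{ij} = +1$ if the link is added and $-1$ if it is removed. The toy code below ranks candidate flips by this estimate for the leading eigenvalue and greedily keeps the top ones; the attack objective, the choice of eigenvalues, and the graph are placeholder assumptions, not the paper's exact formulation.

import numpy as np

def greedy_flips(A, budget):
    # Greedily pick link flips with the largest first-order impact on the leading
    # eigenvalue, using delta_lambda ~= u^T (Delta A) u = 2 * s_ij * u_i * u_j.
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)
    u = eigvecs[:, -1]                           # eigenvector of the largest eigenvalue
    flips, impact = [], []
    for i in range(n):
        for j in range(i + 1, n):
            s = 1.0 - 2.0 * A[i, j]              # +1 if the link is absent, -1 if present
            impact.append(abs(2.0 * s * u[i] * u[j]))
            flips.append((i, j))
    order = np.argsort(impact)[::-1]             # largest estimated spectral change first
    return [flips[k] for k in order[:budget]]

# Example on a small random symmetric graph (assumed unweighted, no self-loops).
rng = np.random.default_rng(0)
A = rng.integers(0, 2, (8, 8)).astype(float)
A = np.triu(A, 1); A = A + A.T
print(greedy_flips(A, budget=3))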


$k$-Variance: A Clustered Notion of Variance

arXiv.org Machine Learning

We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $k$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide an in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb R^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
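
The "sampling plus linear programming" approximation can be sketched directly: draw two independent batches of $k$ samples, solve the optimal bipartite matching between them, and average the matching cost over repetitions. In the sketch below the squared Euclidean cost and the per-point normalization are illustrative assumptions rather than the paper's exact definition.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def k_variance_estimate(sampler, k, n_trials=200, seed=0):
    # Monte Carlo estimate of a k-variance-style quantity: the expected cost of an
    # optimal bipartite matching between two independent sets of k samples.
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_trials):
        X = sampler(rng, k)                       # first batch of k samples
        Y = sampler(rng, k)                       # second, independent batch
        C = cdist(X, Y, metric="sqeuclidean")     # k x k cost matrix
        rows, cols = linear_sum_assignment(C)     # optimal matching (assignment LP)
        costs.append(C[rows, cols].mean())
    return float(np.mean(costs))

# Example: standard Gaussian in R^2; matchings become more local as k grows.
gaussian = lambda rng, k: rng.standard_normal((k, 2))
for k in (1, 10, 100):
    print(k, k_variance_estimate(gaussian, k))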


Better call Surrogates: A hybrid Evolutionary Algorithm for Hyperparameter optimization

arXiv.org Artificial Intelligence

In this paper, we propose a surrogate-assisted evolutionary algorithm (EA) for hyperparameter optimization of machine learning (ML) models. The proposed STEADE model initially estimates the objective function landscape using Radial Basis Function interpolation, and then transfers this knowledge to an EA technique called Differential Evolution, which is used to evolve new solutions guided by a Bayesian optimization framework. We empirically evaluate our model on the hyperparameter optimization problems that are part of the black-box optimization challenge at NeurIPS 2020 and demonstrate the improvement brought about by STEADE over the vanilla EA.
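
A generic skeleton of the surrogate-assisted idea (a simplified sketch, not the full STEADE algorithm, which additionally couples the evolution with a Bayesian optimization framework) might look as follows: fit an RBF interpolant to an initial design and let Differential Evolution search the cheap surrogate before spending a true evaluation. The toy objective, bounds, and budget are placeholder assumptions.

import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import differential_evolution

def surrogate_assisted_de(objective, bounds, n_init=20, seed=0):
    # Fit an RBF surrogate on initial evaluations, then let Differential Evolution
    # search the surrogate for a promising candidate to evaluate next.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))    # initial design
    y = np.array([objective(x) for x in X])                 # expensive evaluations
    surrogate = RBFInterpolator(X, y)                       # cheap landscape estimate
    result = differential_evolution(lambda x: float(surrogate(x[None])[0]),
                                    bounds, seed=seed, maxiter=100)
    return result.x, objective(result.x)                    # verify on the true function

# Example with a toy "hyperparameter" landscape (placeholder for a real ML model).
toy = lambda x: float((x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2)
best_x, best_y = surrogate_assisted_de(toy, bounds=[(-1, 1), (-1, 1)])
print(best_x, best_y)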