Optimization
GIBBON: General-purpose Information-Based Bayesian OptimisatioN
Moss, Henry B., Leslie, David S., Gonzalez, Javier, Rayson, Paul
This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO). A novel approximation is proposed for the information gain -- an information-theoretic quantity central to solving a range of BO problems, including noisy, multi-fidelity and batch optimisations across both continuous and highly-structured discrete spaces. Previously, these problems have been tackled separately within information-theoretic BO, each requiring a different sophisticated approximation scheme, except for batch BO, for which no computationally-lightweight information-theoretic approach has previously been proposed. GIBBON (General-purpose Information-Based Bayesian OptimisatioN) provides a single principled framework suitable for all the above, out-performing existing approaches whilst incurring substantially lower computational overheads. In addition, GIBBON does not require the problem's search space to be Euclidean and so is the first high-performance yet computationally light-weight acquisition function that supports batch BO over general highly structured input spaces like molecular search and gene design. Moreover, our principled derivation of GIBBON yields a natural interpretation of a popular batch BO heuristic based on determinantal point processes.
Optimised one-class classification performance
Lenz, Oliver Urs, Peralta, Daniel, Cornelis, Chris
We provide a thorough treatment of hyperparameter optimisation for three data descriptors with a good track-record in the literature: Support Vector Machine (SVM), Nearest Neighbour Distance (NND) and Average Localised Proximity (ALP). The hyperparameters of SVM have to be optimised through cross-validation, while NND and ALP allow the reuse of a single nearest-neighbour query and an efficient form of leave-one-out validation. We experimentally evaluate the effect of hyperparameter optimisation with 246 classification problems drawn from 50 datasets. From a selection of optimisation algorithms, the recent Malherbe-Powell proposal optimises the hyperparameters of all three data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM both significantly outperform NND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently, while a choice between ALP and SVM based on validation AUROC gives the best overall result. This distils the many variables of one-class classification with hyperparameter optimisation down to a clear choice with a known trade-off, allowing practitioners to make informed decisions.
Evolutionary Multitask Optimization: a Methodological Overview, Challenges and Future Research Directions
Osaba, Eneko, Martinez, Aritz D., Del Ser, Javier
In this work we consider multitasking in the context of solving multiple optimization problems simultaneously by conducting a single search process. The principal goal when dealing with this scenario is to dynamically exploit the existing complementarities among the problems (tasks) being optimized, helping each other through the exchange of valuable knowledge. Additionally, the emerging paradigm of Evolutionary Multitasking tackles multitask optimization scenarios by using as inspiration concepts drawn from Evolutionary Computation. The main purpose of this survey is to collect, organize and critically examine the abundant literature published so far in Evolutionary Multitasking, with an emphasis on the methodological patterns followed when designing new algorithmic proposals in this area (namely, multifactorial optimization and multipopulation-based multitasking). We complement our critical analysis with an identification of challenges that remain open to date, along with promising research directions that can stimulate future efforts in this topic. Our discussions held throughout this manuscript are offered to the audience as a reference of the general trajectory followed by the community working in this field in recent times, as well as a self-contained entry point for newcomers and researchers interested to join this exciting research avenue.
Multimodal-Aware Weakly Supervised Metric Learning with Self-weighting Triplet Loss
Deng, Huiyuan, Meng, Xiangzhu, Feng, Lin
In recent years, we have witnessed a surge of interests in learning a suitable distance metric from weakly supervised data. Most existing methods aim to pull all the similar samples closer while push the dissimilar ones as far as possible. However, when some classes of the dataset exhibit multimodal distribution, these goals conflict and thus can hardly be concurrently satisfied. Additionally, to ensure a valid metric, many methods require a repeated eigenvalue decomposition process, which is expensive and numerically unstable. Therefore, how to learn an appropriate distance metric from weakly supervised data remains an open but challenging problem. To address this issue, in this paper, we propose a novel weakly supervised metric learning algorithm, named MultimoDal Aware weakly supervised Metric Learning (MDaML). MDaML partitions the data space into several clusters and allocates the local cluster centers and weight for each sample. Then, combining it with the weighted triplet loss can further enhance the local separability, which encourages the local dissimilar samples to keep a large distance from the local similar samples. Meanwhile, MDaML casts the metric learning problem into an unconstrained optimization on the SPD manifold, which can be efficiently solved by Riemannian Conjugate Gradient Descent (RCGD). Extensive experiments conducted on 13 datasets validate the superiority of the proposed MDaML.
The Instability of Accelerated Gradient Descent
Algorithmic stability has emerged over the last two decades as a central tool in generalization analysis of learning algorithms. While the classical approach in generalization theory originating in the PAC learning framework appeal to uniform convergence arguments, more recent progress on stochastic convex optimization models, starting with the pioneering work of Bousquet and Elisseeff (2002) and Shalev-Shwartz et al. (2009), has relied on stability analysis for deriving tight generalization results for convex risk minimizing algorithms. Perhaps the most common form of algorithmic stability is the so called uniform stability (Bousquet and Elisseeff, 2002). Roughly, the uniform stability of a learning algorithm is the worst-case change in its output model, in terms of its loss on an arbitrary example, when replacing a single sample in the data set used for training. Bousquet and Elisseeff (2002) initially used uniform stability to argue about the generalization of empirical risk minimization with strongly convex losses.
Frank-Wolfe with a Nearest Extreme Point Oracle
We consider variants of the classical Frank-Wolfe algorithm for constrained smooth convex minimization, that instead of access to the standard oracle for minimizing a linear function over the feasible set, have access to an oracle that can find an extreme point of the feasible set that is closest in Euclidean distance to a given vector. We first show that for many feasible sets of interest, such an oracle can be implemented with the same complexity as the standard linear optimization oracle. We then show that with such an oracle we can design new Frank-Wolfe variants which enjoy significantly improved complexity bounds in case the set of optimal solutions lies in the convex hull of a subset of extreme points with small diameter (e.g., a low-dimensional face of a polytope). In particular, for many $0\text{--}1$ polytopes, under quadratic growth and strict complementarity conditions, we obtain the first linearly convergent variant with rate that depends only on the dimension of the optimal face and not on the ambient dimension.
The Archerfish Hunting Optimizer: a novel metaheuristic algorithm for global optimization
Zitouni, Farouq, Harous, Saad, Belkeram, Abdelghani, Hammou, Lokman Elhakim Baba
Global optimization solves real-world problems numerically or analytically by minimizing their objective functions. Most of the analytical algorithms are greedy and computationally intractable. Metaheuristics are nature-inspired optimization algorithms. They numerically find a near-optimal solution for optimization problems in a reasonable amount of time. We propose a novel metaheuristic algorithm for global optimization. It is based on the shooting and jumping behaviors of the archerfish for hunting aerial insects. We name it the Archerfish Hunting Optimizer (AHO). We Perform two sorts of comparisons to validate the proposed algorithm's performance. First, AHO is compared to the 12 recent metaheuristic algorithms (the accepted algorithms for the 2020's competition on single objective bound-constrained numerical optimization) on ten test functions of the benchmark CEC 2020 for unconstrained optimization. Second, the performance of AHO and 3 recent metaheuristic algorithms, is evaluated using five engineering design problems taken from the benchmark CEC 2020 for non-convex constrained optimization. The experimental results are evaluated using the Wilcoxon signed-rank and the Friedman tests. The statistical indicators illustrate that the Archerfish Hunting Optimizer has an excellent ability to accomplish higher performance in competition with the well-established optimizers.
Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization
Korotin, Alexander, Li, Lingxiao, Solomon, Justin, Burnaev, Evgeny
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport. In this paper, we present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures, which are not restricted to being discrete. While past approaches rely on entropic or quadratic regularization, we employ input convex neural networks and cycle-consistency regularization to avoid introducing bias. As a result, our approach does not resort to minimax optimization. We provide theoretical analysis on error bounds as well as empirical evidence of the effectiveness of the proposed approach in low-dimensional qualitative scenarios and high-dimensional quantitative experiments.
Ansatz-Independent Variational Quantum Classifier
Miyahara, Hideyuki, Roychowdhury, Vwani
The paradigm of variational quantum classifiers (VQCs) encodes \textit{classical information} as quantum states, followed by quantum processing and then measurements to generate classical predictions. VQCs are promising candidates for efficient utilization of a near-term quantum device: classifiers involving $M$-dimensional datasets can be implemented with only $\lceil \log_2 M \rceil$ qubits by using an amplitude encoding. A general framework for designing and training VQCs, however, has not been proposed, and a fundamental understanding of its power and analytical relationships with classical classifiers are not well understood. An encouraging specific embodiment of VQCs, quantum circuit learning (QCL), utilizes an ansatz: it expresses the quantum evolution operator as a circuit with a predetermined topology and parametrized gates; training involves learning the gate parameters through optimization. In this letter, we first address the open questions about VQCs and then show that they, including QCL, fit inside the well-known kernel method. Based on such correspondence, we devise a design framework of efficient ansatz-independent VQCs, which we call the unitary kernel method (UKM): it directly optimizes the unitary evolution operator in a VQC. Thus, we show that the performance of QCL is bounded from above by the UKM. Next, we propose a variational circuit realization (VCR) for designing efficient quantum circuits for a given unitary operator. By combining the UKM with the VCR, we establish an efficient framework for constructing high-performing circuits. We finally benchmark the relatively superior performance of the UKM and the VCR via extensive numerical simulations on multiple datasets.
Impact of Data Processing on Fairness in Supervised Learning
Khodadadian, Sajad, Ghassami, AmirEmad, Kiyavash, Negar
We study the impact of pre and post processing for reducing discrimination in data-driven decision makers. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a pre-processing module based on a convex optimization program, which can be added before the original classifier. This leads to a fundamental lower bound on attainable discrimination, given any acceptable distortion in the outcome. Furthermore, we reformulate an existing post-processing method in terms of our accuracy and fairness measures, which allows comparing post-processing and pre-processing approaches. We show that under some mild conditions, pre-processing outperforms post-processing. Finally, we show that by appropriate choice of the discrimination measure, the optimization problem for both pre and post processing approaches will reduce to a linear program and hence can be solved efficiently.