Optimization
Multifactorial Cellular Genetic Algorithm (MFCGA): Algorithmic Design, Performance Comparison and Genetic Transferability Analysis
Osaba, Eneko, Martinez, Aritz D., Lobo, Jesus L., Del Ser, Javier, Herrera, Francisco
Multitasking optimization is an incipient research area which is lately gaining a notable research momentum. Unlike traditional optimization paradigm that focuses on solving a single task at a time, multitasking addresses how multiple optimization problems can be tackled simultaneously by performing a single search process. The main objective to achieve this goal efficiently is to exploit synergies between the problems (tasks) to be optimized, helping each other via knowledge transfer (thereby being referred to as Transfer Optimization). Furthermore, the equally recent concept of Evolutionary Multitasking (EM) refers to multitasking environments adopting concepts from Evolutionary Computation as their inspiration for the simultaneous solving of the problems under consideration. As such, EM approaches such as the Multifactorial Evolutionary Algorithm (MFEA) has shown a remarkable success when dealing with multiple discrete, continuous, single-, and/or multi-objective optimization problems. In this work we propose a novel algorithmic scheme for Multifactorial Optimization scenarios - the Multifactorial Cellular Genetic Algorithm (MFCGA) - that hinges on concepts from Cellular Automata to implement mechanisms for exchanging knowledge among problems. We conduct an extensive performance analysis of the proposed MFCGA and compare it to the canonical MFEA under the same algorithmic conditions and over 15 different multitasking setups (encompassing different reference instances of the discrete Traveling Salesman Problem). A further contribution of this analysis beyond performance benchmarking is a quantitative examination of the genetic transferability among the problem instances, eliciting an empirical demonstration of the synergies emerged between the different optimization tasks along the MFCGA search process.
Born-Again Tree Ensembles
Vidal, Thibaut, Pacheco, Toni, Schiffer, Maximilian
The use of machine learning algorithms in finance, medicine, and criminal justice can deeply impact human lives. As a consequence, research into interpretable machine learning has rapidly grown in an attempt to better control and fix possible sources of mistakes and biases. Tree ensembles offer a good prediction quality in various domains, but the concurrent use of multiple trees reduces the interpretability of the ensemble. Against this background, we study born-again tree ensembles, i.e., the process of constructing a single decision tree of minimum size that reproduces the exact same behavior as a given tree ensemble. To find such a tree, we develop a dynamic-programming based algorithm that exploits sophisticated pruning and bounding rules to reduce the number of recursive calls. This algorithm generates optimal born-again trees for many datasets of practical interest, leading to classifiers which are typically simpler and more interpretable without any other form of compromise.
Unsupervised Domain Adaptation Through Transferring both the Source-Knowledge and Target-Relatedness Simultaneously
Tian, Qing, Ma, Chuang, Cao, Meng, Chen, Songcan
Unsupervised domain adaptation (UDA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to help the learning of unlabeled target domain by transferring knowledge from the source domain. To perform UDA, a variety of methods have been proposed, most of which concentrate on the scenario of single source and single target domain (1S1T). However, in real applications, usually single source domain with multiple target domains are involved (1SmT), which cannot be handled directly by those 1S1T models. Unfortunately, although a few related works on 1SmT UDA have been proposed, nearly none of them model the source domain knowledge and leverage the target-relatedness jointly. To overcome these shortcomings, we herein propose a more general 1SmT UDA model through transferring both the Source-Knowledge and Target-Relatedness, UDA-SKTR for short. In this way, not only the supervision knowledge from the source domain, but also the potential relatedness among the target domains are simultaneously modeled for exploitation in the process of 1SmT UDA. In addition, we construct an alternating optimization algorithm to solve the variables of the proposed model with convergence guarantee. Finally, through extensive experiments on both benchmark and real datasets, we validate the effectiveness and superiority of the proposed method.
Diagonal Preconditioning: Theory and Algorithms
Qu, Zhaonan, Ye, Yinyu, Zhou, Zhengyuan
Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix and how that translates into improvements in optimization are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number using random matrix theory. Then we identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. We then study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix and provide a bisection algorithm and a potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence for regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, that is essential in training machine learning models.
COEBA: A Coevolutionary Bat Algorithm for Discrete Evolutionary Multitasking
Osaba, Eneko, Del Ser, Javier, Yang, Xin-She, Iglesias, Andres, Galvez, Akemi
Multitasking optimization is an emerging research field which has attracted lot of attention in the scientific community. The main purpose of this paradigm is how to solve multiple optimization problems or tasks simultaneously by conducting a single search process. The main catalyst for reaching this objective is to exploit possible synergies and complementarities among the tasks to be optimized, helping each other by virtue of the transfer of knowledge among them (thereby being referred to as Transfer Optimization). In this context, Evolutionary Multitasking addresses Transfer Optimization problems by resorting to concepts from Evolutionary Computation for simultaneous solving the tasks at hand. This work contributes to this trend by proposing a novel algorithmic scheme for dealing with multitasking environments. The proposed approach, coined as Coevolutionary Bat Algorithm, finds its inspiration in concepts from both co-evolutionary strategies and the metaheuristic Bat Algorithm. We compare the performance of our proposed method with that of its Multifactorial Evolutionary Algorithm counterpart over 15 different multitasking setups, composed by eight reference instances of the discrete Traveling Salesman Problem. The experimentation and results stemming therefrom support the main hypothesis of this study: the proposed Coevolutionary Bat Algorithm is a promising meta-heuristic for solving Evolutionary Multitasking scenarios.
Real-Time Dispatching of Large-Scale Ride-Sharing Systems: Integrating Optimization, Machine Learning, and Model Predictive Control
Riley, Connor, Van Hentenryck, Pascal, Yuan, Enpeng
This paper considers the dispatching of large-scale real-time ride-sharing systems to address congestion issues faced by many cities. The goal is to serve all customers (service guarantees) with a small number of vehicles while minimizing waiting times under constraints on ride duration. This paper proposes an end-to-end approach that tightly integrates a state-of-the-art dispatching algorithm, a machine-learning model to predict zone-to-zone demand over time, and a model predictive control optimization to relocate idle vehicles. Experiments using historic taxi trips in New York City indicate that this integration decreases average waiting times by about 30% over all test cases and reaches close to 55% on the largest instances for high-demand zones.
Slow and Stale Gradients Can Win the Race
Dutta, Sanghamitra, Wang, Jianyu, Joshi, Gauri
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime(wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on CIFAR10 dataset.
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Koloskova, Anastasia, Loizou, Nicolas, Boreiri, Sadra, Jaggi, Martin, Stich, Sebastian U.
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as for instance coorperative SGD and federated averaging (local SGD).
Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting
Wu, Lemeng, Ye, Mao, Lei, Qi, Lee, Jason D., Liu, Qiang
We propose signed splitting steepest descent (S3D), which progressively grows neural architectures by splitting critical neurons into multiple copies, following a theoretically-derived optimal scheme. Our algorithm is a generalization of the splitting steepest descent (S2D) of Liu et al. (2019b), but significantly improves over it by incorporating a rich set of new splitting schemes that allow negative output weights. By doing so, we can escape local optima that the original S2D can not escape. Theoretically, we show that our method provably learns neural networks with much smaller sizes than these needed for standard gradient descent in overparameterized regimes. Empirically, our method outperforms S2D and prior arts on various challenging benchmarks, including CIFAR-100, ImageNet and ModelNet40.
A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation
Chen, Xinyu, Yang, Jinming, Sun, Lijun
Sparsity and missing data problems are very common in spatiotemporal traffic data collected from various sensing systems. Making accurate imputation is critical to many applications in intelligent transportation systems. In this paper, we formulate the missing data imputation problem in spatiotemporal traffic data in a low-rank tensor completion (LRTC) framework and define a novel truncated nuclear norm (TNN) on traffic tensors of location$\times$day$\times$time of day. In particular, we introduce an universal rate parameter to control the degree of truncation on all tensor modes in the proposed LRTC-TNN model, and this allows us to better characterize the hidden patterns in spatiotemporal traffic data. Based on the framework of the Alternating Direction Method of Multipliers (ADMM), we present an efficient algorithm to obtain the optimal solution for each variable. We conduct numerical experiments on four spatiotemporal traffic data sets, and our results show that the proposed LRTC-TNN model outperforms many state-of-the-art imputation models with missing rates/patterns. Moreover, the proposed model also outperforms other baseline models in extreme missing scenarios.