Optimization
Efficiently solving the thief orienteering problem with a max-min ant colony optimization approach
Chagas, Jonatas B. C., Wagner, Markus
Multicomponent problems are hard to solve as each component has the potential to influence the feasibility as well as the quality of the other components [4]. Among the studied multi-component problems, vehicle routing problems with loading constraints [15] appear to be very frequently investigated. In these problems, tour are to be created for vehicles while constraints and aims of specific loading policies must be taken into account. One of these problems is the Traveling Thief Problem (TTP), which combines two classic well-known and well-studied problems: the Knapsack Problem (KP) and the Traveling Salesman Problem (TSP). The TTP was proposed in 2013 by Bonyadi et al. [3] in order to provide an academic abstraction of multi-component problems for the scientific community. In the TTP, a thief travels across all cities (TSP component) and steals items along the way (KP component).
Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles
Du, Xin, Ramamoorthy, Subramanian, Duivesteijn, Wouter, Tian, Jin, Pechenizkiy, Mykola
Local decision rules are commonly understood to be more explainable, due to the local nature of the patterns involved. With numerical optimization methods such as gradient boosting, ensembles of local decision rules can gain good predictive performance on data involving global structure. Meanwhile, machine learning models are being increasingly used to solve problems in high-stake domains including healthcare and finance. Here, there is an emerging consensus regarding the need for practitioners to understand whether and how those models could perform robustly in the deployment environments, in the presence of distributional shifts. Past research on local decision rules has focused mainly on maximizing discriminant patterns, without due consideration of robustness against distributional shifts. In order to fill this gap, we propose a new method to learn and ensemble local decision rules, that are robust both in the training and deployment environments. Specifically, we propose to leverage causal knowledge by regarding the distributional shifts in subpopulations and deployment environments as the results of interventions on the underlying system. We propose two regularization terms based on causal knowledge to search for optimal and stable rules. Experiments on both synthetic and benchmark datasets show that our method is effective and robust against distributional shifts in multiple environments.
Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making
Dastpak, Mohsen, Errico, Fausto
This paper studies a stochastic variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain. In particular, potential customers are not restricted to a predefined customer set but are continuously spatially distributed in a given service area. The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions. We call this problem the VRP with stochastic customers and demands (VRPSCD). For this problem, we first propose a Markov Decision Process (MDP) formulation representing the classical centralized decision-making perspective where one decision-maker establishes the routes of all vehicles. While the resulting formulation turns out to be intractable, it provides us with the ground to develop a new MDP formulation of the VRPSCD representing a decentralized decision-making framework, where vehicles autonomously establish their own routes. This new formulation allows us to develop several strategies to reduce the dimension of the state and action spaces, resulting in a considerably more tractable problem. We solve the decentralized problem via Reinforcement Learning, and in particular, we develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network. Computational results show that our method considerably outperforms two commonly adopted benchmark policies (random and heuristic). Moreover, when comparing with existing literature, we show that our approach can compete with specialized methods developed for the particular case of the VRPSCD where customer locations and expected demands are known in advance. Finally, we show that the value functions and policies obtained by our algorithm can be easily embedded in Rollout algorithms, thus further improving their performances.
Distributed Mission Planning of Complex Tasks for Heterogeneous Multi-Robot Teams
Ferreira, Barbara Arbanas, Petrović, Tamara, Bogdan, Stjepan
In this paper, we propose a distributed multi-stage optimization method for planning complex missions for heterogeneous multi-robot teams. This class of problems involves tasks that can be executed in different ways and are associated with cross-schedule dependencies that constrain the schedules of the different robots in the system. The proposed approach involves a multi-objective heuristic search of the mission, represented as a hierarchical tree that defines the mission goal. This procedure outputs several favorable ways to fulfill the mission, which directly feed into the next stage of the method. We propose a distributed metaheuristic based on evolutionary computation to allocate tasks and generate schedules for the set of chosen decompositions. The method is evaluated in a simulation setup of an automated greenhouse use case, where we demonstrate the method's ability to adapt the planning strategy depending on the available robots and the given optimization criteria.
A Novel Structured Natural Gradient Descent for Deep Learning
In order to perform calculations faster, we need to find natural gradient algorithms with low computational complexity and low storage requirements. This paper proposes a structured natural gradient optimization method (SNGD) for learning deep neural networks. SNGD first reconfigures the parameter layer of the deep network by adding a new processing layer (named local Fisher layer); and then optimizes the reconstructed network model based on traditional GD, which is equivalent to the optimization of the original network using NGD, thus effectively reducing the computational complexity of NGD. With the introduction of the local Fisher layer, the curvature information of the loss function space can be captured, and an adjustment related to the spatial curvature is added to the original gradient direction, which ensures that there is a reasonable parameter change in each update during optimization, and improves the convergence speed of the parameters. We test the proposed approach on… The main contributions of this paper are as follows: 1) By adding a new local Fisher layer to reconstruct the network, the relevant calculation of the global Fisher matrix is decomposed and finally transformed into the use of traditional GD for optimization to achieve the effect of NGD. 2) A new layer - local Fisher layer and its efficient implementation scheme are proposed. Through the introduction of the second-order information, the local Fisher layer considers the different attributes of different positions of the parameters, and adds constraints to the transformation of the model parameters, so that the gradient update can be carried out stably and quickly.
Adaptive Reliability Analysis for Multi-fidelity Models using a Collective Learning Strategy
Zhang, Chi, Song, Chaolin, Shafieezadeh, Abdollah
In many fields of science and engineering, models with different fidelities are available. Physical experiments or detailed simulations that accurately capture the behavior of the system are regarded as high-fidelity models with low model uncertainty, however, they are expensive to run. On the other hand, simplified physical experiments or numerical models are seen as low-fidelity models that are cheaper to evaluate. Although low-fidelity models are often not suitable for direct use in reliability analysis due to their low accuracy, they can offer information about the trend of the high-fidelity model thus providing the opportunity to explore the design space at a low cost. This study presents a new approach called adaptive multi-fidelity Gaussian process for reliability analysis (AMGPRA). Contrary to selecting training points and information sources in two separate stages as done in state-of-the-art mfEGRA method, the proposed approach finds the optimal training point and information source simultaneously using the novel collective learning function (CLF). CLF is able to assess the global impact of a candidate training point from an information source and it accommodates any learning function that satisfies a certain profile. In this context, CLF provides a new direction for quantifying the impact of new training points and can be easily extended with new learning functions to adapt to different reliability problems. The performance of the proposed method is demonstrated by three mathematical examples and one engineering problem concerning the wind reliability of transmission towers. It is shown that the proposed method achieves similar or higher accuracy with reduced computational costs compared to state-of-the-art single and multi-fidelity methods. A key application of AMGPRA is high-fidelity fragility modeling using complex and costly physics-based computational models.
Asymptotic Causal Inference
We investigate causal inference in the asymptotic regime as the number of variables approaches infinity using an information-theoretic framework. We define structural entropy of a causal model in terms of its description complexity measured by the logarithmic growth rate, measured in bits, of all directed acyclic graphs (DAGs), parameterized by the edge density d. Structural entropy yields non-intuitive predictions. If we randomly sample a DAG from the space of all models, in the range d = (0, 1/8), almost surely the model is a two-layer DAG! Semantic entropy quantifies the reduction in entropy where edges are removed by causal intervention. Semantic causal entropy is defined as the f-divergence between the observational distribution and the interventional distribution P', where a subset S of edges are intervened on to determine their causal influence. We compare the decomposability properties of semantic entropy for different choices of f-divergences, including KL-divergence, squared Hellinger distance, and total variation distance. We apply our framework to generalize a recently popular bipartite experimental design for studying causal inference on large datasets, where interventions are carried out on one set of variables (e.g., power plants, items in an online store), but outcomes are measured on a disjoint set of variables (residents near power plants, or shoppers). We generalize bipartite designs to k-partite designs, and describe an optimization framework for finding the optimal k-level DAG architecture for any value of d \in (0, 1/2). As edge density increases, a sequence of phase transitions occur over disjoint intervals of d, with deeper DAG architectures emerging for larger values of d. We also give a quantitative bound on the number of samples needed to reliably test for average causal influence for a k-partite design.
Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective
Chandrasekher, Kabir Aladin, Pananjady, Ashwin, Thrampoulidis, Christos
We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting. Crucially, this deterministic sequence accurately captures both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime, and is distinct from the commonly used "population" sequence that results from taking the infinite-sample limit. We apply our general framework to derive several concrete consequences for parameter estimation in popular statistical models including phase retrieval and mixtures of regressions. Provided the sample size scales near-linearly in the dimension, we show sharp global convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent. These corollaries, in turn, yield multiple consequences, including: (a) Proof that higher-order algorithms can converge significantly faster than their first-order counterparts (and sometimes super-linearly), even if the two share the same population update and (b) Intricacies in super-linear convergence behavior for higher-order algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to the noise level in the problem. We complement these results with extensive numerical experiments, which show excellent agreement with our theoretical predictions.
SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
Lindauer, Marius, Eggensperger, Katharina, Feurer, Matthias, Biedenkapp, André, Deng, Difan, Benjamins, Carolin, Sass, René, Hutter, Frank
Algorithm parameters, in particular hyperparameters of machine learning algorithms, can substantially impact their performance. To support users in determining well-performing hyperparameter configurations for their algorithms, datasets and applications at hand, SMAC3 offers a robust and flexible framework for Bayesian Optimization, which can improve performance within a few evaluations. It offers several facades and pre-sets for typical use cases, such as optimizing hyperparameters, solving low dimensional continuous (artificial) global optimization problems and configuring algorithms to perform well across multiple problem instances. The SMAC3 package is available under a permissive BSD-license at https://github.com/automl/SMAC3.
Local versions of sum-of-norms clustering
Dunlap, Alexander, Mourrat, Jean-Christophe
Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.