Optimization
A Lagrangian Dual Framework for Deep Neural Networks with Constraints
Fioretto, Ferdinando, Mak, Terrence WK, Baldo, Federico, Lombardi, Michele, Van Hentenryck, Pascal
A variety of computationally challenging constrained optimization problems in several engineering disciplines are solved repeatedly under different scenarios. In many cases, they would benefit from fast and accurate approximations, either to support real-time operations or large-scale simulation studies. This paper aims at exploring how to leverage the substantial data being accumulated by repeatedly solving instances of these applications over time. It introduces a deep learning model that exploits Lagrangian duality to encourage the satisfaction of hard constraints. The proposed method is evaluated on a collection of realistic energy networks, by enforcing non-discriminatory decisions on a variety of datasets, and on a transprecision computing application. The results illustrate the effectiveness of the proposed method that dramatically decreases constraint violations by the predictors and, in some applications, increases the prediction accuracy.
Tight Regret Bounds for Noisy Optimization of a Brownian Motion
Wang, Zexin, Tan, Vincent Y. F., Scarlett, Jonathan
We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that as the smallest possible expected simple regret and the smallest possible expected cumulative regret scale as $\Omega(1 / \sqrt{T \log (T)}) \cap \mathcal{O}(\log T / \sqrt{T})$ and $\Omega(\sqrt{T / \log (T)}) \cap \mathcal{O}(\sqrt{T} \cdot \log T)$ respectively. Thus, our upper and lower bounds are tight up to a factor of $\mathcal{O}( (\log T)^{1.5} )$. The upper bound uses an algorithm based on confidence bounds and the Markov property of Brownian motion, and the lower bound is based on a reduction to binary hypothesis testing.
Robust Submodular Minimization with Applications to Cooperative Modeling
Robust Optimization is becoming increasingly important in machine learning applications. This paper studies the problem of robust submodular minimization subject to combinatorial constraints. Constrained Submodular Minimization arises in several applications such as co-operative cuts in image segmentation, co-operative matchings in image correspondence, etc. Many of these models are defined over clusterings of data points (for example pixels in images), and it is important for these models to be robust to perturbations and uncertainty in the data. While several existing papers have studied robust submodular maximization, ours is the first work to study the minimization version under a broad range of combinatorial constraints including cardinality, knapsack, matroid as well as graph-based constraints such as cuts, paths, matchings, and trees. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on synthetic and real-world datasets.
On the Performance of Metaheuristics: A Different Perspective
Boveiri, Hamid Reza, Khayami, Raouf
Nowadays, we are immersed in tens of newly-proposed evolutionary and swam-intelligence metaheuristics, which makes it very difficult to choose a proper one to be applied on a specific optimization problem at hand. On the other hand, most of these metaheuristics are nothing but slightly modified variants of the basic metaheuristics. For example, Differential Evolution (DE) or Shuffled Frog Leaping (SFL) are just Genetic Algorithms (GA) with a specialized operator or an extra local search, respectively. Therefore, what comes to the mind is whether the behavior of such newly-proposed metaheuristics can be investigated on the basis of studying the specifications and characteristics of their ancestors. In this paper, a comprehensive evaluation study on some basic metaheuristics i.e. Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), Teaching-Learning-Based Optimization (TLBO), and Cuckoo Optimization algorithm (COA) is conducted, which give us a deeper insight into the performance of them so that we will be able to better estimate the performance and applicability of all other variations originated from them. A large number of experiments have been conducted on 20 different combinatorial optimization benchmark functions with different characteristics, and the results reveal to us some fundamental conclusions besides the following ranking order among these metaheuristics, {ABC, PSO, TLBO, GA, COA} i.e. ABC and COA are the best and the worst methods from the performance point of view, respectively. In addition, from the convergence perspective, PSO and ABC have significant better convergence for unimodal and multimodal functions, respectively, while GA and COA have premature convergence to local optima in many cases needing alternative mutation mechanisms to enhance diversification and global search.
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks
Jiang, Feibo, Wang, Kezhi, Dong, Li, Pan, Cunhua, Yang, Kun
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the mobile users, by optimizing offloading decision, transmission power, and resource allocation in the mobile edge computing (MEC) system. Towards this end, a deep reinforcement learning (DRL) method is proposed to obtain an online resource scheduling policy. Firstly, a related and regularized stacked auto encoder (2r-SAE) with unsupervised learning is proposed to perform data compression and representation for high dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Secondly, we present an adaptive simulated annealing based approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Thirdly, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. Numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks. It also shows that the proposed framework is suitable for resource scheduling problem in large-scale MEC networks, especially in the dynamic environment.
From Nesterov's Estimate Sequence to Riemannian Acceleration
We propose the first global accelerated gradient method for Riemannian manifolds. Toward establishing our result we revisit Nesterov's estimate sequence technique and develop an alternative analysis for it that may also be of independent interest. Then, we extend this analysis to the Riemannian setting, localizing the key difficulty due to non-Euclidean structure into a certain ``metric distortion.'' We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
Becker, Philipp, Arenz, Oleg, Neumann, Gerhard
Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution to the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(information)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixtures models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure, using a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
Best Principal Submatrix Selection for the Maximum Entropy Sampling Problem: Scalable Algorithms and Performance Guarantees
This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied to many areas, including healthcare, power system, manufacturing and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to study an efficient sampling algorithm and develop its approximation bound for MESP, which improves the best-known bound in literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. By developing new mathematical tools for the singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely-used local search algorithm and prove its first-known approximation bound for MESP. The proof techniques further inspire us with an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. Our proposed algorithms are coded and released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
A New Arc-Routing Algorithm Applied to Winter Road Maintenance
Fink, Jiří, Loebl, Martin, Pelikánová, Petra
This problem is well-known to be NPhard [3]. For our purposes, an edge version is important: Given a subset of edges (instead of vertices), find a connected subgraph of G containing all prescribed edges with the minimal weight. This edge version of Steiner tree problem makes our problem hard even if the limit L is sufficiently large to ensure that only one inert (or chemical) car can maintain all inert roads. In this case, the set of all inert roads may be disconnected so a set of chemical roads has to be added to the inert car's plan to ensure connectivity. Finding such a set of chemical roads with the minimal weight is exactly the edge version of the Steiner tree problem. This kind of reasoning is used in Section 5 to obtain lower bounds on a minimal deadhead.
Explainable Machine Learning Control -- robust control and stability analysis
Quade, Markus, Isele, Thomas, Abel, Markus
Recently, the term explainable AI became known as an approach to produce models from artificial intelligence which allow interpretation. Since a long time, there are models of symbolic regression in use that are perfectly explainable and mathematically tractable: in this contribution we demonstrate how to use symbolic regression methods to infer the optimal control of a dynamical system given one or several optimization criteria, or cost functions. In previous publications, network control was achieved by automatized machine learning control using genetic programming. Here, we focus on the subsequent analysis of the analytical expressions which result from the machine learning. In particular, we use AUTO to analyze the stability properties of the controlled oscillator system which served as our model. As a result, we show that there is a considerable advantage of explainable models over less accessible neural networks.