Optimization
D'ya like DAGs? A Survey on Structure Learning and Causal Discovery
Vowels, Matthew J., Camgoz, Necati Cihan, Bowden, Richard
It is important for a broad range of applications, including policy making [136], medical imaging [30], advertisement [22], the development of medical treatments [189], the evaluation of evidence within legal frameworks [183, 218], social science [82, 96, 246], biology [235], and many others. It is also a burgeoning topic in machine learning and artificial intelligence [17, 66, 76, 144, 210, 247, 255], where it has been argued that a consideration for causality is crucial for reasoning about the world. In order to discover causal relations, and thereby gain causal understanding, one may perform interventions and manipulations as part of a randomized experiment. These experiments may not only allow researchers or agents to identify causal relationships, but also to estimate the magnitude of these relationships. Unfortunately, in many cases, it may not be possible to undertake such experiments due to prohibitive cost, ethical concerns, or impracticality.
In a Nutshell -- The Sequential Parameter Optimization Toolbox
Bartz-Beielstein, Thomas, Zaefferer, Martin, Rehbach, Frederik
The performance of optimization algorithms relies crucially on their parameterizations. Finding good parameter settings is called algorithm tuning. The sequential parameter optimization (SPOT) package for R is a toolbox for tuning and understanding simulation and optimization algorithms. Model-based investigations are common approaches in simulation and optimization. Sequential parameter optimization has been developed because there is a strong need for sound statistical analysis of simulation and optimization algorithms. SPOT includes methods for tuning based on classical regression and analysis of variance techniques; tree-based models such as CART and random forest; Gaussian process models (Kriging); and combinations of different meta-modeling approaches. Using a simple simulated annealing algorithm, we will demonstrate how optimization algorithms can be tuned using SPOT. The underlying concepts of the SPOT approach are explained. This includes key techniques such as exploratory fitness landscape analysis and sensitivity analysis. Many examples illustrate how SPOT can be used for understanding the performance of algorithms and gaining insight into an algorithm's behavior. Furthermore, we demonstrate how SPOT can be used as an optimizer and how a sophisticated ensemble approach is able to combine several meta models via stacking. This article exemplifies how SPOT can be used for automatic and interactive tuning.
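To make the idea of algorithm tuning concrete, here is a minimal Python sketch (SPOT itself is an R package, and none of this is its API). It assumes a toy sphere objective, a linear cooling schedule, and a plain comparison of a few hand-picked settings averaged over seeds. This is a much cruder stand-in for the model-based sequential approach the abstract describes; the names `simulated_annealing` and `tune` are hypothetical.

```python
import math
import random


def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(xi * xi for xi in x)


def simulated_annealing(objective, x0, temp, step, iters=500, seed=0):
    """Return the best objective value found by a basic simulated annealing run.

    `temp` (initial temperature) and `step` (proposal std. dev.) are the
    two parameters being tuned below.
    """
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    best = fx
    for i in range(iters):
        # Linear cooling; the small constant avoids division by zero.
        t = temp * (1.0 - i / iters) + 1e-12
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = objective(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            best = min(best, fx)
    return best


def tune(settings, seeds=range(5)):
    """Score each (temp, step) pair by its mean result over several seeds."""
    scores = {}
    for temp, step in settings:
        runs = [
            simulated_annealing(sphere, [2.0, -2.0], temp, step, seed=s)
            for s in seeds
        ]
        scores[(temp, step)] = sum(runs) / len(runs)
    best_setting = min(scores, key=scores.get)
    return best_setting, scores


if __name__ == "__main__":
    best_setting, scores = tune([(1.0, 0.1), (1.0, 1.0), (10.0, 5.0)])
    print("best (temp, step):", best_setting)
```

Averaging over seeds matters because a single run of a stochastic optimizer is a noisy measurement; a model-based tuner would additionally fit a surrogate to these scores to propose new settings.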
We don't need to worry about Overfitting anymore
Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets [1]. In deep learning, we use optimization algorithms such as SGD or Adam to drive a model toward convergence, ideally reaching a minimum where the loss on the training dataset is low. But several studies, such as Zhang et al., have shown that many networks can easily memorize the training data and readily overfit. To address this problem and improve generalization, researchers at Google published a paper on Sharpness-Aware Minimization, which reports state-of-the-art results on CIFAR-10 and other datasets. In this article, we will look at why SAM can achieve better generalization and how we can implement SAM in PyTorch.
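The two-step update behind SAM can be sketched without any framework. The following is a minimal, dependency-free Python illustration, not the article's PyTorch implementation: it assumes a hand-written gradient function for a toy quadratic loss, and the names `sam_step`, `loss`, and `grad`, as well as the values of `lr` and `rho`, are illustrative choices. Each step first moves to an approximately worst-case point within a radius `rho` of the current weights, then applies the gradient computed there to the original weights.

```python
import math


def loss(w):
    """Toy quadratic loss (an assumption): steep along w[1], shallow along w[0]."""
    return w[0] ** 2 + 10.0 * w[1] ** 2


def grad(w):
    """Analytic gradient of the toy loss above."""
    return [2.0 * w[0], 20.0 * w[1]]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update on a list of parameters."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    # Ascent step: perturb the weights toward higher loss within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def run(steps=200):
    w = [1.0, 1.0]
    for _ in range(steps):
        w = sam_step(w, grad)
    return w


if __name__ == "__main__":
    print("final loss:", loss(run()))
```

Note that each step costs two gradient evaluations, which is the main overhead of this scheme compared with plain gradient descent.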
STEP: Stochastic Traversability Evaluation and Planning for Safe Off-road Navigation
Fan, David D., Otsu, Kyohei, Kubo, Yuki, Dixit, Anushri, Burdick, Joel, Agha-Mohammadi, Ali-Akbar
Although ground robotic autonomy has gained widespread usage in structured and controlled environments, autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, and rubble pose unique and challenging problems for autonomous navigation. To tackle these problems we propose an approach for assessing traversability and planning a safe, feasible, and fast trajectory in real-time. Our approach, which we name STEP (Stochastic Traversability Evaluation and Planning), relies on: 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), and 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC). We analyze our method in simulation and validate its efficacy on wheeled and legged robotic platforms exploring extreme terrains including an underground lava tube.
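The Conditional Value-at-Risk mentioned above has a simple empirical form: the mean of the worst (1 - alpha) fraction of sampled costs. The sketch below shows only that sample-based estimator in Python; it is an illustration under the assumption that larger values mean higher cost, not the formulation used inside STEP, and the function name `cvar` is hypothetical.

```python
import math


def cvar(losses, alpha):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) fraction of losses.

    Assumes larger values are worse. alpha = 0 reduces to the plain mean.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)
    if not ordered:
        raise ValueError("losses must be non-empty")
    # Round before ceil to guard against floating-point noise such as 1.9999999999999996.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    print(cvar(costs, 0.8))  # mean of the two largest costs: 9.5
```

Unlike the mean, this quantity is driven entirely by the tail of the cost distribution, which is why it suits planning problems where rare but severe outcomes matter.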
Stochastic Cutting Planes for Data-Driven Optimization
Bertsimas, Dimitris, Li, Michael Lingzhi
We introduce a stochastic version of the cutting-plane method for a large class of data-driven Mixed-Integer Nonlinear Optimization (MINLO) problems. We show that under very weak assumptions the stochastic algorithm is able to converge to an $\epsilon$-optimal solution with high probability. Numerical experiments on several problems show that stochastic cutting planes is able to deliver a multiple order-of-magnitude speedup compared to the standard cutting-plane method. We further experimentally explore the lower limits of sampling for stochastic cutting planes and show that for many problems, a sampling size of $O(\sqrt[3]{n})$ appears to be sufficient for high quality solutions.
On the Importance of Sampling in Learning Graph Convolutional Networks
Cong, Weilin, Ramezani, Morteza, Mahdavi, Mehrdad
Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of graph-related applications. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of evolving parameters during optimization. In this paper, we describe and analyze a general \textbf{\textit{doubly variance reduction}} schema that can accelerate any sampling method under the memory budget. The motivating impetus for the proposed schema is a careful analysis of the variance of sampling methods, where it is shown that the induced variance can be decomposed into node embedding approximation variance (\emph{zeroth-order variance}) during forward propagation and layerwise-gradient variance (\emph{first-order variance}) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an $\mathcal{O}(1/T)$ convergence rate. We complement our theoretical results by integrating the proposed schema in different sampling methods and applying them to different large real-world graphs. Code is publicly available at~\url{https://github.com/CongWeilin/SGCN.git}.
Self-play Learning Strategies for Resource Assignment in Open-RAN Networks
Wang, Xiaoyang, Thomas, Jonathan D, Piechocki, Robert J, Kapoor, Shipra, Santos-Rodriguez, Raul, Parekh, Arjun
Open Radio Access Network (ORAN) is being developed with an aim to democratise access and lower the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is dis-aggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, with AlphaGo Zero inspired neural Monte-Carlo Tree Search (MCTS). Experiments on a representative 2D bin packing environment and real site data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.
Fairness and Robustness of Contrasting Explanations
Artelt, André, Hammer, Barbara
Fairness and explainability are two important and closely related requirements of decision making systems. While fairness and explainability of decision making systems have been extensively studied independently, only little effort has been put into studying fairness of explanations on their own. Current explanations can be unfair to an individual: an example is given by counterfactual explanations which propose different actions to change the output class to two similar individuals. In this work we formally and empirically study individual fairness and its mathematical formalization as robustness for counterfactual explanations as a prominent instance of contrasting explanations. In addition, we propose to use plausible counterfactuals instead of closest counterfactuals for improving the individual fairness of counterfactual explanations.
Letter to a CIO – Understanding your dilemma and how to move forward. Part 2
This article is the second part of a series called "Letter to a CIO", which reports the discussions between the author of the letter, Dr. Domenico Lepore, founder of Intelligent Management Inc., and several Chief Information Officers, with the aim of providing them with an effective methodology to address and successfully solve common problems that CIOs face in the Digital Age. The result of this series of interviews helped dr. A CIO MUST have the abilities necessary to accomplish the transformation from a silo-based Hierarchy to whole system optimization. Without this ability, CIOs will very soon become a relic, something that can be easily disposed of.
Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization
Ergen, Tolga, Sahiner, Arda, Ozturkler, Batu, Pauly, John, Mardani, Morteza, Pilanci, Mert
Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial-time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent provides an algorithmic bias effect on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.