Optimization
A Meta-Learning Approach for Training Explainable Graph Neural Networks
Spinelli, Indro, Scardapane, Simone, Uncini, Aurelio
In this paper, we investigate the degree of explainability of graph neural networks (GNNs). Existing explainers work by finding global/local subgraphs to explain a prediction, but they are applied after a GNN has already been trained. Here, we propose a meta-learning framework for improving the level of explainability of a GNN directly at training time, by steering the optimization procedure towards what we call `interpretable minima'. Our framework (called MATE, MetA-Train to Explain) jointly trains a model to solve the original task, e.g., node classification, and to provide easily processable outputs for downstream algorithms that explain the model's decisions in a human-friendly way. In particular, we meta-train the model's parameters to quickly minimize the error of an instance-level GNNExplainer trained on-the-fly on randomly sampled nodes. The final internal representation relies upon a set of features that can be `better' understood by an explanation algorithm, e.g., another instance of GNNExplainer. Our model-agnostic approach can improve the explanations produced for different GNN architectures and use any instance-based explainer to drive this process. Experiments on synthetic and real-world datasets for node and graph classification show that we can produce models that are consistently easier to explain by different algorithms. Furthermore, this increase in explainability comes at no cost for the accuracy of the model.
Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection
Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, developing effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions by vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that are based on the idea of embedding the high-dimensional space to the one with low dimension are sensitive to the choice of the embedding dimension, which needs to be pre-specified. We develop a new computationally efficient high-dimensional BO method that exploits variable selection. Our method is able to automatically learn axis-aligned sub-spaces, i.e. spaces containing selected variables, without the demand of any pre-specified hyperparameters. We theoretically analyze the computational complexity of our algorithm and derive the regret bound. We empirically show the efficacy of our method on several synthetic and real problems.
Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation
Loewinger, Gabriel, Nunez, Rolando Acosta, Mazumder, Rahul, Parmigiani, Giovanni
It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets and applying standard statistical learning methods can result in poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown $\textit{multi-study ensembling}$ to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multi-study ensembling uses a two-stage $\textit{stacking}$ strategy which fits study-specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model-fitting stage, potentially resulting in a loss of efficiency. We therefore propose $\textit{optimal ensemble construction}$, an $\textit{all-in-one}$ approach to multi-study stacking whereby we jointly estimate ensemble weights as well as parameters associated with each study-specific model. We prove that limiting cases of our approach yield existing methods such as multi-study stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the proposed loss function. We compare our approach to standard methods by applying it to a multi-country COVID-19 dataset for baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. Importantly, our approach outperforms multi-study stacking and other standard methods in this application. We further characterize the method's performance in data-driven and other simulations. Our method remains competitive with or outperforms multi-study stacking and other earlier methods across a range of between-study heterogeneity levels.
Policy Optimizations: TRPO/PPO
In this post, I will be talking about policy optimization methods from the papers Trust Region Policy Optimization (Schulman et al. 2015) and Proximal Policy Optimization Algorithms (Schulman et al. 2017). I will then briefly go over the Trust Region Policy Optimization method and two types of Proximal Policy Optimization methods: adaptive KL (Kullback-Leibler) penalties to the surrogate objective and clipped surrogate objective. In a traditional policy gradient method, we sample a trajectory of states, actions, and rewards, then update the policy using the sampled trajectories. While this method is great and solves basic control problems, the algorithm tends to be unstable and is inconsistent in solving an environment. A problem is that as we are updating the policy, the distribution of the inputs and outputs of the approximated policy distribution will change, resulting in instability.
Complementing the Linear-Programming Learning Experience with the Design and Use of Computerized Games: The Formula 1 Championship Game
This document focuses on modeling a complex situations to achieve an advantage within a competitive context. Our goal is to devise the characteristics of games to teach and exercise non-easily quantifiable tasks crucial to the math-modeling process. A computerized game to exercise the math-modeling process and optimization problem formulation is introduced. The game is named The Formula 1 Championship, and models of the game were developed in the computerized simulation platform MoNet. It resembles some situations in which team managers must make crucial decisions to enhance their racing cars up to the feasible, most advantageous conditions. This paper describes the game's rules, limitations, and five Formula 1 circuit simulators used for the championship development. We present several formulations of this situation in the form of optimization problems. Administering the budget to reach the best car adjustment to a set of circuits to win the respective races can be an approach. Focusing on the best distribution of each Grand Prix's budget and then deciding how to use the assigned money to improve the car is also the right approach. In general, there may be a degree of conflict among these approaches because they are different aspects of the same multi-scale optimization problem. Therefore, we evaluate the impact of assigning the highest priority to an element, or another, when formulating the optimization problem. Studying the effectiveness of solving such optimization problems turns out to be an exciting way of evaluating the advantages of focusing on one scale or another. Another thread of this research directs to the meaning of the game in the teaching-learning process. We believe applying the Formula 1 Game is an effective way to discover opportunities in a complex-system situation and formulate them to finally extract and concrete the related benefit to the context described.
Policy Optimizations: TRPO/PPO
In this post, I will be talking about policy optimization methods from the papers Trust Region Policy Optimization (Schulman et al. 2015) and Proximal Policy Optimization Algorithms (Schulman et al. 2017). I will then briefly go over the Trust Region Policy Optimization method and two types of Proximal Policy Optimization methods: adaptive KL (Kullback-Leibler) penalties to the surrogate objective and clipped surrogate objective. In a traditional policy gradient method, we sample a trajectory of states, actions, and rewards, then update the policy using the sampled trajectories. While this method is great and solves basic control problems, the algorithm tends to be unstable and is inconsistent in solving an environment. A problem is that as we are updating the policy, the distribution of the inputs and outputs of the approximated policy distribution will change, resulting in instability.
Segmentation of Brain MRI using an Altruistic Harris Hawks' Optimization algorithm
Bandyopadhyay, Rajarshi, Kundu, Rohit, Oliva, Diego, Sarkar, Ram
Segmentation is an essential requirement in medicine when digital images are used in illness diagnosis, especially, in posterior tasks as analysis and disease identification. An efficient segmentation of brain Magnetic Resonance Images (MRIs) is of prime concern to radiologists due to their poor illumination and other conditions related to de acquisition of the images. Thresholding is a popular method for segmentation that uses the histogram of an image to label different homogeneous groups of pixels into different classes. However, the computational cost increases exponentially according to the number of thresholds. In this paper, we perform the multi-level thresholding using an evolutionary metaheuristic. It is an improved version of the Harris Hawks Optimization (HHO) algorithm that combines the chaotic initialization and the concept of altruism. Further, for fitness assignment, we use a hybrid objective function where along with the cross-entropy minimization, we apply a new entropy function, and leverage weights to the two objective functions to form a new hybrid approach. The HHO was originally designed to solve numerical optimization problems. Earlier, the statistical results and comparisons have demonstrated that the HHO provides very promising results compared with well-established metaheuristic techniques. In this article, the altruism has been incorporated into the HHO algorithm to enhance its exploitation capabilities. We evaluate the proposed method over 10 benchmark images from the WBA database of the Harvard Medical School and 8 benchmark images from the Brainweb dataset using some standard evaluation metrics.
La veille de la cybersécurité
Getting the software right is important when developing machine learning models, such as recommendation or classification systems. But at eBay, optimizing the software to run on a particular piece of hardware using distillation and quantization techniques was absolutely essential to ensure scalability. "[I]n order to build a truly global marketplace that is driven by state of the art and powerful and scalable AI services," Kopru said, "you have to do a lot of optimizations after model training, and specifically for the target hardware." With 1.5 billion active listings from more than 19 million active sellers trying to reach 159 million active buyers, the ecommerce giant has a global reach that is matched by only a handful of firms. Machine learning and other AI techniques, such as natural language processing (NLP), play big roles in scaling eBay's operations to reach its massive audience. For instance, automatically generated descriptions of product listings is crucial for displaying information on the small screens of smart phones, Kopru said.
CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research
Cummins, Chris, Wasti, Bram, Guo, Jiadong, Cui, Brandon, Ansel, Jason, Gomez, Sahir, Jain, Somya, Liu, Jia, Teytaud, Olivier, Steiner, Benoit, Tian, Yuandong, Leather, Hugh
Interest in applying Artificial Intelligence (AI) techniques to compiler optimizations is increasing rapidly, but compiler research has a high entry barrier. Unlike in other domains, compiler and AI researchers do not have access to the datasets and frameworks that enable fast iteration and development of ideas, and getting started requires a significant engineering investment. What is needed is an easy, reusable experimental infrastructure for real world compiler optimization tasks that can serve as a common benchmark for comparing techniques, and as a platform to accelerate progress in the field. We introduce CompilerGym, a set of environments for real world compiler optimization tasks, and a toolkit for exposing new optimization tasks to compiler researchers. CompilerGym enables anyone to experiment on production compiler optimization problems through an easy-to-use package, regardless of their experience with compilers. We build upon the popular OpenAI Gym interface enabling researchers to interact with compilers using Python and a familiar API. We describe the CompilerGym architecture and implementation, characterize the optimization spaces and computational efficiencies of three included compiler environments, and provide extensive empirical evaluations. Compared to prior works, CompilerGym offers larger datasets and optimization spaces, is 27x more computationally efficient, is fault-tolerant, and capable of detecting reproducibility bugs in the underlying compilers. In making it easy for anyone to experiment with compilers - irrespective of their background - we aim to accelerate progress in the AI and compiler research domains.
Distributionally Robust Optimal Power Flow with Contextual Information
Esteban-Pérez, Adrián, Morales, Juan M.
In this paper, we develop a distributionally robust chance-constrained formulation of the Optimal Power Flow problem (OPF) whereby the system operator can leverage contextual information. For this purpose, we exploit an ambiguity set based on probability trimmings and optimal transport through which the dispatch solution is protected against the incomplete knowledge of the relationship between the OPF uncertainties and the context that is conveyed by a sample of their joint probability distribution. We provide an exact reformulation of the proposed distributionally robust chance-constrained OPF problem under the popular conditional-value-at-risk approximation. By way of numerical experiments run on a modified IEEE-118 bus network with wind uncertainty, we show how the power system can substantially benefit from taking into account the well-known statistical dependence between the point forecast of wind power outputs and its associated prediction error. Furthermore, the experiments conducted also reveal that the distributional robustness conferred on the OPF solution by our probability-trimmings-based approach is superior to that bestowed by alternative approaches in terms of expected cost and system reliability.