Guyon, Isabelle
Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization
Egele, Romain, Guyon, Isabelle, Vishwanath, Venkatram, Balaprakash, Prasanna
Bayesian optimization (BO) is a promising approach for hyperparameter optimization of deep neural networks (DNNs), where each model training can take minutes to hours. In BO, a computationally cheap surrogate model is employed to learn the relationship between parameter configurations and their performance, such as accuracy. Parallel BO methods often adopt a single-manager/multiple-workers strategy to evaluate several hyperparameter configurations simultaneously. Even though each hyperparameter evaluation takes a long time, the overhead of such centralized schemes prevents these methods from scaling to a large number of workers. We present an asynchronous, decentralized BO in which each worker runs a sequential BO and asynchronously communicates its results through shared storage. We scale our method without loss of computational efficiency, with over 95% worker utilization, to 1,920 parallel workers (the full production queue of the Polaris supercomputer) and demonstrate improved model accuracy as well as faster convergence on the CANDLE benchmark from the Exascale Computing Project.
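A minimal, runnable sketch of the idea, assuming an in-process shared list stands in for the shared storage and a toy 1-D objective replaces DNN training (illustrative only, not the authors' implementation):

```python
import random
import threading

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    """Stand-in for an expensive DNN training; returns a score to maximize."""
    return -(x - 0.3) ** 2 + 0.01 * random.random()

store, lock = [], threading.Lock()          # plays the role of the shared storage
candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

def worker(n_steps=5):
    for _ in range(n_steps):
        with lock:
            history = list(store)           # asynchronous read of all results so far
        if len(history) < 3:
            x = random.random()             # bootstrap with random configurations
        else:
            X = np.array([[h[0]] for h in history])
            y = np.array([h[1] for h in history])
            gp = GaussianProcessRegressor().fit(X, y)   # cheap surrogate model
            mu, sigma = gp.predict(candidates, return_std=True)
            x = float(candidates[np.argmax(mu + 1.96 * sigma), 0])  # UCB acquisition
        score = objective(x)                # expensive evaluation, no central manager
        with lock:
            store.append((x, score))        # publish the result to the other workers

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("best:", max(store, key=lambda r: r[1]))
```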
Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning
Weichwald, Sebastian, Mogensen, Søren Wengel, Lee, Tabitha Edith, Baumann, Dominik, Kroemer, Oliver, Guyon, Isabelle, Trimpe, Sebastian, Peters, Jonas, Pfister, Niklas
Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies and then apply model-based design techniques to control it. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, a central focus is the identifiability of causal structure. We believe that combining these different views may create synergies, and this competition is meant as a first step toward them. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced (https://github.com/LearningByDoingCompetition/learningbydoing-comp) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks.
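To make the two settings concrete, here is a toy closed-loop example in the spirit of Track ROBO, assuming a simple linear system x' = ax + bu that is not the competition's actual dynamics:

```python
# Toy illustration of the closed-loop setting (assumed linear dynamics;
# NOT the competition's systems).
a, b, dt, target = -0.5, 1.0, 0.1, 2.0
x = 0.0
for _ in range(100):
    u = 3.0 * (target - x)         # feedback: control chosen at every time step
    x += dt * (a * x + b * u)      # explicit Euler step of the dynamics
print(f"closed-loop final state: {x:.3f} (target {target})")

# In the open-loop setting of Track CHEM, only a single impulse u0 at t=0
# can be chosen, and the rest of the trajectory unfolds without feedback.
```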
Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019
Liu, Zhengying, Pavao, Adrien, Xu, Zhen, Escalera, Sergio, Ferreira, Fabio, Guyon, Isabelle, Hong, Sirui, Hutter, Frank, Ji, Rongrong, Junior, Julio C. S. Jacques, Li, Ge, Lindauer, Marius, Luo, Zhipeng, Madadi, Meysam, Nierhoff, Thomas, Niu, Kangning, Pan, Chunguang, Stoll, Danny, Treguer, Sebastien, Wang, Jin, Wang, Peng, Wu, Chenglin, Xiong, Youcheng, Zela, Arber, Zhang, Yang
This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors, and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks with limited time and computational resources, pushing for solutions that obtain results quickly. In this setting, DL methods dominated, though the popular Neural Architecture Search (NAS) proved impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matched to the data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged, featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an everlasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".
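A schematic sketch of that modular organization, using the paper's module names with illustrative stub bodies (not any team's actual code):

```python
# Interface-level sketch of the emergent AutoDL pipeline; every body is a stub.
class DataIngestor:
    def ingest(self, raw_tensors):
        return raw_tensors                    # e.g., caching, batching, resizing

class MetaLearner:
    def warm_start(self, dataset_meta):
        return {"backbone": "pretrained"}     # off-platform meta-learned priors

class ModelSelector:
    def select(self, priors, dataset_meta):
        return priors["backbone"]             # pick an architecture per modality

class Evaluator:
    def score(self, model, data):
        return 0.0                            # validation metric driving the loop

def autodl_pipeline(raw, meta):
    data = DataIngestor().ingest(raw)
    priors = MetaLearner().warm_start(meta)
    model = ModelSelector().select(priors, meta)
    return Evaluator().score(model, data)
```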
Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
Turner, Ryan, Eriksson, David, McCourt, Michael, Kiili, Juha, Laaksonen, Eero, Xu, Zhen, Guyon, Isabelle
This paper presents the results and insights from the black-box optimization (BBO) challenge at NeurIPS 2020, which ran from July to October 2020. The challenge emphasized the importance of evaluating derivative-free optimizers for tuning the hyperparameters of machine learning models. This was the first black-box optimization challenge with a machine learning emphasis. It was based on the tuning (validation set) performance of standard machine learning models on real datasets. This competition has widespread impact, as black-box optimization (e.g., Bayesian optimization) is relevant for hyperparameter tuning in almost every machine learning project, as well as for many applications outside of machine learning. The final leaderboard was determined using the optimization performance on held-out (hidden) objective functions, where the optimizers ran without human intervention. Baselines were set using the default settings of several open-source black-box optimization packages as well as random search.
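A minimal sketch contrasting the two kinds of baselines, using scikit-optimize as one example of an open-source BBO package and a toy quadratic standing in for validation-set performance (not the challenge's actual harness):

```python
import random
from skopt import gp_minimize

def validation_loss(params):
    lr, reg = params                      # pretend hyperparameters of an ML model
    return (lr - 0.01) ** 2 + (reg - 0.1) ** 2

bounds = [(1e-4, 1.0), (1e-3, 1.0)]

# Random search baseline: independent uniform draws.
random_best = min(
    validation_loss([random.uniform(*b) for b in bounds]) for _ in range(16)
)

# Bayesian optimization: a GP surrogate guides where to evaluate next.
result = gp_minimize(validation_loss, bounds, n_calls=16, random_state=0)
print(f"random search best: {random_best:.4f}, BO best: {result.fun:.4f}")
```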
NeurIPS 2020 Competition: Predicting Generalization in Deep Learning
Jiang, Yiding, Foret, Pierre, Yak, Scott, Roy, Daniel M., Mobahi, Hossein, Dziugaite, Gintare Karolina, Bengio, Samy, Gunasekar, Suriya, Guyon, Isabelle, Neyshabur, Behnam
Understanding generalization is arguably one of the most important open questions in deep learning. Deep learning has been successfully applied to a large number of problems, ranging from pattern recognition to complex decision making, but researchers have raised many concerns about it, the most important of which is generalization. Despite numerous attempts, conventional statistical learning approaches have not yet been able to provide a satisfactory explanation of why deep learning works. A recent line of work aims to address the problem by predicting generalization performance through complexity measures. In this competition, we invite the community to propose complexity measures that can accurately predict the generalization of models. A robust and general complexity measure would potentially lead to a better understanding of the underlying mechanisms of deep learning and the behavior of deep models on unseen data, or shed light on better generalization bounds. All of these outcomes will be important for making deep learning more robust and reliable.
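As a concrete example of what a complexity measure looks like, here is one classical norm-based choice, the margin-normalized product of layer spectral norms; this is an illustration, not a competition entry:

```python
import numpy as np

def norm_based_complexity(weight_matrices, margin=1.0):
    """Product of the spectral norms of the layers, normalized by the margin.
    Larger values are meant to correlate with a larger generalization gap."""
    prod = 1.0
    for W in weight_matrices:
        prod *= np.linalg.norm(W, ord=2)   # largest singular value of the layer
    return prod / max(margin, 1e-8)

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
print(norm_based_complexity(layers, margin=0.5))
```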
AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data
Egele, Romain, Balaprakash, Prasanna, Vishwanath, Venkatram, Guyon, Isabelle, Liu, Zhengying
Developing high-performing predictive models for large tabular data sets is a challenging task. The state-of-the-art methods are based on expert-developed model ensembles from different supervised learning methods. Recently, automated machine learning (AutoML) has emerged as a promising approach to automate predictive model development. Neural architecture search (NAS) is an AutoML approach that generates and evaluates multiple neural network architectures concurrently and iteratively improves the accuracy of the generated models. A key issue in NAS, particularly for large data sets, is the large computation time required to evaluate each generated architecture. While data-parallel training is a promising approach that can address this issue, its use within NAS is difficult. For different data sets, the data-parallel training settings such as the number of parallel processes, learning rate, and batch size need to be adapted to achieve high accuracy and reduced training time. To that end, we have developed AgEBO-Tabular, an approach that combines aging evolution (AgE), a parallel NAS method that searches over the neural architecture space, with an asynchronous Bayesian optimization method for simultaneously tuning the hyperparameters of the data-parallel training. We demonstrate the efficacy of the proposed method for generating high-performing neural network models for large tabular benchmark data sets. Furthermore, we demonstrate that the neural network models automatically discovered by our method outperform the state-of-the-art AutoML ensemble models in inference speed by two orders of magnitude while reaching similar accuracy values.
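A compact sketch of the aging-evolution (AgE) half of the method, with a placeholder architecture encoding and fitness function rather than the paper's actual search space:

```python
import random
from collections import deque

def fitness(arch):                      # stand-in for training + validation
    return -sum((g - 3) ** 2 for g in arch)

def mutate(arch):                       # perturb one architecture decision
    child = list(arch)
    child[random.randrange(len(child))] = random.randint(0, 7)
    return child

population = deque(maxlen=16)           # aging: appending evicts the oldest
for _ in range(16):
    arch = [random.randint(0, 7) for _ in range(4)]
    population.append((arch, fitness(arch)))

for _ in range(100):
    sample = random.sample(list(population), 4)      # tournament selection
    parent = max(sample, key=lambda p: p[1])[0]
    child = mutate(parent)
    population.append((child, fitness(child)))       # oldest individual dies

print("best architecture:", max(population, key=lambda p: p[1]))
```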
LEAP nets for power grid perturbations
Donnot, Benjamin, Donon, Balthazar, Guyon, Isabelle, Liu, Zhengying, Marot, Antoine, Panciatici, Patrick, Schoenauer, Marc
We propose a novel neural network embedding approach to model power transmission grids, in which high-voltage lines are disconnected from and reconnected to one another from time to time, either accidentally or deliberately. We call our architecture LEAP net, for Latent Encoding of Atypical Perturbation. Our method implements a form of transfer learning: it trains on a few source domains, then generalizes to new target domains without learning from any example of those domains. We evaluate the viability of this technique for rapidly assessing the curative actions that human operators take in emergency situations, using real historical data from the French high-voltage power grid.
[Figure 1: Electricity is transported from production nodes (top) to consumption nodes (bottom), through lines (green and red edges) connected at substations (black circles), forming a transmission grid of a given topology τ. Injections x = (x1, x2, x3, x4) (production or consumption) add up to zero.]
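A schematic LEAP-style block in PyTorch, following the paper's idea of a latent leap modulated by the topology vector τ; the layer sizes and exact wiring here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LeapNet(nn.Module):
    def __init__(self, n_in, n_latent, n_tau, n_out):
        super().__init__()
        self.E = nn.Linear(n_in, n_latent)       # encoder of injections x
        self.e = nn.Linear(n_latent, n_tau)      # project into tau's space
        self.d = nn.Linear(n_tau, n_latent)      # back into the latent space
        self.D = nn.Linear(n_latent, n_out)      # decoder to power flows

    def forward(self, x, tau):
        h = torch.relu(self.E(x))
        leap = self.d(self.e(h) * tau)           # perturbation-conditioned leap
        return self.D(h + leap)                  # tau = 0 recovers the base grid

net = LeapNet(n_in=4, n_latent=32, n_tau=8, n_out=20)
flows = net(torch.randn(5, 4), torch.zeros(5, 8))  # batch of 5 grid states
```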
Towards AutoML in the presence of Drift: first results
Madrid, Jorge G., Escalante, Hugo Jair, Morales, Eduardo F., Tu, Wei-Wei, Yu, Yang, Sun-Hosoya, Lisheng, Guyon, Isabelle, Sebag, Michele
Research progress in AutoML has led to state-of-the-art solutions that can cope quite well with supervised learning tasks, e.g., classification with Auto-Sklearn. However, so far these systems do not take into account the changing nature of data evolving over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. The results demonstrate the effectiveness of the proposed methodology.
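A minimal sketch of the drift-triggered adaptation loop, with a simple accuracy-drop detector standing in for the concept-drift detectors combined with the extended Auto-Sklearn; all names here are illustrative:

```python
from collections import deque

class AccuracyDriftDetector:
    def __init__(self, window=100, tolerance=0.10):
        self.recent = deque(maxlen=window)
        self.baseline = None
        self.tolerance = tolerance

    def update(self, correct):
        """Record one prediction outcome; return True when drift is suspected."""
        self.recent.append(1.0 if correct else 0.0)
        acc = sum(self.recent) / len(self.recent)
        if self.baseline is None and len(self.recent) == self.recent.maxlen:
            self.baseline = acc                  # accuracy on the first full window
        return self.baseline is not None and acc < self.baseline - self.tolerance

def lifelong_loop(model, stream, detector, refit):
    """Process a data stream, re-running the AutoML search only on drift."""
    for x, y in stream:
        if detector.update(model.predict([x])[0] == y):
            model = refit()                      # e.g., rerun the AutoML search
            detector.baseline = None             # restart drift monitoring
            detector.recent.clear()
    return model
```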
AutoML @ NeurIPS 2018 challenge: Design and Results
Escalante, Hugo Jair, Tu, Wei-Wei, Guyon, Isabelle, Silver, Daniel L., Viegas, Evelyne, Chen, Yuqiang, Dai, Wenyuan, Yang, Qiang
Machine learning has achieved great successes in online advertising, recommender systems, financial market analysis, computer vision, computational linguistics, bioinformatics, and many other fields. However, its success crucially relies on human machine learning experts, who are involved, to some extent, in all stages of system design. In fact, it is still common for humans to make critical decisions in aspects such as converting a real-world problem into a machine learning one, data gathering, formatting and preprocessing, feature engineering, selecting or designing model architectures, hyperparameter tuning, assessing model performance, and deploying online ML systems, among others.
Anticipating contingencies in power grids using fast neural net screening
Donnot, Benjamin, Guyon, Isabelle, Schoenauer, Marc, Marot, Antoine, Panciatici, Patrick
We address the problem of maintaining high-voltage power transmission networks secure at all times. This requires that the power flowing through all lines remain below a certain nominal thermal limit, above which lines might melt, break, or cause other damage. Current practice includes enforcing the deterministic "N-1" reliability criterion, namely anticipating thermal-limit violations for any single line disconnection (whatever its cause may be) by running a slow but accurate physical grid simulator. New conceptual frameworks call for a probabilistic, risk-based security criterion and need new methods to assess that risk. To tackle this difficult assessment, we address in this paper the problem of rapidly ranking higher-order contingencies, including all pairs of line disconnections, so as to better prioritize simulations. We present a novel method based on neural networks, which ranks "N-1" and "N-2" contingencies in decreasing order of presumed severity. We demonstrate on a classical benchmark problem that the residual risk of contingencies decreases dramatically compared to considering solely all "N-1" cases, at no additional computational cost. We show that our method scales up to power grids the size of the French high-voltage power grid (over 1000 power lines).
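A sketch of the fast-screening idea: a cheap learned scorer ranks all "N-1" and "N-2" contingencies by presumed severity so that the slow physical simulator is run only on the top of the list; the scorer below is a placeholder, not the paper's neural network:

```python
from itertools import combinations

def predicted_severity(injections, disconnected_lines):
    """Stand-in for the trained neural net; returns a presumed-severity score."""
    return sum(abs(i) for i in injections) * len(disconnected_lines)

def rank_contingencies(injections, n_lines, top_k=10):
    singles = [(l,) for l in range(n_lines)]                  # "N-1" cases
    pairs = list(combinations(range(n_lines), 2))             # "N-2" cases
    scored = [(c, predicted_severity(injections, c)) for c in singles + pairs]
    scored.sort(key=lambda t: t[1], reverse=True)             # worst first
    return scored[:top_k]                                     # simulate only these

print(rank_contingencies(injections=[1.0, -0.4, -0.6], n_lines=6))
```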