Meijer, Erik
Coarsening Optimization for Differentiable Programming
Shen, Xipeng, Zhang, Guoqiang, Dea, Irene, Andow, Samantha, Arroyo-Fang, Emilio, Gafter, Neal, George, Johann, Grueter, Melissa, Meijer, Erik, Stumpos, Steffi, Tempest, Alanna, Warden, Christy, Yang, Shannon
A program written with differentiable programming can be differentiated automatically. The differentiation results can then be used for gradient-based optimization (e.g., gradient descent) of the parameters in the program. Differentiable programming has been used in scientific computing, physics simulations, and other domains to help mitigate the burden of manual, error-prone coding of derivative computations. Recent years have witnessed a growing interest in differentiable programming in machine learning (ML) [11, 34] and probabilistic programming [30], to accommodate the needs of various customized ML operators, user-defined operations in the learning targets (e.g., the physical environment in reinforcement learning), and statistical sampling. The key technique in differentiable programming is automatic differentiation. For a program $P$ that produces output $y$ from some given values $X$, automatic differentiation automatically computes the derivatives $\partial y/\partial x$ ($x \in X$) without the need for users to write the differentiation code. The given program $P$ is called the primal code, and $x$ is called an active input variable. Existing approaches to automatic differentiation fall into two categories: (i) symbolic differentiation, which uses expression manipulation in computer algebra systems; (ii) algorithmic differentiation, which performs a non-standard interpretation of a given computer program by replacing the domain of the variables to incorporate derivative values and redefining the semantics of the operators to propagate derivatives per the chain rule of differential calculus (elaborated in Section 2). Symbolic differentiation has commonly been regarded as inappropriate for differentiable programming for several reasons: (i) it results in complex and cryptic expressions plagued with the problem of "expression swell" [5].
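As an illustration of algorithmic differentiation (not this paper's implementation), the following sketch uses forward-mode dual numbers: each variable carries a (primal, derivative) pair, and every operator propagates the derivative per the chain rule, so $\partial y/\partial x$ falls out of ordinary evaluation of the primal code.

```python
# Minimal forward-mode automatic differentiation via dual numbers.
# Each Dual carries (primal value, derivative w.r.t. the active input);
# operators are redefined to propagate derivatives by the chain rule.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val   # primal value
        self.dot = dot   # derivative w.r.t. the active input

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).dot

# primal code: y = x*x + 3*x, so dy/dx = 2x + 3
f = lambda x: x * x + 3 * x
print(derivative(f, 2.0))  # 2*2 + 3 = 7.0
```

Note that the primal code `f` is written once, with no hand-derived differentiation logic; the derivative emerges from the redefined operator semantics.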
Localized Uncertainty Attacks
Dia, Ousmane Amadou, Karaletsos, Theofanis, Hazirbas, Caner, Ferrer, Cristian Canton, Kabul, Ilknur Kaynar, Meijer, Erik
The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention in adversarial examples, resulting in a number of attacks. However, most of these attacks fail to encompass a large spectrum of adversarial perturbations that are imperceptible to humans. In this paper, we present localized uncertainty attacks, a novel class of threat models against deterministic and stochastic classifiers. Under this threat model, we create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain. To find such regions, we utilize the predictive uncertainty of the classifier when the classifier is stochastic, or we learn a surrogate model to amortize the uncertainty when it is deterministic. Unlike $\ell_p$ ball or functional attacks which perturb inputs indiscriminately, our targeted changes can be less perceptible. When considered under our threat model, these attacks still produce strong adversarial examples, with the examples retaining a greater degree of similarity with the inputs.
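A rough sketch of the localized idea (illustrative only; the function and parameter names below are not the paper's API): given a per-coordinate uncertainty map, restrict a signed-gradient perturbation to the most uncertain input positions and leave the rest untouched.

```python
import numpy as np

def localized_perturbation(x, uncertainty, grad, eps=0.1, top_frac=0.2):
    """Apply a signed-gradient step of size eps only at the top_frac
    most-uncertain coordinates of x; everything else is unchanged.
    All names here are illustrative stand-ins, not the paper's code."""
    k = max(1, int(top_frac * x.size))
    # threshold = k-th largest uncertainty value
    thresh = np.partition(uncertainty.ravel(), -k)[-k]
    mask = (uncertainty >= thresh).astype(x.dtype)
    return x + eps * mask * np.sign(grad)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))         # stand-in input
unc = rng.random((8, 8))            # stand-in predictive uncertainty
grad = rng.normal(size=(8, 8))      # stand-in loss gradient w.r.t. x
x_adv = localized_perturbation(x, unc, grad)
print(np.count_nonzero(x_adv != x))  # only the masked region changed
```

In contrast to an $\ell_p$ ball attack, which would perturb all 64 coordinates here, only the selected uncertain fraction is modified.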
Accelerating Metropolis-Hastings with Lightweight Inference Compilation
Liang, Feynman, Arora, Nimar, Tehrani, Nazanin, Li, Yucen, Tingley, Michael, Meijer, Erik
In order to construct accurate proposers for Metropolis-Hastings Markov Chain Monte Carlo, we integrate ideas from probabilistic graphical models and neural networks in an open-source framework we call Lightweight Inference Compilation (LIC). LIC implements amortized inference within an open-universe declarative probabilistic programming language (PPL). Graph neural networks are used to parameterize proposal distributions as functions of Markov blankets, which during "compilation" are optimized to approximate single-site Gibbs sampling distributions. Unlike prior work in inference compilation (IC), LIC forgoes importance sampling of linear execution traces in favor of operating directly on Bayesian networks. By using a declarative PPL, the Markov blankets of nodes (which may be non-static) are queried at inference-time to produce proposers. Experimental results show LIC can produce proposers which have fewer parameters, greater robustness to nuisance random variables, and improved posterior sampling in a Bayesian logistic regression and $n$-schools inference application.
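To make the role of the proposer concrete, here is a generic single-site Metropolis-Hastings loop (a textbook sketch, not LIC's implementation): LIC's contribution is to replace the symmetric random-walk `proposer` below with a graph-neural-network proposal conditioned on the node's Markov blanket, trained during "compilation" to approximate the single-site Gibbs distribution.

```python
import math, random

def metropolis_hastings(log_prob, proposer, x0, n_steps, seed=0):
    """Textbook Metropolis-Hastings with a symmetric proposal, so the
    acceptance ratio reduces to the target density ratio. In LIC the
    proposer would instead be an amortized, Markov-blanket-conditioned
    neural proposal (which also requires the proposal-density correction)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = proposer(x, rng)
        # accept with probability min(1, p(x_new)/p(x))
        if math.log(rng.random()) < log_prob(x_new) - log_prob(x):
            x = x_new
        samples.append(x)
    return samples

# target: standard normal log-density (up to an additive constant)
log_prob = lambda x: -0.5 * x * x
proposer = lambda x, rng: x + rng.gauss(0.0, 1.0)
samples = metropolis_hastings(log_prob, proposer, x0=0.0, n_steps=5000)
mean = sum(samples) / len(samples)  # should be near 0 for this target
```

The quality of `proposer` drives mixing speed, which is why learning an accurate, amortized proposer is worthwhile.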
Gradient Descent: The Ultimate Optimizer
Chandra, Kartik, Meijer, Erik, Andow, Samantha, Arroyo-Fang, Emilio, Dea, Irene, George, Johann, Grueter, Melissa, Hosmer, Basil, Stumpos, Steffi, Tempest, Alanna, Yang, Shannon
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
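A minimal sketch of one level of this idea (illustrative, not the paper's code): since the SGD update is $w_t = w_{t-1} - \alpha g_{t-1}$, the loss gradient with respect to $\alpha$ is $\partial f/\partial \alpha = -g_t \cdot g_{t-1}$, so the learning rate can itself take a gradient step, governed by a hyper-learning-rate `kappa`. The paper's point is that this stacking continues upward, and the tower grows less sensitive to the top-level choice.

```python
# One level of hypergradient descent on a scalar problem:
# the learning rate alpha is updated by gradient descent using
# d f(w_t)/d alpha = g_t * (-g_{t-1}), i.e. alpha += kappa * g_t * g_{t-1}.

def hypergradient_sgd(grad, w, alpha=0.01, kappa=0.001, steps=100):
    g_prev = None
    for _ in range(steps):
        g = grad(w)
        if g_prev is not None:
            alpha += kappa * g * g_prev  # hypergradient step on alpha
        w -= alpha * g                   # ordinary SGD step on w
        g_prev = g
    return w, alpha

# quadratic loss f(w) = (w - 3)^2, gradient 2 * (w - 3)
w, alpha = hypergradient_sgd(lambda w: 2.0 * (w - 3.0), w=0.0)
# w converges near the minimizer 3.0; alpha has grown from its initial 0.01
```

Even with a deliberately small initial `alpha`, the hypergradient steps grow the learning rate as long as successive gradients agree in sign, which is what reduces sensitivity to the initial choice.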
SysML: The New Frontier of Machine Learning Systems
Ratner, Alexander, Alistarh, Dan, Alonso, Gustavo, Andersen, David G., Bailis, Peter, Bird, Sarah, Carlini, Nicholas, Catanzaro, Bryan, Chayes, Jennifer, Chung, Eric, Dally, Bill, Dean, Jeff, Dhillon, Inderjit S., Dimakis, Alexandros, Dubey, Pradeep, Elkan, Charles, Fursin, Grigori, Ganger, Gregory R., Getoor, Lise, Gibbons, Phillip B., Gibson, Garth A., Gonzalez, Joseph E., Gottschlich, Justin, Han, Song, Hazelwood, Kim, Huang, Furong, Jaggi, Martin, Jamieson, Kevin, Jordan, Michael I., Joshi, Gauri, Khalaf, Rania, Knight, Jason, Konečný, Jakub, Kraska, Tim, Kumar, Arun, Kyrillidis, Anastasios, Lakshmiratan, Aparna, Li, Jing, Madden, Samuel, McMahan, H. Brendan, Meijer, Erik, Mitliagkas, Ioannis, Monga, Rajat, Murray, Derek, Olukotun, Kunle, Papailiopoulos, Dimitris, Pekhimenko, Gennady, Rekatsinas, Theodoros, Rostamizadeh, Afshin, Ré, Christopher, De Sa, Christopher, Sedghi, Hanie, Sen, Siddhartha, Smith, Virginia, Smola, Alex, Song, Dawn, Sparks, Evan, Stoica, Ion, Sze, Vivienne, Udell, Madeleine, Vanschoren, Joaquin, Venkataraman, Shivaram, Vinayak, Rashmi, Weimer, Markus, Wilson, Andrew Gordon, Xing, Eric, Zaharia, Matei, Zhang, Ce, Talwalkar, Ameet
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, SysML, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.