Energy
Mat\'ern Gaussian processes on Riemannian manifolds
Borovitskiy, Viacheslav, Terenin, Alexander, Mostowsky, Peter, Deisenroth, Marc Peter
Gaussian processes are an effective model class for learning unknown functions, particularly in settings where accurately representing predictive uncertainty is of key importance. Motivated by applications in the physical sciences, the widely-used Mat\'ern class of Gaussian processes has recently been generalized to model functions whose domains are Riemannian manifolds, by re-expressing said processes as solutions of stochastic partial differential equations. In this work, we propose techniques for computing the kernels of these processes on compact Riemannian manifolds via spectral theory of the Laplace-Beltrami operator in a fully constructive manner, thereby allowing them to be trained via standard scalable techniques such as inducing point methods. We also extend the generalization from the Mat\'ern to the widely-used squared exponential Gaussian process. By allowing Riemannian Mat\'ern Gaussian processes to be trained using well-understood techniques, our work enables their use in mini-batch, online, and non-conjugate settings, and makes them more accessible to machine learning practitioners.
Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
Wang, Kai, Wilder, Bryan, Perrault, Andrew, Tambe, Milind
Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task.
Kernel Smoothing, Mean Shift, and Their Learning Theory with Directional Data
Directional data consist of observations distributed on a (hyper)sphere, and appear in many applied fields, such as astronomy, ecology, and environmental science. This paper studies both statistical and computational problems of kernel smoothing for directional data. We generalize the classical mean shift algorithm to directional data, which allows us to identify local modes of the directional kernel density estimator (KDE). The statistical convergence rates of the directional KDE and its derivatives are derived, and the problem of mode estimation is examined. We also prove the ascending property of our directional mean shift algorithm and investigate a general problem of gradient ascent on the unit hypersphere. To demonstrate the applicability of our proposed algorithm, we evaluate it as a mode clustering method on both simulated and real-world datasets.
BRP-NAS: Prediction-based NAS using GCNs
Dudziak, Łukasz, Chau, Thomas, Abdelfattah, Mohamed S., Lee, Royson, Kim, Hyeji, Lane, Nicholas D.
Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks. This efficiency is especially important in the case of on-device deployment, where improvements in accuracy should be balanced out with computational demands of a model. In practice, performance metrics of model are computationally expensive to obtain. Previous work uses a proxy (e.g., number of operations) or a layer-wise measurement of neural network layers to estimate end-to-end hardware performance but the imprecise prediction diminishes the quality of NAS. To address this problem, we propose BRP-NAS, an efficient hardware-aware NAS enabled by an accurate performance predictor-based on graph convolutional network (GCN). What is more, we investigate prediction quality on different metrics and show that sample efficiency of the predictor-based NAS can be improved by considering binary relations of models and an iterative data selection strategy. We show that our proposed method outperforms all prior methods on NAS-Bench-101, NAS-Bench-201 and DARTS. Finally, to raise awareness of the fact that accurate latency estimation is not a trivial task, we release LatBench -- a latency dataset of NAS-Bench-201 models running on a broad range of devices.
Probabilistic Auto-Encoder
We introduce the probabilistic auto-encoder (PAE), a generative model with a lower dimensional latent space that is based on an auto-encoder which is interpreted probabilistically after training using a normalizing flow. The PAE is fast and easy to train, achieves small reconstruction errors, high sample quality and good performance in downstream tasks. Compared to a VAE and its common variants, the PAE trains faster, reaches a lower reconstruction error and produces state of the art sample quality without requiring special tuning parameters or training procedures. We further demonstrate that the PAE is a powerful model for performing the downstream tasks of outlier detection and probabilistic image reconstruction: 1) We identify a PAE-based outlier detection metric which achieves state of the art results and outperforms other likelihood based estimators. 2) We perform high dimensional data inpainting and denoising with uncertainty quantification by means of posterior analysis in the PAE latent space. Most generative models are specifically tuned to excel in one or two applications. With the PAE we introduce an easy-to-train, simple, but at the same time powerful model that performs well and reliably in many tasks without requiring special fine-tuning or training procedures. We make all PAE codes publicly available.
Online Time-Varying Topology Identification via Prediction-Correction Algorithms
Natali, Alberto, Coutino, Mario, Isufi, Elvin, Leus, Geert
Signal processing and machine learning algorithms for data supported over graphs, require the knowledge of the graph topology. Unless this information is given by the physics of the problem (e.g., water supply networks, power grids), the topology has to be learned from data. Topology identification is a challenging task, as the problem is often ill-posed, and becomes even harder when the graph structure is time-varying. In this paper, we address the problem of dynamic topology identification by building on recent results from time-varying optimization, devising a general-purpose online algorithm operating in non-stationary environments. Because of its iteration-constrained nature, the proposed approach exhibits an intrinsic temporal-regularization of the graph topology without explicitly enforcing it. As a case-study, we specialize our method to the Gaussian graphical model (GGM) problem and corroborate its performance.
Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
Sampling from a log-concave distribution function is one core problem that has wide applications in Bayesian statistics and machine learning. While most gradient free methods have slow convergence rate, the Langevin Monte Carlo (LMC) that provides fast convergence requires the computation of gradients. In practice one uses finite-differencing approximations as surrogates, and the method is expensive in high-dimensions. A natural strategy to reduce computational cost in each iteration is to utilize random gradient approximations, such as random coordinate descent (RCD) or simultaneous perturbation stochastic approximation (SPSA). We show by a counter-example that blindly applying RCD does not achieve the goal in the most general setting. The high variance induced by the randomness means a larger number of iterations are needed, and this balances out the saving in each iteration. We then introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped and underdamped LMC. The methods are termed RCAD-O-LMC and RCAD-U-LMC respectively. The methods still sit in the random gradient approximation framework, and thus the computational cost in each iteration is low. However, by employing RCAD, the variance is reduced, so the methods converge within the same number of iterations as the classical overdamped and underdamped LMC. This leads to a computational saving overall.
Reservoir Computing meets Recurrent Kernels and Structured Transforms
Dong, Jonathan, Ohana, Ruben, Rafayelyan, Mushegh, Krzakala, Florent
Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
Parameter Estimation using Neural Networks in the Presence of Detector Effects
Andreassen, Anders, Hsu, Shih-Chieh, Nachman, Benjamin, Suaysom, Natchanon, Suresh, Adi
Histogram-based template fits are the main technique used for estimating parameters of high energy physics Monte Carlo generators. Parameterized neural network reweighting can be used to extend this fitting procedure to many dimensions and does not require binning. If the fit is to be performed using reconstructed data, then expensive detector simulations must be used for training the neural networks. We introduce a new two-level fitting approach that only requires one dataset with detector simulation and then a set of additional generation-level datasets without detector effects included. This Simulation-level fit based on Reweighting Generator-level events with Neural networks (SRGN) is demonstrated using simulated datasets for a variety of examples including a simple Gaussian random variable, parton shower tuning, and the top quark mass extraction.