Country
Normalizing Flows on Tori and Spheres
Rezende, Danilo Jimenez, Papamakarios, George, Racanière, Sébastien, Albergo, Michael S., Kanwar, Gurtej, Shanahan, Phiala E., Cranmer, Kyle
Normalizing flows are a powerful tool for building expressive distributions in high dimensions. So far, most of the literature has concentrated on learning flows on Euclidean spaces. Some problems however, such as those involving angles, are defined on spaces with more complex geometries, such as tori or spheres. In this paper, we propose and compare expressive and numerically stable flows on such spaces. Our flows are built recursively on the dimension of the space, starting from flows on circles, closed intervals or spheres.
How Good is the Bayes Posterior in Deep Neural Networks Really?
Wenzel, Florian, Roth, Kevin, Veeling, Bastiaan S., Świątkowski, Jakub, Tran, Linh, Mandt, Stephan, Snoek, Jasper, Salimans, Tim, Jenatton, Rodolphe, Nowozin, Sebastian
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
Regret analysis of the Piyavskii-Shubert algorithm for global Lipschitz optimization
Bouttier, Clément, Cesari, Tommaso, Gerchinovitz, Sébastien
The goalof online optimization is to find an approximatemaximizer of a given function f: D R d R with as little evaluations off as possible. In this paper we assumethat the only accessto the function f is through an oracle returning the (possibly) perturbed values of the function at the queried points. Perturbations can be deterministic or stochastic. No analytical expression of f or any of its derivatives is available. At each round k the learner picks a new point x k D and the value f(x k) is revealed by the oracle, up to an additive perturbation ξ k . After each evaluation, the learner can return a point x k D, which may differ from the last queried point x k .
Product Kanerva Machines: Factorized Bayesian Memory
Marblestone, Adam, Wu, Yan, Wayne, Greg
An ideal cognitively-inspired memory system would compress and organize incoming items. The Kanerva Machine (Wu et al., 2018b;a) is a Bayesian model that naturally implements online memory compression. However, the organization of the Kanerva Machine is limited by its use of a single Gaussian random matrix for storage. Here we introduce the Product Kanerva Machine, which dynamically combines many smaller Kanerva Machines. Its hierarchical structure provides a principled way to abstract invariant features and gives scaling and capacity advantages over single Kanerva Machines. We show that it can exhibit unsupervised clustering, find sparse and combinatorial allocation patterns, and discover spatial tunings that approximately factorize simple images by object.
End-to-End Models for the Analysis of Pupil Size Variations and Diagnosis of Parkinson's Disease
Zanca, Dario, Rufa, Alessandra, Canessa, Andrea, Sabatini, Silvio
It is well known that a systematic analysis of the pupil size variations, recorded by means of an eye-tracker, is a rich source of information about a subject's cognitive state. In this work we present end-to-end models for the diagnosis of Parkinson's disease (PD) based on the raw pupil size signal. Long-range registration (10 minutes) of the pupil size were collected in scotopic conditions (complete darkness, 0 lux) on 21 healthy subjects and 15 subjects diagnosed with PD. 1-D convolutional neural network models are trained for classification of short-range sequences (10 to 60 seconds of registration). The model provides prediction with high average accuracy on a hold out test set. A temporal analysis of the model performance allowed the characterization of pupil's size variations in PD and healthy subjects during a resting state. Dataset and codes are released for reproducibility and benchmarking purposes.
Macroscopic Traffic Flow Modeling with Physics Regularized Gaussian Process: A New Insight into Machine Learning Applications
Yuan, Yun, Yang, Xianfeng Terry, Zhang, Zhao, Zhe, Shandian
Despite the wide implementation of machine learning (ML) techniques in traffic flow modeling recently, those data-driven approaches often fall short of accuracy in the cases with a small or noisy dataset. To address this issue, this study presents a new modeling framework, named physics regularized machine learning (PRML), to encode classical traffic flow models (referred as physical models) into the ML architecture and to regularize the ML training process. More specifically, a stochastic physics regularized Gaussian process (PRGP) model is developed and a Bayesian inference algorithm is used to estimate the mean and kernel of the PRGP. A physical regularizer based on macroscopic traffic flow models is also developed to augment the estimation via a shadow GP and an enhanced latent force model is used to encode physical knowledge into stochastic processes. Based on the posterior regularization inference framework, an efficient stochastic optimization algorithm is also developed to maximize the evidence lowerbound of the system likelihood. To prove the effectiveness of the proposed model, this paper conducts empirical studies on a real-world dataset which is collected from a stretch of I-15 freeway, Utah. Results show the new PRGP model can outperform the previous compatible methods, such as calibrated pure physical models and pure machine learning methods, in estimation precision and input robustness.
Value of Information Analysis via Active Learning and Knowledge Sharing in Error-Controlled Adaptive Kriging
Zhang, Chi, Wang, Zeyu, Shafieezadeh, Abdollah
Large uncertainties in many phenomena of interest have challenged the reliability of pertaining decisions. Collecting additional information to better characterize involved uncertainties is among decision alternatives. Value of information (VoI) analysis is a mathematical decision framework that quantifies expected potential benefits of new data and assists with optimal allocation of resources for information collection. However, a primary challenge facing VoI analysis is the very high computational cost of the underlying Bayesian inference especially for equality-type information. This paper proposes the first surrogate-based framework for VoI analysis. Instead of modeling the limit state functions describing events of interest for decision making, which is commonly pursued in surrogate model-based reliability methods, the proposed framework models system responses. This approach affords sharing equality-type information from observations among surrogate models to update likelihoods of multiple events of interest. Moreover, two knowledge sharing schemes called model and training points sharing are proposed to most effectively take advantage of the knowledge offered by costly model evaluations. Both schemes are integrated with an error rate-based adaptive training approach to efficiently generate accurate Kriging surrogate models. The proposed VoI analysis framework is applied for an optimal decision-making problem involving load testing of a truss bridge. While state-of-the-art methods based on importance sampling and adaptive Kriging Monte Carlo simulation are unable to solve this problem, the proposed method is shown to offer accurate and robust estimates of VoI with a limited number of model evaluations. Therefore, the proposed method facilitates the application of VoI for complex decision problems.
Near-optimal Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms for the Non-episodic Setting
We study reinforcement learning in factored Markov decision processes (FMDPs) in the non-episodic setting. We focus on regret analyses providing both upper and lower bounds. We propose two near-optimal and oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound respectively, both of which reduce to the near-optimal bound $\widetilde{O}(DS\sqrt{AT})$ for standard non-factored MDPs. Our lower bound depends on the span of the bias vector rather than the diameter $D$ and we show via a simple Cartesian product construction that FMDPs with a bounded span can have an arbitrarily large diameter, which suggests that bounds with a dependence on diameter can be extremely loose. We, therefore, propose another algorithm that only depends on span but relies on a computationally stronger oracle. Our algorithms outperform the previous near-optimal algorithms on computer network administrator simulations.
Using generative adversarial networks to synthesize artificial financial datasets
Efimov, Dmitry, Xu, Di, Kong, Luyang, Nefedov, Alexey, Anandakrishnan, Archana
Generative Adversarial Networks (GANs) became very popular for generation of realistically looking images. In this paper, we propose to use GANs to synthesize artificial financial data for research and benchmarking purposes. We test this approach on three American Express datasets, and show that properly trained GANs can replicate these datasets with high fidelity. For our experiments, we define a novel type of GAN, and suggest methods for data preprocessing that allow good training and testing performance of GANs. We also discuss methods for evaluating the quality of generated data, and their comparison with the original real data.
AI-GAN: Attack-Inspired Generation of Adversarial Examples
Bai, Tao, Zhao, Jun, Zhu, Jinlin, Han, Shoudong, Chen, Jiefeng, Li, Bo
Adversarial examples that can fool deep models are mainly crafted by adding small perturbations imperceptible to human eyes. There are various optimization-based methods in the literature to generate adversarial perturbations, most of which are time-consuming. AdvGAN, a method proposed by Xiao~\emph{et al.}~in IJCAI~2018, employs Generative Adversarial Networks (GAN) to generate adversarial perturbation with original images as inputs, which is faster than optimization-based methods at inference time. AdvGAN, however, fixes the target classes in the training and we find it difficult to train AdvGAN when it is modified to take original images and target classes as inputs. In this paper, we propose \mbox{Attack-Inspired} GAN (\mbox{AI-GAN}) with a different training strategy to solve this problem. \mbox{AI-GAN} is a two-stage method, in which we use projected gradient descent (PGD) attack to inspire the training of GAN in the first stage and apply standard training of GAN in the second stage. Once trained, the Generator can approximate the conditional distribution of adversarial instances and generate \mbox{imperceptible} adversarial perturbations given different target classes. We conduct experiments and evaluate the performance of \mbox{AI-GAN} on MNIST and \mbox{CIFAR-10}. Compared with AdvGAN, \mbox{AI-GAN} achieves higher attack success rates with similar perturbation magnitudes.