Bayesian Learning
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Khan, Mohammad Emtiyaz, Nielsen, Didrik, Tangkaratt, Voot, Lin, Wu, Gal, Yarin, Srivastava, Akash
Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration
Sankaranarayanan, Sreecharan, Annasamy, Raghuram Mandyam, Sycara, Katia, Rosรฉ, Carolyn Penstein
Q-Ensembles are a model-free approach where input images are fed into different Q-networks and exploration is driven by the assumption that uncertainty is proportional to the variance of the output Q-values obtained. They have been shown to perform relatively well compared to other exploration strategies. Further, model-based approaches, such as encoder-decoder models have been used successfully for next frame prediction given previous frames. This paper proposes to integrate the model-free Q-ensembles and model-based approaches with the hope of compounding the benefits of both and achieving superior exploration as a result. Results show that a model-based trajectory memory approach when combined with Q-ensembles produces superior performance when compared to only using Q-ensembles.
A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data
This dissertation investigates the use of one-sided classification algorithms in the application of separating hazardous chlorinated solvents from other materials, based on their Raman spectra. The experimentation is carried out using a new one-sided classification toolkit that was designed and developed from the ground up. In the one-sided classification paradigm, the objective is to separate elements of the target class from all outliers. These one-sided classifiers are generally chosen, in practice, when there is a deficiency of some sort in the training examples. Sometimes outlier examples can be rare, expensive to label, or even entirely absent. However, this author would like to note that they can be equally applicable when outlier examples are plentiful but nonetheless not statistically representative of the complete outlier concept. It is this scenario that is explicitly dealt with in this research work. In these circumstances, one-sided classifiers have been found to be more robust that conventional multi-class classifiers. The term "unexpected" outliers is introduced to represent outlier examples, encountered in the test set, that have been taken from a different distribution to the training set examples. These are examples that are a result of an inadequate representation of all possible outliers in the training set. It can often be impossible to fully characterise outlier examples given the fact that they can represent the immeasurable quantity of "everything else" that is not a target. The findings from this research have shown the potential drawbacks of using conventional multi-class classification algorithms when the test data come from a completely different distribution to that of the training samples.
A Novel Bayesian Approach for Latent Variable Modeling from Mixed Data with Missing Values
Cui, Ruifei, Bucur, Ioan Gabriel, Groot, Perry, Heskes, Tom
We consider the problem of learning parameters of latent variable models from mixed (continuous and ordinal) data with missing values. We propose a novel Bayesian Gaussian copula factor (BGCF) approach that is consistent under certain conditions and that is quite robust to the violations of these conditions. In simulations, BGCF substantially outperforms two state-of-the-art alternative approaches. An illustration on the `Holzinger & Swineford 1939' dataset indicates that BGCF is favorable over the so-called robust maximum likelihood (MLR) even if the data match the assumptions of MLR.
Meta-Learning for Stochastic Gradient MCMC
Gong, Wenbo, Li, Yingzhen, Hernรกndez-Lobato, Josรฉ Miguel
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model, even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design for the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler out-performs generic, hand-designed SG-MCMC algorithms, and generalizes to different datasets and larger architectures.
Partial AUC Maximization via Nonlinear Scoring Functions
Ueda, Naonori, Fujino, Akinori
We propose a method for maximizing a partial area under a receiver operating characteristic (ROC) curve (pAUC) for binary classification tasks. In binary classification tasks, accuracy is the most commonly used as a measure of classifier performance. In some applications such as anomaly detection and diagnostic testing, accuracy is not an appropriate measure since prior probabilties are often greatly biased. Although in such cases the pAUC has been utilized as a performance measure, few methods have been proposed for directly maximizing the pAUC. This optimization is achieved by using a scoring function. The conventional approach utilizes a linear function as the scoring function. In contrast we newly introduce nonlinear scoring functions for this purpose. Specifically, we present two types of nonlinear scoring functions based on generative models and deep neural networks. We show experimentally that nonlinear scoring fucntions improve the conventional methods through the application of a binary classification of real and bogus objects obtained with the Hyper Suprime-Cam on the Subaru telescope.
Trading algorithms with learning in latent alpha models
Casgrain, Philippe, Jaimungal, Sebastian
Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.
Disintegration and Bayesian Inversion via String Diagrams
The notions of disintegration and Bayesian inversion are fundamental in conditional probability theory. They produce channels, as conditional probabilities, from a joint state, or from an already given channel (in opposite direction). These notions exist in the literature, in concrete situations, but are presented here in abstract graphical formulations. The resulting abstract descriptions are used for proving basic results in conditional probability theory. The existence of disintegration and Bayesian inversion is discussed for discrete probability, and also for measure-theoretic probability --- via standard Borel spaces and via likelihoods. Finally, the usefulness of disintegration and Bayesian inversion is illustrated in several examples.
ML + FV = $\heartsuit$? A Survey on the Application of Machine Learning to Formal Verification
Amrani, Moussa, Lรบcio, Levi, Bibal, Adrien
Formal Verification (Fv) and Machine Learning (Ml) can seem incompatible due to their opposite mathematical foundations and their use in real-life problems: Fv mostly relies on discrete mathematics and aims at ensuring correctness; Ml often relies on probabilistic models and consists of learning patterns from training data. In this paper, we postulate that they are complementary in practice, and explore how Ml helps Fv in its classical approaches: static analysis, model-checking, theorem-proving, and Sat solving. We draw a landscape of the current practice and catalog some of the most prominent uses of Ml inside Fv tools, thus offering a new perspective on Fv techniques that can help researchers and practitioners to better locate the possible synergies. We discuss lessons learned from our work, point to possible improvements and offer visions for the future of the domain in the light of the science of software and systems modeling.
Haskell: Data Analysis Made Easy Udemy
A staggering amount of data is created everyday; analyzing and organizing this enormous amount of data can be quite a complex task. Haskell is a powerful and well-designed functional programming language that is designed to work with complex data. It is trending in the field of data science as it provides a powerful platform for robust data science practices. This course will introduce the basic concepts of Haskell and move on to discuss how Haskell can be used to solve the issues by using the real-world data. The course will guide you through the installation procedure, after you have all the tools that you require in place, you will explore the basic concepts of Haskell including the functions, and the data structures.