Statistical Inference in Mean-Field Variational Bayes
In variational inference, the complicated target is approximated by its closest member, relative to the Kullback-Leibler (KL) divergence, in a pre-specified family of tractable densities. In many large-scale machine learning applications, including clustering problems [11, 32], image classification [25, 27] and topic models [21, 7], variational inference can be orders of magnitude faster than traditional sampling-based approaches such as Markov Chain Monte Carlo (MCMC). In particular, by turning the integration, or sampling, problem into an optimization problem, variational inference can take advantage of modern optimization tools such as stochastic optimization techniques [20, 17] and distributed optimization architectures [1, 8] to further improve its efficiency. Among the various approximating schemes, mean-field approximation is the most common type of variational inference: it is conceptually simple, easy to implement, and particularly suitable for problems involving large numbers of latent variables. The term "mean-field" originates from mean-field theory in physics, where, despite complex interactions among many particles in a many-body (infinite) system, all interactions on any one particle can be approximated by a single averaged effect from a "mean-field". In variational inference, by restricting the approximating family of the mean-field to be all density functions that are fully factorized over (blocks of) unknown variables, the associated optimization problem of finding a closest weih2@illinois.edu
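As a reading aid, the mean-field problem described in this abstract can be written out in standard textbook form. The notation is an assumption, not taken from the paper: x denotes the observed data, z = (z_1, ..., z_m) the unknown variables split into m blocks, and the KL divergence is taken in the usual direction from the approximation q to the posterior.

```latex
% Mean-field family: densities that factorize over the m blocks of z
\mathcal{Q}_{\mathrm{MF}}
  = \Bigl\{\, q : q(z) = \prod_{j=1}^{m} q_j(z_j) \Bigr\}

% Variational approximation: the member of the family closest to the posterior
\hat{q}
  = \operatorname*{arg\,min}_{q \in \mathcal{Q}_{\mathrm{MF}}}
    \mathrm{KL}\bigl( q \,\|\, p(\cdot \mid x) \bigr),
\qquad
\mathrm{KL}\bigl( q \,\|\, p(\cdot \mid x) \bigr)
  = \int q(z) \log \frac{q(z)}{p(z \mid x)} \, dz

% Equivalent optimization view: minimizing the KL maximizes the evidence lower bound
\log p(x)
  = \underbrace{\int q(z) \log \frac{p(x, z)}{q(z)} \, dz}_{\mathrm{ELBO}(q)}
    + \mathrm{KL}\bigl( q \,\|\, p(\cdot \mid x) \bigr)
```

Because log p(x) does not depend on q, minimizing the KL term over the factorized family is the same as maximizing ELBO(q), which is what turns the integration problem into an optimization problem.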
Probabilistic Super-Resolution of Solar Magnetograms: Generating Many Explanations and Measuring Uncertainties
Gitiaux, Xavier, Maloney, Shane A., Jungbluth, Anna, Shneider, Carl, Wright, Paul J., Baydin, Atılım Güneş, Deudon, Michel, Gal, Yarin, Kalaitzis, Alfredo, Muñoz-Jaramillo, Andrés
Machine learning techniques have been successfully applied to super-resolution tasks on natural images, where visually pleasing results are sufficient. However, in many scientific domains this is not adequate, and estimates of errors and uncertainties are crucial. To address this issue, we propose a Bayesian framework that decomposes uncertainties into epistemic and aleatoric uncertainties. We test the validity of our approach by super-resolving images of the Sun's magnetic field and by generating maps measuring the range of possible high resolution explanations compatible with a given low resolution magnetogram.
Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores
Sholokhov, Alexey, Kinnunen, Tomi, Vestman, Ville, Lee, Kong Aik
How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it to find another person who gets falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, such an aim is mirrored in the standardized ASV evaluation benchmarks. However, the number of speakers in such evaluation benchmarks represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected randomly. A potentially more meaningful definition of an impostor - at least in the context of security-driven ASV applications - would be the closest (most confusable) other speaker to a given target. We put forward a novel performance assessment framework to address both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by addressing ASV security against closest impostors on arbitrarily large datasets. The framework allows one to predict the safety of a given ASV technology, in its current state, for an arbitrarily large speaker database consisting of virtual (sampled) speakers. As a proof-of-concept, we analyze the performance of two state-of-the-art ASV systems, based on i-vector and x-vector speaker embeddings (as implemented in the popular Kaldi toolkit), on the recent VoxCeleb 1 & 2 corpora. We found that neither the i-vector nor the x-vector system is immune to an increased false alarm rate at increased impostor database size.
Auditing and Achieving Intersectional Fairness in Classification Problems
Morina, Giulio, Oliinyk, Viktoriia, Waton, Julian, Marusic, Ines, Georgatzis, Konstantinos
Machine learning algorithms are extensively used to make increasingly consequential decisions, so that achieving optimal predictive performance can no longer be the only focus. This paper explores intersectional fairness, that is, fairness when intersections of multiple sensitive attributes -- such as race, age, nationality, etc. -- are considered. Previous research has mainly focused on fairness with respect to a single sensitive attribute, with intersectional fairness being comparatively less studied despite its critical importance for modern machine learning applications. First, we introduce intersectional fairness metrics by extending prior work, and provide different methodologies to audit discrimination in a given dataset or model outputs. Secondly, we develop novel post-processing techniques to mitigate any detected bias in a classification model. Our proposed methodology does not rely on any assumptions regarding the underlying model and aims at guaranteeing fairness while preserving good predictive performance. Finally, we give guidance on a practical implementation, showing how the proposed methods perform on a real-world dataset.
A Gentle Introduction to Monte Carlo Sampling for Probability
Monte Carlo methods are a class of techniques for randomly sampling a probability distribution. There are many problem domains where describing or estimating the probability distribution is relatively straightforward, but calculating a desired quantity is intractable. This may be due to many reasons, such as the stochastic nature of the domain or an exponential number of random variables. Instead, a desired quantity can be approximated by using random sampling, referred to as Monte Carlo methods. These methods were initially used around the time that the first computers were created and remain pervasive through all fields of science and engineering, including artificial intelligence and machine learning.
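The idea of approximating a quantity by random sampling can be illustrated with a minimal sketch. The example below is not from the article: the function names (`monte_carlo_mean`, `estimate_pi`) and the choice of estimating pi are hypothetical, picked only because the true answer is known and the estimate can be checked against it.

```python
import random


def monte_carlo_mean(draw, f, n, seed=0):
    """Approximate E[f(X)] by averaging f over n independent random draws of X.

    `draw` takes a random.Random instance and returns one sample of X.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n):
        total += f(draw(rng))
    return total / n


def draw_point(rng):
    """Sample a point uniformly from the unit square [0, 1) x [0, 1)."""
    return rng.random(), rng.random()


def inside_quarter_circle(point):
    """Indicator function: 1.0 if the point lies inside the unit quarter circle."""
    x, y = point
    return 1.0 if x * x + y * y <= 1.0 else 0.0


def estimate_pi(n, seed=0):
    """The quarter circle covers pi/4 of the unit square, so pi = 4 * E[indicator]."""
    return 4.0 * monte_carlo_mean(draw_point, inside_quarter_circle, n, seed)


if __name__ == "__main__":
    # The estimate typically tightens as the number of samples grows.
    for n in (100, 10_000, 100_000):
        print(n, estimate_pi(n))
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples, which is why Monte Carlo methods trade exactness for the ability to handle quantities that cannot be computed directly.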
Mean-field inference methods for neural networks
Machine learning algorithms relying on deep neural networks have recently allowed a great leap forward in artificial intelligence. Despite the popularity of their applications, the efficiency of these algorithms remains largely unexplained from a theoretical point of view. The mathematical description of learning problems involves very large collections of interacting random variables, which are difficult to handle analytically as well as numerically. This complexity is precisely the object of study of statistical physics. Its mission, originally directed towards natural systems, is to understand how macroscopic behaviors arise from microscopic laws. Mean-field methods are one type of approximation strategy developed with this aim. We review a selection of classical mean-field methods and recent progress relevant for inference in neural networks. In particular, we recall the principles behind the derivations of high-temperature expansions, the replica method and message passing algorithms, highlighting their equivalences and complementarities. We also provide references for past and current directions of research on neural networks relying on mean-field methods.
Multiple Futures Prediction
Tang, Yichuan Charlie, Salakhutdinov, Ruslan
Temporal prediction is critical for making intelligent and robust decisions in complex dynamic environments. Motion prediction needs to model the inherently uncertain future, which often contains multiple potential outcomes due to multi-agent interactions and the latent goals of others. Towards these goals, we introduce a probabilistic framework that efficiently learns latent variables to jointly model the multi-step future motions of agents in a scene. Our framework is data-driven and learns semantically meaningful latent variables to represent the multimodal future, without requiring explicit labels. Using a dynamic attention-based state encoder, we learn to encode the past as well as the future interactions among agents, efficiently scaling to any number of agents. Finally, our model can be used for planning by computing a conditional probability density over the trajectories of other agents given a hypothetical rollout of the 'self' agent. We demonstrate our algorithms by predicting vehicle trajectories on both simulated and real data, achieving state-of-the-art results on several vehicle trajectory datasets.
Towards calibrated and scalable uncertainty representations for neural networks
Seedat, Nabeel, Kanan, Christopher
For many applications it is critical to know the uncertainty of a neural network's predictions. While a variety of neural network parameter estimation methods have been proposed for uncertainty estimation, they have not been rigorously compared across uncertainty measures. We assess four of these parameter estimation methods to calibrate uncertainty estimation using four different uncertainty measures: entropy, mutual information, aleatoric uncertainty and epistemic uncertainty. We also evaluate their calibration using expected calibration error. We additionally propose a novel method of neural network parameter estimation called RECAST, which combines cosine annealing with warm restarts with Stochastic Gradient Langevin Dynamics, capturing more diverse parameter distributions. When benchmarked against mutilated data from MNIST, we show that RECAST is well-calibrated and when combined with predictive entropy and epistemic uncertainty it offers the best calibrated measure of uncertainty when compared to recent methods.
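Two of the quantities named in this abstract, predictive entropy and expected calibration error, have common textbook definitions that can be sketched briefly. The code below is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the equal-width binning with `n_bins=10` is one conventional choice that the abstract does not specify.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution over classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin.

    confidences: top-class probability of each prediction, in [0, 1].
    correct: booleans saying whether each prediction was right.
    Bins are equal-width over [0, 1]; this binning is an assumption.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1.0 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Under these definitions, a model that reports 0.9 confidence and is right nine times out of ten has an error near zero, while one that reports 0.8 confidence and is always wrong has an error of 0.8.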
16. Appendix: Mathematics for Deep Learning -- Dive into Deep Learning 0.7 documentation
One of the wonderful parts of modern deep learning is the fact that much of it can be understood and used without a full understanding of the mathematics below it. This is a sign that the field is becoming more mature. Most software developers no longer need to worry about the theory of computable functions, or whether programming languages without a goto can emulate programming languages with a goto with at most constant overhead, and neither should the deep learning practitioner need to worry about the theoretical foundations of maximum likelihood learning, if one can find an architecture to approximate a target function to an arbitrary degree of accuracy. That said, we are not quite there yet. Sometimes when building a model in practice you will need to understand how architectural choices influence gradient flow, or what assumptions you are making by training with a certain loss function.