Bayesian Inference
Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation
Rodrigues, Filipe, Henrickson, Kristian, Pereira, Francisco C.
Traffic speed data imputation is a fundamental challenge for data-driven transport analysis. In recent years, with the ubiquity of GPS-enabled devices and the widespread use of crowdsourcing alternatives for the collection of traffic data, transportation professionals increasingly look to such user-generated data for many analysis, planning, and decision support applications. However, due to the mechanics of the data collection process, crowdsourced traffic data such as probe-vehicle data is highly prone to missing observations, making accurate imputation crucial for the success of any application that makes use of that type of data. In this article, we propose the use of multi-output Gaussian processes (GPs) to model the complex spatial and temporal patterns in crowdsourced traffic data. While the Bayesian nonparametric formalism of GPs allows us to model observation uncertainty, the multi-output extension based on convolution processes effectively enables us to capture complex spatial dependencies between nearby road segments. Using 6 months of crowdsourced traffic speed data or "probe vehicle data" for several locations in Copenhagen, the proposed approach is empirically shown to significantly outperform popular state-of-the-art imputation methods.
Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data
Rodrigues, Filipe, Pereira, Francisco C.
Accurately modeling traffic speeds is a fundamental part of efficient intelligent transportation systems. Nowadays, with the widespread deployment of GPS-enabled devices, it has become possible to crowdsource the collection of speed information to road users (e.g. through mobile applications or dedicated in-vehicle devices). Despite its rather wide spatial coverage, crowdsourced speed data also brings very important challenges, such as the highly variable measurement noise in the data due to a variety of driving behaviors and sample sizes. When not properly accounted for, this noise can severely compromise any application that relies on accurate traffic data. In this article, we propose the use of heteroscedastic Gaussian processes (HGP) to model the time-varying uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop an HGP conditioned on sample size and traffic regime (SRC-HGP), which makes use of sample size information (probe vehicles per minute) as well as previously observed speeds, in order to more accurately model the uncertainty in observed speeds. Using 6 months of crowdsourced traffic data from Copenhagen, we empirically show that the proposed heteroscedastic models produce significantly better predictive distributions when compared to current state-of-the-art methods for both speed imputation and short-term forecasting tasks.
Keywords: Gaussian processes, heteroscedastic models, traffic data, crowdsourcing, uncertainty modeling, forecasting, imputation, floating car data
1. Introduction
Modeling traffic speeds is an essential task for developing intelligent transportation systems, because it provides real-time and anticipatory information about the performance of the network. This information is not only essential for traffic managers, since it allows them to properly allocate resources (e.g. 
The role of accurate traffic speed modeling is even more significant when we consider innovative car-sharing, autonomous-vehicle, and connected-vehicle technologies (Tajalli & Hajbabaie, 2018), where inappropriate routing of vehicles and poor system-wide optimization and coordination can have severe adverse effects on the behavior of the road network (e.g., congestion and poor quality of service) and, ultimately, can be decisive for the adoption of these technologies. There are two main sources of traffic speed data: static traffic sensors at fixed locations and GPS sensors in floating vehicles.
Bayesian parameter estimation of miss-specified models
Oberpriller, Johannes, Enßlin, T. A.
Fitting a simplifying model with several parameters to real data of complex objects is a highly nontrivial task, but it can yield insight into the objects' physics. Here, we present a method to infer the parameters of the model, the model error, and the statistics of the model error. This method relies on the simultaneous analysis of many data sets in order to overcome the problems caused by the degeneracy between model parameters and model error. Errors in the modeling of the measurement instrument can be absorbed in the model error, allowing for applications with complex instruments.
Entropy-Constrained Training of Deep Neural Networks
Wiedemann, Simon, Marban, Arturo, Müller, Klaus-Robert, Samek, Wojciech
We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. For that we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen as special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient-based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g.
I. INTRODUCTION
It is well established that deep neural networks excel on a wide range of machine learning tasks [1].
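As a rough illustration of the abstract's claim that pruning can be read as entropy minimization, the sketch below computes the first-order empirical entropy of a list of discretized weight values, in bits per weight. This is an assumption made for illustration only: the paper's own entropy expression and its continuous relaxation are not reproduced in this listing, and the function name and toy weight lists are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_weight(weights):
    """First-order empirical entropy of discrete weight values (bits/weight).

    Illustrative assumption: each weight is treated as an i.i.d. draw from
    the empirical distribution of values occurring in the list.
    """
    counts = Counter(weights)
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical quantized weights: four values used equally often.
dense = [-1, 0, 1, 2, -1, 0, 1, 2]
# The same layer after pruning most weights to zero.
pruned = [0, 0, 0, 0, 0, 0, 1, 2]

dense_bits = entropy_bits_per_weight(dense)
pruned_bits = entropy_bits_per_weight(pruned)
```

Multiplying the per-weight entropy by the number of weights gives an estimate of the coded size under this simplified model; the pruned list has the lower entropy because its value distribution is more concentrated.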
On The Chain Rule Optimal Transport Distance
We define a novel class of distances between statistical multivariate distributions by solving an optimal transportation problem on their marginal densities with respect to a ground distance defined on their conditional densities. By using the chain rule factorization of probabilities, we show how to perform optimal transport on a ground space being an information-geometric manifold of conditional probabilities. We prove that this new distance is a metric whenever the chosen ground distance is a metric. Our distance generalizes both the Wasserstein distances between point sets and a recently introduced metric distance between statistical mixtures. As a first application of this Chain Rule Optimal Transport (CROT) distance, we show that the ground distance between statistical mixtures is upper bounded by this optimal transport distance whenever the ground distance is jointly convex. We report on our experiments, which quantify the tightness of the CROT distance for the total variation distance and a square root generalization of the Jensen-Shannon divergence between mixtures.
Disentangling group and link persistence in Dynamic Stochastic Block models
Barucca, Paolo, Lillo, Fabrizio, Mazzarisi, Piero, Tantari, Daniele
We study the inference of a model of dynamic networks in which both communities and links keep memory of previous network states. By considering maximum likelihood inference from single snapshot observations of the network, we show that link persistence makes the inference of communities harder, decreasing the detectability threshold, while community persistence tends to make it easier. We analytically show that communities inferred from a single network snapshot can share a maximum overlap with the underlying communities of a specific previous instant in time. This leads to time-lagged inference: the identification of past communities rather than present ones. Finally, we compute the time lag and propose a corrected algorithm, the Lagged Snapshot Dynamic (LSD) algorithm, for community detection in dynamic networks. We analytically and numerically characterize the detectability transitions of this algorithm as a function of the memory parameters of the model, and we compare it with a full dynamic inference.
A geometric characterisation of sensitivity analysis in monomial models
Leonelli, Manuele, Riccomagno, Eva
Sensitivity analysis in probabilistic discrete graphical models is usually conducted by varying one probability value at a time and observing how this affects output probabilities of interest. When one probability is varied, the others are proportionally covaried to respect the sum-to-one condition of probability laws. The choice of proportional covariation is justified by a variety of optimality conditions, under which the original and the varied distributions are as close as possible under different measures of closeness. For variations of more than one parameter at a time, proportional covariation is justified in some special cases only. In this work, for the large class of discrete statistical models admitting a regular monomial parametrisation, we demonstrate the optimality of newly defined proportional multi-way schemes with respect to an optimality criterion based on the notion of I-divergence. We demonstrate that there are choices of varying parameters for which proportional covariation is not optimal and identify the sub-family of model distributions where the distance between the original distribution and the one where probabilities are covaried proportionally is minimum. This is shown by adopting a new formal, geometric characterization of sensitivity analysis in monomial models, which include a wide array of probabilistic graphical models. We also demonstrate the optimality of proportional covariation for multi-way analyses in Naive Bayes classifiers.
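The one-way proportional covariation described in the abstract above can be sketched as follows: one probability is set to a new value and the remaining ones are rescaled by a common factor so that the distribution still sums to one. The function name and the example distribution are hypothetical, and the multi-way schemes studied in the paper are not covered by this sketch.

```python
def proportional_covariation(probs, index, new_value):
    """Set probs[index] to new_value and rescale the other entries.

    Every other entry is multiplied by (1 - new_value) / (1 - old_value),
    which keeps their mutual ratios and restores the sum-to-one condition.
    """
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("new_value must lie in [0, 1]")
    old_value = probs[index]
    if old_value >= 1.0:
        raise ValueError("no remaining probability mass to rescale")
    scale = (1.0 - new_value) / (1.0 - old_value)
    return [new_value if i == index else p * scale
            for i, p in enumerate(probs)]


# Hypothetical three-state distribution; vary the first probability.
original = [0.2, 0.3, 0.5]
varied = proportional_covariation(original, 0, 0.4)
```

In this example the two untouched probabilities keep their original ratio of 0.3 to 0.5 after the variation.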
Machine Learning for Molecular Dynamics on Long Timescales
Molecular Dynamics (MD) simulation is widely used to analyze the properties of molecules and materials. Most practical applications, such as comparison with experimental measurements, designing drug molecules, or optimizing materials, rely on statistical quantities, which may be prohibitively expensive to compute from direct long-time MD simulations. Classical Machine Learning (ML) techniques have already had a profound impact on the field, especially for learning low-dimensional models of the long-time dynamics and for devising more efficient sampling schemes for computing long-time statistics. Novel ML methods have the potential to revolutionize long-timescale MD and to obtain interpretable models. ML concepts such as statistical estimator theory, end-to-end learning, representation learning and active learning are highly interesting for the MD researcher and will help to develop new solutions to hard MD problems. With the aim of better connecting the MD and ML research areas and spawning new research on this interface, we define the learning problems in long-timescale MD, present successful approaches and outline some of the unsolved ML problems in this application field.
Adams Conditioning and Likelihood Ratio Transfer Mediated Inference
Bayesian inference as applied in a legal setting is about belief transfer and involves a plurality of agents and communication protocols. A forensic expert (FE) may communicate to a trier of fact (TOF) first its value of a certain likelihood ratio with respect to FE's belief state as represented by a probability function on FE's proposition space. Subsequently FE communicates its recently acquired confirmation that a certain evidence proposition is true. Then TOF performs likelihood ratio transfer mediated reasoning thereby revising their own belief state. The logical principles involved in likelihood transfer mediated reasoning are discussed in a setting where probabilistic arithmetic is done within a meadow, and with Adams conditioning placed in a central role.
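For orientation, the belief revision step in which TOF combines a prior with FE's communicated likelihood ratio can be sketched with the textbook odds form of Bayes' rule. This is a simplifying assumption: the sketch uses ordinary floating-point arithmetic and standard conditioning, not the meadow-based arithmetic or Adams conditioning discussed in the paper, and the function name and numbers are hypothetical.

```python
def posterior_from_likelihood_ratio(prior, likelihood_ratio):
    """Odds-form Bayes update for a hypothesis H given evidence E.

    likelihood_ratio is P(E | H) / P(E | not H), as reported by the expert;
    prior is the recipient's own P(H), which must lie strictly in (0, 1).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: TOF's prior is 0.1 and FE reports a ratio of 9.
tof_posterior = posterior_from_likelihood_ratio(0.1, 9.0)
```

With these numbers the prior odds of 1 to 9 are multiplied by 9, giving even posterior odds and hence a posterior probability of about 0.5.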
Online gradient-based mixtures for transfer modulation in meta-learning
Jerfel, Ghassen, Grant, Erin, Griffiths, Thomas L., Heller, Katherine
Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not mutually beneficial, for instance, when tasks are sufficiently dissimilar or change over time. Here, we use the connection between gradient-based meta-learning and hierarchical Bayes (Grant et al., 2018) to propose a mixture of hierarchical Bayesian models over the parameters of an arbitrary function approximator such as a neural network. Generalizing the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017), we present a stochastic expectation maximization procedure to jointly estimate parameter initializations for gradient descent as well as a latent assignment of tasks to initializations. This approach better captures the diversity of training tasks as opposed to consolidating inductive biases into a single set of hyperparameters. Our experiments demonstrate better generalization performance on the standard miniImageNet benchmark for 1-shot classification. We further derive a novel and scalable non-parametric variant of our method that captures the evolution of a task distribution over time as demonstrated on a set of few-shot regression tasks.