 Bayesian Inference


Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes

arXiv.org Machine Learning

Few-shot classification (FSC), the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. Bayesian methods are well-suited to tackling the fundamental issue of overfitting in the few-shot scenario because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
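
The one-vs-each approximation named in this abstract replaces the softmax probability of class i with a product of pairwise sigmoids over the other classes, which is a lower bound on the softmax probability. The following is a minimal numerical sketch of that bound only. It does not include the Gaussian process or the Pólya-gamma augmentation, and the function names and example logits are illustrative assumptions, not the authors' code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax_prob(logits, i):
    """Exact softmax probability of class i."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(f - m) for f in logits]
    return exps[i] / sum(exps)


def one_vs_each_prob(logits, i):
    """One-vs-each value for class i: product of sigmoid(f_i - f_j) over j != i."""
    prod = 1.0
    for j, f_j in enumerate(logits):
        if j != i:
            prod *= sigmoid(logits[i] - f_j)
    return prod


if __name__ == "__main__":
    example_logits = [2.0, 0.5, -1.0]  # hypothetical latent function values
    for i in range(len(example_logits)):
        print(i, softmax_prob(example_logits, i), one_vs_each_prob(example_logits, i))
```

With only two classes the two quantities coincide; with more classes the one-vs-each value is never larger than the softmax probability.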


Bayes Theorem

#artificialintelligence

Both frequentist and Bayesian probability have a role to play in machine learning. For example, if dealing with truly random and discrete variables, such as landing a six in a die roll, the traditional approach of simply calculating the odds (frequency) is the fastest way to model a likely outcome. However, if the six keeps coming up far more often than the predicted 1/6 odds, only Bayesian probability would take that new observation into account and increase the confidence level that someone is playing with loaded dice.
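
The loaded-dice reasoning can be made concrete with a small Bayes' theorem calculation. The sketch below compares two hypotheses, a fair die and a loaded die, and updates the probability of "loaded" after observing some number of sixes. The prior of 1% and the loaded die's 50% chance of a six are assumptions chosen purely for illustration.

```python
from math import comb


def posterior_loaded(n_rolls, n_sixes, prior_loaded=0.01, p_six_loaded=0.5):
    """Posterior probability that the die is loaded, via Bayes' theorem.

    prior_loaded and p_six_loaded are illustrative assumptions.
    """
    p_six_fair = 1.0 / 6.0

    def likelihood(p_six):
        # Binomial probability of seeing n_sixes sixes in n_rolls rolls.
        return comb(n_rolls, n_sixes) * p_six**n_sixes * (1.0 - p_six) ** (n_rolls - n_sixes)

    evidence_loaded = likelihood(p_six_loaded) * prior_loaded
    evidence_fair = likelihood(p_six_fair) * (1.0 - prior_loaded)
    return evidence_loaded / (evidence_loaded + evidence_fair)


if __name__ == "__main__":
    for n_rolls, n_sixes in [(0, 0), (6, 1), (12, 6), (24, 12)]:
        print(n_rolls, n_sixes, round(posterior_loaded(n_rolls, n_sixes), 4))
```

One six in six rolls matches the fair die and lowers the posterior below the prior, while sixes on half of the rolls push it up sharply as more rolls accumulate.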


Modeling Stochastic Microscopic Traffic Behaviors: a Physics Regularized Gaussian Process Approach

arXiv.org Machine Learning

Modeling stochastic traffic behaviors at the microscopic level, such as car-following and lane-changing, is a crucial task to understand the interactions between individual vehicles in traffic streams. Leveraging a recently developed theory named physics regularized Gaussian process (PRGP), this study presents a stochastic microscopic traffic model that can capture the randomness and measure errors in the real world. Physical knowledge from classical car-following models is converted into physics regularizers, in the form of shadow Gaussian processes (GPs), of a multivariate PRGP for improving the modeling accuracy. More specifically, a Bayesian inference algorithm is developed to estimate the mean and kernel of GPs, and an enhanced latent force model is formulated to encode physical knowledge into stochastic processes. Also, based on the posterior regularization inference framework, an efficient stochastic optimization algorithm is developed to maximize the evidence lower bound of the system likelihood. To evaluate the performance of the proposed models, this study conducts empirical studies on real-world vehicle trajectories from the NGSIM dataset. Since one unique feature of the proposed framework is the capability of capturing both car-following and lane-changing behaviors with one single model, numerical tests are carried out with two separate datasets, one containing lane-changing maneuvers and the other not. The results show the proposed method outperforms the previous influential methods in estimation precision.


Inferring Signaling Pathways with Probabilistic Programming

arXiv.org Machine Learning

Cells regulate themselves via dizzyingly complex biochemical processes called signaling pathways. These are usually depicted as a network, where nodes represent proteins and edges indicate their influence on each other. In order to understand diseases and therapies at the cellular level, it is crucial to have an accurate understanding of the signaling pathways at work. Since signaling pathways can be modified by disease, the ability to infer signaling pathways from condition- or patient-specific data is highly valuable. A variety of techniques exist for inferring signaling pathways. We build on past works that formulate signaling pathway inference as a Dynamic Bayesian Network structure estimation problem on phosphoproteomic time course data. We take a Bayesian approach, using Markov Chain Monte Carlo to estimate a posterior distribution over possible Dynamic Bayesian Network structures. Our primary contributions are (i) a novel proposal distribution that efficiently samples sparse graphs and (ii) the relaxation of common restrictive modeling assumptions. We implement our method, named Sparse Signaling Pathway Sampling, in Julia using the Gen probabilistic programming language. Probabilistic programming is a powerful methodology for building statistical models. The resulting code is modular, extensible, and legible. The Gen language, in particular, allows us to customize our inference procedure for biological graphs and ensure efficient sampling. We evaluate our algorithm on simulated data and the HPN-DREAM pathway reconstruction challenge, comparing our performance against a variety of baseline methods. Our results demonstrate the vast potential for probabilistic programming, and Gen specifically, for biological network inference. Find the full codebase at https://github.com/gitter-lab/ssps


The role of collider bias in understanding statistics on racially biased policing

arXiv.org Artificial Intelligence

Even before the recent George Floyd case, there has been much debate about the extent to which claims of systemic racism are supported by statistical evidence. For example (Ross 2015) claims that unarmed blacks are 3.5 times more likely to be shot by police than unarmed whites when adjusting for relative differences in population size. However, (Fryer 2016) - formally published later as (Fryer 2019) - found that there was no such racial disparity when the data were conditioned on people being stopped by police, and there was a similar conclusion in (Patty and Hanson 2020) that was produced in direct response to public concerns about the Floyd case. In response to Fryer, (Ross, Winterhalder, and McElreath 2018) argued that Fryer's analysis was compromised because it was essentially an example of Simpson's paradox (Simpson 1951; Bickel, Hammel, and O'Connell 1975; Fenton, Neil, and Constantinou 2019) whereby conclusions based on pooled statistics are reversed when drilling down into relevant subcategories. A new paper (Knox, Lowe, and Mummolo 2020) explains why Simpson's paradox is not the only statistical explanation for the apparently contradictory conclusions of Ross and Fryer.


Extended Stochastic Block Models

arXiv.org Machine Learning

Stochastic block models (SBM) are widely used in network science due to their interpretable structure that allows inference on groups of nodes having common connectivity patterns. Although providing a well established model-based approach for community detection, such formulations are still the object of intense research to address the key problem of inferring the unknown number of communities. This has motivated the development of several probabilistic mechanisms to characterize the node partition process, covering solutions with fixed, random and infinite number of communities. In this article we provide a unified view of all these formulations within a single extended stochastic block model (ESBM), that relies on Gibbs-type processes and encompasses most existing representations as special cases. Connections with Bayesian nonparametric literature open up new avenues that allow the natural inclusion of several unexplored options to model the nodes partition process and to incorporate node attributes in a principled manner. Among these new alternatives, we focus on the Gnedin process as an example of a probabilistic mechanism with desirable theoretical properties and nice empirical performance. A collapsed Gibbs sampler that can be applied to the whole ESBM class is proposed, and refined methods for estimation, uncertainty quantification and model assessment are outlined. The performance of ESBM is assessed in simulations and an application to bill co-sponsorship networks in the Italian parliament, where we find key hidden block structures and core-periphery patterns.


Incremental Bayesian tensor learning for structural monitoring data imputation and response forecasting

arXiv.org Machine Learning

There has been increased interest in missing sensor data imputation, which is ubiquitous in the field of structural health monitoring (SHM) due to discontinuous sensing caused by sensor malfunction. To address this fundamental issue, this paper presents an incremental Bayesian tensor learning method for reconstruction of spatiotemporal missing data in SHM and forecasting of structural response. In particular, a spatiotemporal tensor is first constructed followed by Bayesian tensor factorization that extracts latent features for missing data imputation. To enable structural response forecasting based on incomplete sensing data, the tensor decomposition is further integrated with vector autoregression in an incremental learning scheme. The performance of the proposed approach is validated on continuous field-sensing data (including strain and temperature records) of a concrete bridge, based on the assumption that strain time histories are highly correlated with temperature recordings. The results indicate that the proposed probabilistic tensor learning approach is accurate and robust even in the presence of high rates of random missing data, structured missing data, and their combination. The effect of rank selection on the imputation and prediction performance is also investigated. The results show that better estimation accuracy is achieved with a higher rank for random missing data and with a lower rank for structured missing data.


Faster Uncertainty Quantification for Inverse Problems with Conditional Normalizing Flows

arXiv.org Machine Learning

In inverse problems, we often have access to data consisting of paired samples $(x,y)\sim p_{X,Y}(x,y)$ where $y$ are partial observations of a physical system, and $x$ represents the unknowns of the problem. Under these circumstances, we can employ supervised training to learn a solution $x$ and its uncertainty from the observations $y$. We refer to this problem as the "supervised" case. However, the data $y\sim p_{Y}(y)$ collected at one point could be distributed differently than observations $y'\sim p_{Y}'(y')$, relevant for a current set of problems. In the context of Bayesian inference, we propose a two-step scheme, which makes use of normalizing flows and joint data to train a conditional generator $q_{\theta}(x|y)$ to approximate the target posterior density $p_{X|Y}(x|y)$. Additionally, this preliminary phase provides a density function $q_{\theta}(x|y)$, which can be recast as a prior for the "unsupervised" problem, e.g., when only the observations $y'\sim p_{Y}'(y')$, a likelihood model $y'|x$, and a prior on $x'$ are known. We then train another invertible generator with output density $q'_{\phi}(x|y')$ specifically for $y'$, allowing us to sample from the posterior $p_{X|Y}'(x|y')$. We present some synthetic results that demonstrate considerable training speedup when reusing the pretrained network $q_{\theta}(x|y')$ as a warm start or preconditioning for approximating $p_{X|Y}'(x|y')$, instead of learning from scratch. This training modality can be interpreted as an instance of transfer learning. This result is particularly relevant for large-scale inverse problems that employ expensive numerical simulations.


Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

arXiv.org Machine Learning

Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an $\textit{anomaly family}$. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for anomaly estimation is to compute the Maximum Likelihood Estimator (MLE) of the anomaly. However, it was recently observed that for some anomaly families, the MLE is an asymptotically $\textit{biased}$ estimator of the anomaly. Here, we demonstrate that the bias of the MLE depends on the size of the anomaly family. We prove that if the number of sets in the anomaly family that contain the anomaly is sub-exponential, then the MLE is asymptotically unbiased. At the same time, we provide empirical evidence that the converse is also true: if the number of such sets is exponential, then the MLE is asymptotically biased. Our analysis unifies a number of earlier results on the bias of the MLE for specific anomaly families, including intervals, submatrices, and connected subgraphs. Next, we derive a new anomaly estimator using a mixture model, and we empirically demonstrate that our estimator is asymptotically unbiased regardless of the size of the anomaly family. We illustrate the benefits of our estimator on both simulated disease outbreak data and a real-world highway traffic dataset.


Online Approximate Bayesian learning

arXiv.org Machine Learning

We introduce in this work a new method for online approximate Bayesian learning, whose main idea is to approximate the sequence $(\pi_t)_{t\geq 1}$ of posterior distributions by a sequence $(\tilde{\pi}_t)_{t\geq 1}$ which (i) can be estimated in an online fashion using sequential Monte Carlo methods and (ii) is shown to converge to the same distribution as the sequence $(\pi_t)_{t\geq 1}$, under weak assumptions on the statistical model at hand. In its simplest version, the proposed approach amounts to taking for $(\tilde{\pi}_t)_{t\geq 1}$ the sequence of filtering distributions associated with a particular state-space model, and to approximating this sequence using a standard particle filter algorithm. We illustrate on several challenging examples the benefits of this procedure for online approximate Bayesian parameter inference, and with one real data example we show that its online predictive performance can significantly outperform that of stochastic gradient descent and of streaming variational Bayes.
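
The idea of treating a static parameter as the hidden state of a state-space model and tracking it with a particle filter can be illustrated on a toy problem. The sketch below estimates the mean of Gaussian observations one data point at a time with a bootstrap particle filter. The artificial random-walk dynamics with noise shrinking as 1/t, the Gaussian prior, and all function and parameter names are assumptions made for illustration; the abstract does not specify the state-space model the authors use, so this should not be read as their algorithm.

```python
import math
import random


def online_posterior_mean(observations, n_particles=500, prior_sd=3.0,
                          obs_sd=1.0, jitter_scale=0.5, seed=0):
    """Bootstrap particle filter for the mean of Gaussian data (toy sketch).

    Returns the running estimate of the parameter after each observation.
    """
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, prior_sd^2) prior.
    particles = [rng.gauss(0.0, prior_sd) for _ in range(n_particles)]
    estimates = []
    for t, y in enumerate(observations, start=1):
        # Propagate: assumed artificial random walk whose noise shrinks over time.
        jitter_sd = jitter_scale / t
        particles = [theta + rng.gauss(0.0, jitter_sd) for theta in particles]
        # Weight each particle by the Gaussian likelihood of the new observation.
        weights = [math.exp(-0.5 * ((y - theta) / obs_sd) ** 2) for theta in particles]
        total = sum(weights)
        estimates.append(sum(w * theta for w, theta in zip(weights, particles)) / total)
        # Resample (multinomial) so particles concentrate where the weight is.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(200)]  # synthetic data, true mean 2.0
    running = online_posterior_mean(data)
    print(running[0], running[-1])
```

Each observation is processed once and then discarded, which is what makes the procedure online; the cost per step depends on the number of particles, not on how much data has been seen.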