Bayesian Learning
Causal Inference through a Witness Protection Program
One of the most fundamental problems in causal inference is the estimation of a causal effect when variables are confounded. This is difficult in an observational study because one has no direct evidence that all confounders have been adjusted for. We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest "weak" paths in an unknown causal graph. The widely used faithfulness condition of Spirtes et al. is relaxed to allow for varying degrees of "path cancellations" that will imply conditional independencies but do not rule out the existence of confounding causal paths. The outcome is a posterior distribution over bounds on the average causal effect via a linear programming approach and Bayesian inference. We claim this approach should be used in regular practice to complement other default tools in observational studies.
Nonparametric Bayesian inference on multivariate exponential families
William R. Vega-Brown, Marek Doniec, Nicholas G. Roy
We develop a model by choosing the maximum entropy distribution from the set of models satisfying certain smoothness and independence criteria; we show that inference on this model generalizes local kernel estimation to the context of Bayesian inference on stochastic processes. Our model enables Bayesian inference in contexts where standard techniques like Gaussian process inference are too expensive to apply. Exact inference on our model is possible for any likelihood function from the exponential family. Inference is then highly efficient, requiring only O(log N) time and O(N) space at run time. We demonstrate our algorithm on several problems and show quantifiable improvement in both speed and performance relative to models based on the Gaussian process.
A framework for studying synaptic plasticity with neural spike train data
Scott Linderman, Christopher H. Stock, Ryan P. Adams
Learning and memory in the brain are implemented by complex, time-varying changes in neural circuitry. The computational rules according to which synaptic weights change over time are the subject of much research, and are not precisely understood. Until recently, limitations in experimental methods have made it challenging to test hypotheses about synaptic plasticity on a large scale. However, as such data become available and these barriers are lifted, it becomes necessary to develop analysis techniques to validate plasticity models. Here, we present a highly extensible framework for modeling arbitrary synaptic plasticity rules on spike train data in populations of interconnected neurons. We treat synaptic weights as a (potentially nonlinear) dynamical system embedded in a fully-Bayesian generalized linear model (GLM). In addition, we provide an algorithm for inferring synaptic weight trajectories alongside the parameters of the GLM and of the learning rules. Using this method, we perform model comparison of two proposed variants of the well-known spike-timing-dependent plasticity (STDP) rule, where nonlinear effects play a substantial role. On synthetic data generated from the biophysical simulator NEURON, we show that we can recover the weight trajectories, the pattern of connectivity, and the underlying learning rules.
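For readers unfamiliar with STDP, the following is a minimal sketch of the textbook additive pair-based rule with exponential windows. It is not taken from the paper and is not necessarily either of the two variants the authors compare; the function names, the amplitudes `a_plus` and `a_minus`, and the time constants are illustrative assumptions.

```python
import math


def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre. All parameter values are illustrative
    assumptions, not values from the paper.
    """
    if dt_ms > 0:
        # Pre fires before post: potentiation, decaying with the lag.
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        # Post fires before pre: depression, decaying with the lag.
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def total_weight_change(pre_spikes_ms, post_spikes_ms, **params):
    """Sum the pairwise updates over all pre/post spike pairs (additive rule)."""
    return sum(
        stdp_delta_w(t_post - t_pre, **params)
        for t_pre in pre_spikes_ms
        for t_post in post_spikes_ms
    )
```

Under this rule a presynaptic spike at 0 ms followed by a postsynaptic spike at 20 ms strengthens the synapse, while the reverse ordering weakens it. Multiplicative or otherwise nonlinear variants would replace the constant amplitudes with functions of the current weight.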
Advances in Learning Bayesian Networks of Bounded Treewidth
Siqi Nie, Denis D. Maua, Cassio P. de Campos, Qiang Ji
This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.
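To make the notion of a k-tree concrete, here is a small sketch that grows one incrementally: start from a clique on k + 1 vertices, then repeatedly attach a new vertex to every member of an existing k-clique. This simple construction does not sample uniformly over k-trees and is not claimed to be the sampler used in the paper; the function name `random_k_tree` and its signature are assumptions made for illustration.

```python
import random
from itertools import combinations


def random_k_tree(n, k, seed=None):
    """Grow a k-tree on vertices 0..n-1 (requires n >= k + 1).

    Returns (edges, k_cliques). Illustrative only: this incremental
    construction is NOT a uniform sampler over k-trees.
    """
    if n < k + 1:
        raise ValueError("need at least k + 1 vertices")
    rng = random.Random(seed)
    base = list(range(k + 1))
    # Start from a complete graph on k + 1 vertices.
    edges = set(combinations(base, 2))
    k_cliques = [tuple(c) for c in combinations(base, k)]
    for v in range(k + 1, n):
        # Attach the new vertex v to every vertex of one existing k-clique.
        parent = rng.choice(k_cliques)
        edges.update((u, v) for u in parent)
        # Each (k-1)-subset of the parent, together with v, is a new k-clique.
        for sub in combinations(parent, k - 1):
            k_cliques.append(sub + (v,))
    return edges, k_cliques
```

A k-tree on n vertices always has k(k + 1)/2 + (n - k - 1)k edges, which gives a quick sanity check on the output. A candidate network would then be constrained so that its moral graph uses only edges of the generated k-tree.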
Temporal Model On Quantum Logic
This paper introduces a unified theoretical framework for modeling temporal memory dynamics, combining concepts from temporal logic, memory decay models, and hierarchical contexts. The framework formalizes the evolution of propositions over time using linear and branching temporal models, incorporating exponential decay (Ebbinghaus forgetting curve) and reactivation mechanisms via Bayesian updating. The hierarchical organization of memory is represented using directed acyclic graphs to model recall dependencies and interference. Novel insights include feedback dynamics, recursive influences in memory chains, and the integration of entropy-based recall efficiency. This approach provides a foundation for understanding memory processes across cognitive and computational domains. Let t ∈ ℝ represent a temporal parameter.
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks
Hai Victor Habi, Hagit Messer, Yoram Bresler
The Bayesian Cramér-Rao bound (BCRB) is a crucial tool in signal processing for assessing the fundamental limitations of any estimation problem as well as benchmarking within a Bayesian framework. However, the BCRB cannot be computed without full knowledge of the prior and the measurement distributions. In this work, we propose a fully learned Bayesian Cramér-Rao bound (LBCRB) that learns both the prior and the measurement distributions. Specifically, we suggest two approaches to obtain the LBCRB: the Posterior Approach and the Measurement-Prior Approach. The Posterior Approach provides a simple method to obtain the LBCRB, whereas the Measurement-Prior Approach enables us to incorporate domain knowledge to improve the sample complexity and interpretability. To achieve this, we introduce a Physics-encoded score neural network which enables us to easily incorporate such domain knowledge into a neural network. We study the learning errors of the two suggested approaches theoretically, and validate them numerically. We demonstrate the two approaches on several signal processing examples, including a linear measurement problem with unknown mixing and Gaussian noise covariance matrices, frequency estimation, and quantized measurement. In addition, we test our approach on a nonlinear signal processing problem of frequency estimation with real-world underwater ambient noise.
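As background, the standard form of the BCRB (due to Van Trees) can be written as below. The symbols are assumptions introduced here, not the paper's notation: θ is the random parameter, x the measurement, θ̂(x) any estimator, and J_F(θ) the classical Fisher information of the measurement model. The last two lines suggest why a posterior score or a pair of measurement and prior scores could each be enough to evaluate the bound; whether this matches the paper's two approaches exactly is an assumption.

```latex
\begin{align}
  \mathbb{E}\!\left[(\hat{\theta}(x)-\theta)(\hat{\theta}(x)-\theta)^{\top}\right]
    &\succeq J_B^{-1}, \\
  J_B
    &= \mathbb{E}_{x,\theta}\!\left[\nabla_{\theta}\log p(x,\theta)\,
       \nabla_{\theta}\log p(x,\theta)^{\top}\right] \\
    &= \mathbb{E}_{x,\theta}\!\left[\nabla_{\theta}\log p(\theta\mid x)\,
       \nabla_{\theta}\log p(\theta\mid x)^{\top}\right] \\
    &= \mathbb{E}_{\theta}\!\left[J_F(\theta)\right]
     + \mathbb{E}_{\theta}\!\left[\nabla_{\theta}\log p(\theta)\,
       \nabla_{\theta}\log p(\theta)^{\top}\right],
  \qquad
  J_F(\theta) = \mathbb{E}_{x\mid\theta}\!\left[\nabla_{\theta}\log p(x\mid\theta)\,
       \nabla_{\theta}\log p(x\mid\theta)^{\top}\right].
\end{align}
```

The third line holds because p(x) does not depend on θ, so the joint and posterior scores coincide. The fourth follows from log p(x, θ) = log p(x | θ) + log p(θ), with the cross terms vanishing under the usual regularity conditions.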
A Planning Framework for Adaptive Labeling
Daksh Mittal, Yuanzhe Ma, Shalmali Joshi, Hongseok Namkoong
Ground truth labels/outcomes are critical for advancing scientific and engineering applications, e.g., evaluating the treatment effect of an intervention or performance of a predictive model. Since randomly sampling inputs for labeling can be prohibitively expensive, we introduce an adaptive labeling framework where measurement effort can be reallocated in batches. We formulate this problem as a Markov decision process where posterior beliefs evolve over time as batches of labels are collected (state transition), and batches (actions) are chosen to minimize uncertainty at the end of data collection. We design a computational framework that is agnostic to different uncertainty quantification approaches including those based on deep learning, and allows a diverse array of policy gradient approaches by relying on continuous policy parameterizations. On real and synthetic datasets, we demonstrate even a one-step lookahead policy can substantially outperform common adaptive labeling heuristics, highlighting the virtue of planning. On the methodological side, we note that standard REINFORCE-style policy gradient estimators can suffer high variance since they rely only on zeroth order information. We propose a direct backpropagation-based approach, Smoothed-Autodiff, based on a carefully smoothed version of the original non-differentiable MDP. Our method enjoys low variance at the price of introducing bias, and we theoretically and empirically show that this trade-off can be favorable.
Clustering from Labels and Time-Varying Graphs
Shiau Hong Lim, Yudong Chen, Huan Xu
We present a general framework for graph clustering where a label is observed for each pair of nodes. This allows a very rich encoding of various types of pairwise interactions between nodes. We propose a new tractable approach to this problem based on a maximum likelihood estimator and convex optimization. We analyze our algorithm under a general generative model, and provide both necessary and sufficient conditions for successful recovery of the underlying clusters. Our theoretical results cover and subsume a wide range of existing graph clustering results including planted partition, weighted clustering and partially observed graphs. Furthermore, the result is applicable to novel settings including time-varying graphs, such that new insights can be gained on solving these problems. Our theoretical findings are further supported by empirical results on both synthetic and real data.
Consistent Binary Classification with Generalized Performance Metrics
Oluwasanmi O. Koyejo, Nagarajan Natarajan, Pradeep K. Ravikumar, Inderjit S. Dhillon
Performance metrics for binary classification are designed to capture tradeoffs between four fundamental population quantities: true positives, false positives, true negatives and false negatives. Despite significant interest from theoretical and applied communities, little is known about either optimal classifiers or consistent algorithms for optimizing binary classification performance metrics beyond a few special cases. We consider a fairly large family of performance metrics given by ratios of linear combinations of the four fundamental population quantities. This family includes many well known binary classification metrics such as classification accuracy, AM measure, F-measure and the Jaccard similarity coefficient as special cases. Our analysis identifies the optimal classifiers as the sign of the thresholded conditional probability of the positive class, with a performance metric-dependent threshold.
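The metric family described above can be illustrated with a short sketch. Each metric is stored as two coefficient vectors over (TP, FP, TN, FN), and predictions come from thresholding an estimated conditional probability. The class name `LinearFractionalMetric`, the helper `confusion_counts`, and the example data are assumptions made for illustration; the sketch does not compute the metric-dependent optimal threshold derived in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearFractionalMetric:
    """Ratio of two linear combinations of (TP, FP, TN, FN)."""
    num: tuple  # numerator coefficients for (TP, FP, TN, FN)
    den: tuple  # denominator coefficients for (TP, FP, TN, FN)

    def __call__(self, tp, fp, tn, fn):
        q = (tp, fp, tn, fn)
        numerator = sum(c * v for c, v in zip(self.num, q))
        denominator = sum(c * v for c, v in zip(self.den, q))
        return numerator / denominator


# Three members of the family mentioned in the abstract.
ACCURACY = LinearFractionalMetric(num=(1, 0, 1, 0), den=(1, 1, 1, 1))
F1 = LinearFractionalMetric(num=(2, 0, 0, 0), den=(2, 1, 0, 1))
JACCARD = LinearFractionalMetric(num=(1, 0, 0, 0), den=(1, 1, 0, 1))


def confusion_counts(probs, labels, threshold):
    """Predict positive when the estimated P(y=1|x) exceeds the threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p > threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn
```

For example, `F1(*confusion_counts([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0], 0.5))` evaluates the F-measure of the classifier thresholded at 0.5. Sweeping the threshold and keeping the best value gives a crude empirical counterpart to the metric-dependent threshold.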
A Bayesian model for identifying hierarchically organised states in neural population activity
Patrick Putzky, Florian Franzen, Giacomo Bassetto, Jakob H. Macke
Neural population activity in cortical circuits is not solely driven by external inputs, but is also modulated by endogenous states which vary on multiple time-scales. To understand information processing in cortical circuits, we need to understand the statistical structure of internal states and their interaction with sensory inputs. Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity. Population states are modelled using a hidden Markov decision tree with state-dependent tuning parameters and a generalised linear observation model. We present a variational Bayesian inference algorithm for estimating the posterior distribution over parameters from neural population recordings. On simulated data, we show that we can identify the underlying sequence of population states and reconstruct the ground truth parameters. Using population recordings from visual cortex, we find that a model with two levels of population states outperforms both a one-state and a two-state generalised linear model. Finally, we find that modelling of state-dependence also improves the accuracy with which sensory stimuli can be decoded from the population response.