Bayesian Inference
Dynamic neural network with memristive CIM and CAM for 2D and 3D vision
Zhang, Yue, Zhang, Woyu, Wang, Shaocong, Lin, Ning, Yu, Yifei, He, Yangu, Wang, Bo, Jiang, Hao, Lin, Peng, Xu, Xiaoxin, Qi, Xiaojuan, Wang, Zhongrui, Zhang, Xumeng, Shang, Dashan, Liu, Qi, Cheng, Kwang-Ting, Liu, Ming
The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design: a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with the past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-design, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets. The co-design not only achieves accuracy on par with software but also reduces the computational budget by 48.1% and 15.9%. Moreover, it reduces energy consumption by 77.6% and 93.3%.
Positive and Unlabeled Data: Model, Estimation, Inference, and Classification
Liu, Siyan, Yeh, Chi-Kuang, Zhang, Xin, Tian, Qinglong, Li, Pengfei
This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM). Traditional methods often fall short because they only apply to selected completely at random (SCAR) PU data, where the labeled positive and unlabeled positive data are assumed to be from the same distribution. In contrast, our DETM's dual structure effectively accommodates the more complex and underexplored selected at random PU data, where the labeled and unlabeled positive data can be from different distributions. We rigorously establish the theoretical foundations of DETM, including identifiability, parameter estimation, and asymptotic properties. Additionally, we move forward to statistical inference by developing a goodness-of-fit test for the SCAR condition and constructing confidence intervals for the proportion of positive instances in the target domain. We leverage an approximated Bayes classifier for classification tasks, demonstrating DETM's robust performance in prediction. Through theoretical insights and practical applications, this study highlights DETM as a comprehensive framework for addressing the challenges of PU data.
Parameter inference from a non-stationary unknown process
Owens, Kieran S., Fulcher, Ben D.
Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration, to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet most time-series analysis methods that are used in practice, on important problems across science and industry, make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as Parameter Inference from a Non-stationary Unknown Process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here we review and unify a diverse literature of algorithms for PINUP. We formulate the problem, and categorize the various algorithmic contributions. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. We also demonstrate that the most common systems that existing methods are tested on - notably the non-stationary Lorenz process and logistic map - are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. Our results unify disjoint scientific contributions to analyzing non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.
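The baseline mentioned in the abstract (windowed mean and variance on a non-stationary logistic map) can be illustrated with a minimal sketch. The function names, the linear drift of the parameter r from 3.6 to 4.0, and the window length are all assumptions chosen for illustration; they are not taken from the paper.

```python
import statistics

def drifting_logistic_map(n_steps, r_start=3.6, r_end=4.0, x0=0.4):
    """Simulate x[t+1] = r[t] * x[t] * (1 - x[t]) while r drifts linearly.

    The drift schedule is a hypothetical choice for this sketch.
    """
    xs = []
    x = x0
    for t in range(n_steps):
        r = r_start + (r_end - r_start) * t / (n_steps - 1)
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

def windowed_mean_variance(xs, window):
    """Return (mean, variance) for consecutive non-overlapping windows."""
    features = []
    for start in range(0, len(xs) - window + 1, window):
        chunk = xs[start:start + window]
        features.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return features

series = drifting_logistic_map(10_000)
features = windowed_mean_variance(series, window=500)
```

Each entry of `features` summarizes one window. Plotting the variance column against the window index is one simple way to see whether such features track the hidden drift in r.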
Meta-Analysis with Untrusted Data
Kaul, Shiva, Gordon, Geoffrey J.
[See paper for full abstract] Meta-analysis is a crucial tool for answering scientific questions. It is usually conducted on a relatively small amount of "trusted" data -- ideally from randomized, controlled trials -- which allow causal effects to be reliably estimated with minimal assumptions. We show how to answer causal questions much more precisely by making two changes. First, we incorporate untrusted data drawn from large observational databases, related scientific literature and practical experience -- without sacrificing rigor or introducing strong assumptions. Second, we train richer models capable of handling heterogeneous trials, addressing a long-standing challenge in meta-analysis. Our approach is based on conformal prediction, which fundamentally produces rigorous prediction intervals, but doesn't handle indirect observations: in meta-analysis, we observe only noisy effects due to the limited number of participants in each trial. To handle noise, we develop a simple, efficient version of fully-conformal kernel ridge regression, based on a novel condition called idiocentricity. We introduce noise-correcting terms in the residuals and analyze their interaction with a "variance shaving" technique. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper charts a new course for meta-analysis and evidence-based medicine, where heterogeneity and untrusted data are embraced for more nuanced and precise predictions.
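For readers unfamiliar with conformal prediction, the following is a minimal sketch of the simpler split-conformal variant with absolute-residual scores. It is not the fully-conformal kernel ridge regression developed in the paper, and it includes no noise correction; the function name and arguments are hypothetical.

```python
import math

def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Prediction interval from held-out calibration residuals.

    cal_preds / cal_targets: model predictions and true values on a
    calibration set that was not used to fit the model.
    Returns (lower, upper) with nominal coverage 1 - alpha under
    exchangeability.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Rank of the conformal quantile among the n calibration scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)

interval = split_conformal_interval([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 10.0, alpha=0.5)
```

The interval width depends only on the calibration residuals, which is why noisy, indirectly observed targets (as in meta-analysis) need the extra treatment the abstract describes.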
Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules
Zhang, Yi, Lu, Dongyuan, Sang, Jitao
Machine learning models often make predictions based on biased features such as gender, race, and other social attributes, posing significant fairness risks, especially in societal applications such as hiring, banking, and criminal justice. Traditional approaches to addressing this issue involve retraining or fine-tuning neural networks with fairness-aware optimization objectives. However, these methods can be impractical due to significant computational resources, complex industrial tests, and the associated CO2 footprint. Additionally, regular users often fail to fine-tune models because they lack access to model parameters. In this paper, we introduce the Inference-Time Rule Eraser (Eraser), a novel method designed to address fairness concerns by removing biased decision-making rules from deployed models during inference without altering model weights. We begin by establishing a theoretical foundation for modifying model outputs to eliminate biased rules through Bayesian analysis. Next, we present a specific implementation of Eraser that involves two stages: (1) distilling the biased rules from the deployed model into an additional patch model, and (2) removing these biased rules from the output of the deployed model during inference. Extensive experiments validate the effectiveness of our approach, showcasing its superior performance in addressing fairness concerns in AI systems.
FedLog: Personalized Federated Classification with Less Communication and More Flexibility
Yu, Haolin, Zhang, Guojun, Poupart, Pascal
In federated learning (FL), the common paradigm that FedAvg proposes and most algorithms follow is that clients train local models with their private data, and the model parameters are shared for central aggregation, mostly averaging. In this paradigm, the communication cost is often a challenge, as modern massive neural networks can contain millions to billions of parameters. We suggest that clients share local data summaries rather than model parameters, to decrease the cost of sharing. We develop a new algorithm, FedLog, with Bayesian inference, which shares only sufficient statistics of local data. FedLog transmits messages as small as the last layer of the original model. Comprehensive experiments show that FedLog outperforms other FL algorithms that aim to decrease the communication cost. To provide formal privacy guarantees, we further extend FedLog with differential privacy and show the trade-off between privacy budget and accuracy.
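The general idea of sharing sufficient statistics instead of parameters can be sketched with a toy conjugate model. This is not the FedLog algorithm: it assumes scalar Gaussian data with known noise variance and a Gaussian prior on the mean, and all function names are hypothetical.

```python
def local_summary(xs):
    """Sufficient statistics of a client's data for a Gaussian mean: (count, sum)."""
    return (len(xs), sum(xs))

def aggregate(summaries):
    """Server side: sufficient statistics simply add across clients."""
    total_n = sum(n for n, _ in summaries)
    total_s = sum(s for _, s in summaries)
    return (total_n, total_s)

def posterior(n, s, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean and variance of the unknown mean under a Normal prior."""
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + s / noise_var) / precision
    return (mean, 1.0 / precision)

clients = [[1.0, 2.0], [3.0]]
n, s = aggregate([local_summary(xs) for xs in clients])
federated = posterior(n, s)
pooled = posterior(*local_summary([x for xs in clients for x in xs]))
```

In this toy setting each client sends two numbers regardless of how much data it holds, and the server recovers the same posterior as if the raw data had been pooled.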
Estimation of spatio-temporal extremes via generative neural networks
Bülte, Christopher, Leimenstoll, Lisa, Schienle, Melanie
As the frequency of extreme weather events rises, it becomes increasingly crucial to understand and detect them at the earliest opportunity. Statistical models provide a way to enhance their interpretability and offer insights into the connections between extreme events. Geophysical data is often coupled across both space and time, which poses challenges for modeling and often leads to highly complex statistical models. For spatial data, such as precipitation, a common way to describe and analyze extremes is through max-stable processes, which arise as the unique limit of pointwise maxima of random fields. These processes are an essential tool in analyzing spatial extremes (Davison et al., 2012), as they allow for flexible modeling of the underlying dependence structure. However, when it comes to modeling these extremes, usually only a few observations are available, and even fewer when the underlying process changes across time. For that reason, traditional statistical methods often fail to identify parameters correctly, particularly as these models are high dimensional and complex. Furthermore, estimating parameters becomes especially challenging when dealing with extreme values. Therefore, specifying a distribution rather than relying on point estimators can be beneficial for quantifying uncertainty.
Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models
de Albuquerque, Daniela, Pearson, John
Beyond estimating parameters of interest from data, one of the key goals of statistical inference is to properly quantify uncertainty in these estimates. In Bayesian inference, this uncertainty is provided by the posterior distribution, the computation of which typically involves an intractable high-dimensional integral. Among available approximation methods, sampling-based approaches come with strong theoretical guarantees but scale poorly to large problems, while variational approaches scale well but offer few theoretical guarantees. In particular, variational methods are known to produce overconfident estimates of posterior uncertainty and are typically non-identifiable, with many latent variable configurations generating equivalent predictions. Here, we address these challenges by showing how diffusion-based models (DBMs), which have recently produced state-of-the-art performance in generative modeling tasks, can be repurposed for performing calibrated, identifiable Bayesian inference. By exploiting a previously established connection between the stochastic and probability flow ordinary differential equations (pfODEs) underlying DBMs, we derive a class of models, inflationary flows, that uniquely and deterministically map high-dimensional data to a lower-dimensional Gaussian distribution via ODE integration. This map is both invertible and neighborhood-preserving, with controllable numerical error, with the result that uncertainties in the data are correctly propagated to the latent space. We demonstrate how such maps can be learned via standard DBM training using a novel noise schedule and are effective at both preserving and reducing intrinsic data dimensionality. The result is a class of highly expressive generative models, uniquely defined on a low-dimensional latent space, that afford principled Bayesian inference.
Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment
Despite recent advances in AI and machine learning technologies, many applications still require human assessment because the characteristics of objects that can explain human subjectivity are sometimes unknown or too vague to be extracted automatically, which is a serious bottleneck when conducting large-scale automated quality assessments. The use of crowdsourcing is a promising way to implement this with the wisdom of the crowd. One challenge in crowdsourced quality assessments is the uncertainty of human judgments. Since workers have different competences, expertise, or motivations, their responses are sometimes too noisy to analyze and extract useful knowledge. A straightforward solution is to assign multiple crowdworkers to each evaluation target and aggregate the redundantly collected evaluations using majority voting. More sophisticated statistical methods, such as Bayesian generative models, have also been explored for better aggregations. Various factors of human error have been introduced into statistical models, such as the ability of workers (Dawid & Skene, 1979), difficulty of the questions (Whitehill et al., 2009; Welinder et al., 2011), and presence of malicious workers (Raykar & Yu, 2011).
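The majority-voting baseline described above can be sketched as follows. The data layout (a dictionary from evaluation target to a list of worker labels) and the function name are assumptions made for illustration; ties are broken by whichever label was seen first, which is an arbitrary choice.

```python
from collections import Counter

def majority_vote(responses):
    """Aggregate redundant crowd labels by taking the most frequent label per item.

    responses: dict mapping an item id to the list of labels given by workers.
    """
    aggregated = {}
    for item, labels in responses.items():
        # most_common(1) returns [(label, count)] for the top label.
        aggregated[item] = Counter(labels).most_common(1)[0][0]
    return aggregated

votes = {
    "item_a": ["good", "bad", "good"],
    "item_b": ["bad", "bad", "good"],
}
result = majority_vote(votes)
```

Majority voting weights every worker equally, which is the limitation that models of worker ability and question difficulty are meant to address.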
Why Online Reinforcement Learning is Causal
Schulte, Oliver, Poupart, Pascal
Reinforcement learning (RL) and causal modelling naturally complement each other. The goal of causal modelling is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating causal relationships: temporal ordering and the ability to act on an environment. This paper examines which reinforcement learning settings we can expect to benefit from causal modelling, and how. In online learning, the agent has the ability to interact directly with their environment, and learn from exploring it. Our main argument is that in online learning, conditional probabilities are causal, and therefore offline RL is the setting where causal learning has the most potential to make a difference. Essentially, the reason is that when an agent learns from their own experience, there are no unobserved confounders that influence both the agent's own exploratory actions and the rewards they receive. Our paper formalizes this argument. For offline RL, where an agent may and typically does learn from the experience of others, we describe previous and new methods for leveraging a causal model, including support for counterfactual queries.
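The confounding argument can be illustrated with a toy example computed by exact enumeration. This is not the paper's formalization: it assumes a single hidden binary variable u that fully determines the reward, an action with no causal effect, and two hypothetical policies.

```python
def conditional_reward(policy, action):
    """Exact P(reward = 1 | A = action) in a toy environment.

    A hidden variable u is 0 or 1 with probability 0.5 each, and the
    reward equals u regardless of the action, so P(reward = 1 | do(A)) = 0.5.
    policy(u) returns P(A = 1 | u).
    """
    numerator = 0.0
    denominator = 0.0
    for u in (0, 1):
        p_u = 0.5
        p_a1 = policy(u)
        p_a = p_a1 if action == 1 else 1.0 - p_a1
        numerator += p_u * p_a * float(u)
        denominator += p_u * p_a
    return numerator / denominator

def observed_expert(u):
    """Offline data: another agent sees u and acts on it (confounded)."""
    return 1.0 if u == 1 else 0.0

def own_exploration(u):
    """Online data: the learner cannot see u, so its action ignores it."""
    return 0.5

offline_estimate = conditional_reward(observed_expert, 1)
online_estimate = conditional_reward(own_exploration, 1)
```

Here the offline conditional probability overstates the effect of the action because u drives both the expert's choice and the reward, while the online conditional probability matches the interventional value of 0.5.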