Bayesian Inference
Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data
Asres, Mulugeta Weldezgina, Omlin, Christian Walter, Collaboration, The CMS-HCAL
Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a more extensive set of monitoring variables across multiple subsystems. However, learning causal graphs comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments. In addition, modern monitoring applications for large systems often generate large amounts of binary alarm flags, and the distinct characteristics of binary anomaly data -- the meaning of state transition and data sparsity -- challenge existing causality learning mechanisms. This study proposes an anomaly causal discovery approach (AnomalyCD), addressing the accuracy and computational challenges of generating causal graphs from binary flag data sets. The AnomalyCD framework presents several strategies, such as anomaly flag characteristics incorporating causality testing, sparse data and link compression, and edge pruning adjustment approaches. We validate the performance of this framework on two datasets: monitoring sensor data of the readout-box system of the Compact Muon Solenoid experiment at CERN, and a public data set for information technology monitoring. The results demonstrate the considerable reduction of the computation overhead and moderate enhancement of the accuracy of temporal causal discovery on binary anomaly data sets.
Bayesian inference of mean velocity fields and turbulence models from flow MRI
Kontogiannis, A., Nair, P., Loecher, M., Ennis, D. B., Marsden, A., Juniper, M. P.
We solve a Bayesian inverse Reynolds-averaged Navier-Stokes (RANS) problem that assimilates mean flow data by jointly reconstructing the mean flow field and learning its unknown RANS parameters. We devise an algorithm that learns the most likely parameters of an algebraic effective viscosity model, and estimates their uncertainties, from mean flow data of a turbulent flow. We conduct a flow MRI experiment to obtain mean flow data of a confined turbulent jet in an idealized medical device known as the FDA (Food and Drug Administration) nozzle. The algorithm successfully reconstructs the mean flow field and learns the most likely turbulence model parameters without overfitting. The methodology accepts any turbulence model, be it algebraic (explicit) or multi-equation (implicit), as long as the model is differentiable, and naturally extends to unsteady turbulent flows.
Modeling Inter-Intra Heterogeneity for Graph Federated Learning
Yu, Wentao, Chen, Shuo, Tong, Yongxin, Gu, Tianlong, Gong, Chen
Heterogeneity is a fundamental and challenging issue in federated learning, especially for graph data, due to the complex relationships among the graph nodes. To deal with the heterogeneity, many existing methods perform weighted federation based on calculated similarities between pairwise clients (i.e., subgraphs). However, inter-subgraph similarities estimated from the outputs of local models are less reliable, because the final outputs of local models may not comprehensively represent the real distribution of subgraph data. In addition, these methods ignore the critical intra-heterogeneity that usually exists within each subgraph itself. To address these issues, we propose a novel Federated learning method by integrally modeling the Inter-Intra Heterogeneity (FedIIH). For the inter-subgraph relationship, we propose a novel hierarchical variational model to infer the whole distribution of subgraph data in a multi-level form, so that we can accurately characterize the inter-subgraph similarities from a global perspective. For the intra-heterogeneity, we disentangle the subgraph into multiple latent factors and partition the model parameters into multiple parts, where each part corresponds to a single latent factor. Our FedIIH not only properly computes the distribution similarities between subgraphs, but also learns disentangled representations that are robust to irrelevant factors within subgraphs, so that it successfully considers the inter- and intra-heterogeneity simultaneously. Extensive experiments on six homophilic and five heterophilic graph datasets in both non-overlapping and overlapping settings demonstrate the effectiveness of our method when compared with nine state-of-the-art methods. Specifically, FedIIH outperforms the second-best method by a large average margin of 5.79% on all heterophilic datasets.
Poisson Multi-Bernoulli Mixtures for Sets of Trajectories
Granström, Karl, Svensson, Lennart, Xia, Yuxuan, Williams, Jason, García-Fernández, Ángel F.
The Poisson Multi-Bernoulli Mixture (PMBM) density is a conjugate multi-target density for the standard point target model with Poisson point process birth. This means that both the filtering and predicted densities for the set of targets are PMBM. In this paper, we first show that the PMBM density is also conjugate for sets of trajectories with the standard point target measurement model. Second, based on this theoretical foundation, we develop two trajectory PMBM filters that provide recursions to calculate the posterior density for the set of all trajectories that have ever been present in the surveillance area, and the posterior density of the set of trajectories present at the current time step in the surveillance area. These two filters therefore provide complete probabilistic information on the considered trajectories enabling optimal trajectory estimation. Third, we establish that the density of the set of trajectories in any time window, given the measurements in a possibly different time window, is also a PMBM. Finally, the trajectory PMBM filters are evaluated via simulations, and are shown to yield state-of-the-art performance compared to other multi-target tracking algorithms based on random finite sets and multiple hypothesis tracking.
Enhancing Off-Grid One-Bit DOA Estimation with Learning-Based Sparse Bayesian Approach for Non-Uniform Sparse Array
Hu, Yunqiao, Sun, Shunqiao, Zhang, Yimin D.
This paper tackles the challenge of one-bit off-grid direction of arrival (DOA) estimation in a single-snapshot scenario using a learning-based Bayesian approach. First, we formulate the off-grid DOA estimation model using the first-order off-grid approximation and incorporating one-bit data quantization. We then address this problem within a sparse Bayesian framework and solve it iteratively. However, traditional sparse Bayesian methods often face challenges such as high computational complexity and the need for extensive hyperparameter tuning. To balance estimation accuracy and computational efficiency, we propose a novel learning-based sparse Bayesian framework, which leverages an unrolled neural network architecture. This framework autonomously learns hyperparameters through supervised learning, offering more accurate off-grid DOA estimates and improved computational efficiency compared to some state-of-the-art methods. Furthermore, the proposed approach is applicable to both uniform linear arrays and non-uniform sparse arrays. Simulation results validate the effectiveness of the proposed framework.
Energy-Efficient Sampling Using Stochastic Magnetic Tunnel Junctions
Alder, Nicolas, Kajale, Shivam Nitin, Tunsiricharoengul, Milin, Sarkar, Deblina, Herbrich, Ralf
We introduce an energy-efficient algorithm for uniform Float16 sampling, utilizing a room-temperature stochastic magnetic tunnel junction device to generate truly random floating-point numbers. By avoiding expensive symbolic computation and mapping physical phenomena directly to the statistical properties of the floating-point format and uniform distribution, our approach achieves a higher level of energy efficiency than the state-of-the-art Mersenne-Twister algorithm by a minimum factor of 9721 and an improvement factor of 5649 compared to the more energy-efficient PCG algorithm. Building on this sampling technique and hardware framework, we decompose arbitrary distributions into many non-overlapping approximative uniform distributions along with convolution and prior-likelihood operations, which allows us to sample from any 1D distribution without closed-form solutions. We provide measurements of the potential accumulated approximation errors, demonstrating the effectiveness of our method. This not only increases the cost of products, but also presents obstacles in addressing climate change. Traditional AI methods like deep learning lack the ability to quantify uncertainties, which is crucial to address issues such as hallucinations or ensuring safety in critical tasks. Probabilistic machine learning, while providing a theoretical framework for achieving much-needed uncertainty quantification, also suffers from high energy consumption and is unviable on a truly large scale due to insufficient computational resources (Izmailov et al., 2021). At the heart of probabilistic machine learning and Bayesian inference is Markov Chain Monte Carlo (MCMC) sampling (Kass et al., 1998; Murphy, 2012; Hoffman & Gelman, 2014).
Although effective in generating samples from complex distributions, MCMC is known for its substantial computational and energy requirements, making it unsuitable for large-scale deployment for applications such as Bayesian neural networks (Izmailov et al., 2021). In general, random number generation is an expensive task that is required in many machine learning algorithms. To address these challenges, this paper proposes a novel hardware framework aimed at improving energy efficiency, in particular tailored for probabilistic machine learning methods. Our framework builds on uniform floating-point format sampling utilizing stochastically switching magnetic tunnel junction (s-MTJ) devices as a foundation, achieving significant gains in both computational resources and energy consumption compared to current pseudorandom number generators. In contrast to existing generators, this device-focused strategy not only enhances sampling efficiency but also incorporates genuine randomness originating from the thermal noise in our devices.
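As a rough illustration of how random bits can be mapped onto the floating-point format, the sketch below fills the 10 mantissa bits of a Float16 with fair random bits while keeping the exponent fixed. This is only a minimal sketch under stated assumptions: the software bit source stands in for the s-MTJ device, the function name `uniform_float16` and its `bit_source` parameter are hypothetical, and the fixed-exponent construction covers a 2^-10 grid on [0, 1) rather than the full Float16 range. The abstract does not specify the authors' actual bit-to-format mapping, so this should not be read as their algorithm.

```python
import random
import struct


def uniform_float16(bit_source=random.getrandbits):
    """Draw one value in [0, 1) on a 2**-10 grid from 10 fair random bits.

    bit_source(n) must return an integer built from n random bits; here a
    software generator stands in for a hardware source (an assumption).
    """
    mantissa = bit_source(10)        # 10 fair bits fill the Float16 mantissa
    bits = (15 << 10) | mantissa     # sign 0, biased exponent 15 -> value in [1, 2)
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value - 1.0               # shift [1, 2) down to [0, 1)


samples = [uniform_float16() for _ in range(5)]
```

Because every mantissa pattern is equally likely and the exponent is fixed, each of the 1024 grid points in [0, 1) is drawn with the same probability.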
Evidential time-to-event prediction with calibrated uncertainty quantification
Huang, Ling, Xing, Yucheng, Mishra, Swapnil, Denoeux, Thierry, Feng, Mengling
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations. However, this task is more challenging than standard regression problems due to the presence of censored observations. Additionally, the lack of confidence assessment, model robustness, and prediction calibration raises concerns about the reliability of predictions. To address these challenges, we propose an evidential regression model specifically designed for time-to-event prediction. The proposed model quantifies both epistemic and aleatory uncertainties using Gaussian Random Fuzzy Numbers and belief functions, providing clinicians with uncertainty-aware survival time predictions. The model is trained by minimizing a generalized negative log-likelihood function accounting for data censoring. Experimental evaluations using simulated datasets with different data distributions and censoring conditions, as well as real-world datasets across diverse clinical applications, demonstrate that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods. These results highlight the potential of our approach for enhancing clinical decision-making in survival analysis.
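To make the idea of a likelihood that accounts for censoring concrete, the sketch below computes the ordinary negative log-likelihood of a Gaussian model under right censoring: observed events contribute the log density, while censored observations contribute the log probability of surviving past the censoring time. This is the standard probabilistic construction, not the generalized loss based on Gaussian Random Fuzzy Numbers described in the abstract; the function name and the choice of a Gaussian on the raw time scale are assumptions made for illustration.

```python
import math


def censored_gaussian_nll(times, events, mu, sigma):
    """Mean negative log-likelihood of a Gaussian with right-censored data.

    events[i] == 1: event observed at times[i]  -> log density term.
    events[i] == 0: censored at times[i]        -> log survival term.
    """
    total = 0.0
    for t, observed in zip(times, events):
        z = (t - mu) / sigma
        if observed:
            log_lik = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        else:
            # P(T > t) for a Gaussian; may underflow to 0 for very large z.
            survival = 0.5 * math.erfc(z / math.sqrt(2.0))
            log_lik = math.log(survival)
        total -= log_lik
    return total / len(times)


# Two observed events and one subject censored at t = 5.0 (made-up numbers).
loss = censored_gaussian_nll([2.0, 3.5, 5.0], [1, 1, 0], mu=3.0, sigma=1.0)
```

A censored subject only tells the model that the event happened later than the recorded time, which is why its term uses the survival function instead of the density.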
Speeding up approximate MAP by applying domain knowledge about relevant variables
Kwisthout, Johan, Schroeder, Andrew
The MAP problem in Bayesian networks is notoriously intractable, even when approximated. In an earlier paper we introduced the Most Frugal Explanation heuristic approach to solving MAP, which partitions the set of intermediate variables (neither observed nor part of the MAP variables) into a set of relevant variables, which are marginalized out, and irrelevant variables, which are assigned a sampled value from their domain. In this study we explore whether knowledge about which variables are relevant for a particular query (i.e., domain knowledge) speeds up computation sufficiently to beat both exact MAP and approximate MAP while giving reasonably accurate results. Our results are inconclusive, but also show that this probably depends on the specifics of the MAP query, most prominently the number of MAP variables.
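The partitioning idea can be sketched on a toy joint table with one MAP variable H, one relevant intermediate variable R, one irrelevant intermediate variable I, and one evidence variable E. The relevant variable is summed out, the irrelevant one is given a sampled value, and the best value of H is recorded. Everything here is hypothetical: the table, the function names, sampling I uniformly from its domain, and aggregating repeated samples by majority vote are assumptions of this sketch, not details taken from the abstract.

```python
import random
from collections import Counter

import numpy as np


def exact_map(joint, e):
    """Exact MAP over H: marginalize both intermediate variables R and I."""
    scores = joint[:, :, :, e].sum(axis=(1, 2))
    return int(np.argmax(scores))


def frugal_map(joint, e, n_samples=25, seed=0):
    """Frugal approximation: marginalize R, sample I, majority-vote over H."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        i = rng.randrange(joint.shape[2])        # sampled value for the irrelevant variable
        scores = joint[:, :, i, e].sum(axis=1)   # sum out the relevant variable R
        votes[int(np.argmax(scores))] += 1
    return votes.most_common(1)[0][0]


# Toy joint P(H, R, I, E) over binary variables, axes in that order.
# I is independent of everything else here, so it is truly irrelevant.
gen = np.random.default_rng(0)
base = gen.random((2, 2, 2))                     # unnormalized P(H, R, E)
p_i = np.array([0.3, 0.7])                       # P(I)
joint = base[:, :, None, :] * p_i[None, None, :, None]
joint /= joint.sum()

agree = all(frugal_map(joint, e) == exact_map(joint, e) for e in (0, 1))
```

In this toy table the frugal answer matches exact MAP because I carries no information about H. When a variable treated as irrelevant does influence H, the two can disagree, which is the accuracy trade-off the abstract refers to.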
Stochastic Learning of Non-Conjugate Variational Posterior for Image Classification
Large-scale Bayesian nonparametric (BNP) learners such as stochastic variational inference (SVI) can handle datasets with a large number of classes and a large training size at fractional cost. Like its predecessor, SVI relies on the assumption of a conjugate variational posterior to approximate the true posterior. A more challenging problem is large-scale learning on a non-conjugate posterior. Recent works in this direction mostly use Monte Carlo methods to approximate the learner. However, these works are usually demonstrated on non-BNP tasks and less complex models such as logistic regression, due to higher computational complexity. To overcome this issue faced by SVI, we develop a novel approach based on the recently proposed variational maximization-maximization (VMM) learner to allow large-scale learning on a non-conjugate posterior. Unlike SVI, our VMM learner does not require closed-form expressions for the variational posterior expectations. Our only requirement is that the variational posterior is differentiable. To ensure convergence in stochastic settings, SVI relies on decaying step sizes to slow its learning. Inspired by SVI and Adam, we propose the novel use of decaying step sizes on both the gradient and the ascent direction in our VMM to significantly improve its learning. We show that our proposed method is compatible with ResNet features when applied to datasets with a large number of classes such as MIT67 and SUN397. Finally, we compare our proposed learner with several recent works such as deep clustering algorithms and show that it performs on par with or outperforms the state-of-the-art methods in terms of clustering measures.
Improving Active Learning with a Bayesian Representation of Epistemic Uncertainty
Thomas, Jake, Houssineau, Jeremie
A popular strategy for active learning is to specifically target a reduction in epistemic uncertainty, since aleatoric uncertainty is often considered as being intrinsic to the system of interest and therefore not reducible. Yet, distinguishing these two types of uncertainty remains challenging and there is no single strategy that consistently outperforms the others. We propose to use a particular combination of probability and possibility theories, with the aim of using the latter to specifically represent epistemic uncertainty, and we show how this combination leads to new active learning strategies that have desirable properties. In order to demonstrate the efficiency of these strategies in non-trivial settings, we introduce the notion of a possibilistic Gaussian process (GP) and consider GP-based multiclass and binary classification problems, for which the proposed methods display a strong performance for both simulated and real datasets.