Bayesian Learning
A quasi-Bayesian sequential approach to deconvolution density estimation
Favaro, Stefano, Fortini, Sandra
Density deconvolution addresses the estimation of the unknown (probability) density function $f$ of a random signal from data that are observed with an independent additive random noise. This is a classical problem in statistics, for which frequentist and Bayesian nonparametric approaches are available to deal with static or batch data. In this paper, we consider the problem of density deconvolution in a streaming or online setting where noisy data arrive progressively, with no predetermined sample size, and we develop a sequential nonparametric approach to estimate $f$. By relying on a quasi-Bayesian sequential approach, often referred to as Newton's algorithm, we obtain estimates of $f$ that are of easy evaluation, computationally efficient, and with a computational cost that remains constant as the amount of data increases, which is critical in the streaming setting. Large sample asymptotic properties of the proposed estimates are studied, yielding provable guarantees with respect to the estimation of $f$ at a point (local) and on an interval (uniform). In particular, we establish local and uniform central limit theorems, providing corresponding asymptotic credible intervals and bands. We validate empirically our methods on synthetic and real data, by considering the common setting of Laplace and Gaussian noise distributions, and make a comparison with respect to the kernel-based approach and a Bayesian nonparametric approach with a Dirichlet process mixture prior.
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.
Interpretability research aims to provide a human-understandable explanation for model outputs and behaviors based on the input and model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. Being able to interpret AI systems is therefore a key capability to be able to understand whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural networks (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. Similar to how deep learning researchers have recognized, neuroscientists have realized that simply examining activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis - considering behavior alongside the brain's connectome, population codes, and codes of single neurons to gain a holistic understanding of the inner workings of the brain
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks
Doan, Bao Gia, Shamsi, Afshar, Guo, Xiao-Yu, Mohammadi, Arash, Alinejad-Rokny, Hamid, Sejdinovic, Dino, Ranasinghe, Damith C., Abbasnejad, Ehsan
Computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non- Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computational burden of Bayesian neural networks (BNNs). Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of parameters arising from a pre-trained neural network. Both vanilla version of ensembles as well as more sophisticated schemes such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior; and ii) it not only maintains, but in some instances, surpasses the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results with large-scale tasks such as ImageNet, CAMELYON17, DomainNet, VQA with CLIP, LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.
Understanding Uncertainty-based Active Learning Under Model Mismatch
Rahmati, Amir Hossein, Fan, Mingzhou, Zhou, Ruida, Urban, Nathan M., Yoon, Byung-Jun, Qian, Xiaoning
Instead of randomly acquiring training data points, Uncertainty-based Active Learning (UAL) operates by querying the label(s) of pivotal samples from an unlabeled pool selected based on the prediction uncertainty, thereby aiming at minimizing the labeling cost for model training. The efficacy of UAL critically depends on the model capacity as well as the adopted uncertainty-based acquisition function. Within the context of this study, our analytical focus is directed toward comprehending how the capacity of the machine learning model may affect UAL efficacy. Through theoretical analysis, comprehensive simulations, and empirical studies, we conclusively demonstrate that UAL can lead to worse performance in comparison with random sampling when the machine learning model class has low capacity and is unable to cover the underlying ground truth. In such situations, adopting acquisition functions that directly target estimating the prediction performance may be beneficial for improving the performance of UAL.
Obfuscated Memory Malware Detection
P, Sharmila S, Tiwari, Aruna, Chaudhari, Narendra S
Providing security for information is highly critical in the current era with devices enabled with smart technology, where assuming a day without the internet is highly impossible. Fast internet at a cheaper price, not only made communication easy for legitimate users but also for cybercriminals to induce attacks in various dimensions to breach privacy and security. Cybercriminals gain illegal access and breach the privacy of users to harm them in multiple ways. Malware is one such tool used by hackers to execute their malicious intent. Development in AI technology is utilized by malware developers to cause social harm. In this work, we intend to show how Artificial Intelligence and Machine learning can be used to detect and mitigate these cyber-attacks induced by malware in specific obfuscated malware. We conducted experiments with memory feature engineering on memory analysis of malware samples. Binary classification can identify whether a given sample is malware or not, but identifying the type of malware will only guide what next step to be taken for that malware, to stop it from proceeding with its further action. Hence, we propose a multi-class classification model to detect the three types of obfuscated malware with an accuracy of 89.07% using the Classic Random Forest algorithm. To the best of our knowledge, there is very little amount of work done in classifying multiple obfuscated malware by a single model. We also compared our model with a few state-of-the-art models and found it comparatively better.
Accelerated Markov Chain Monte Carlo Using Adaptive Weighting Scheme
Wang, Yanbo, Chen, Wenyu, Shan, Shimin
Gibbs sampling is one of the most commonly used Markov Chain Monte Carlo (MCMC) algorithms due to its simplicity and efficiency. It cycles through the latent variables, sampling each one from its distribution conditional on the current values of all the other variables. Conventional Gibbs sampling is based on the systematic scan (with a deterministic order of variables). In contrast, in recent years, Gibbs sampling with random scan has shown its advantage in some scenarios. However, almost all the analyses of Gibbs sampling with the random scan are based on uniform selection of variables. In this paper, we focus on a random scan Gibbs sampling method that selects each latent variable non-uniformly. Firstly, we show that this non-uniform scan Gibbs sampling leaves the target posterior distribution invariant. Then we explore how to determine the selection probability for latent variables. In particular, we construct an objective as a function of the selection probability and solve the constrained optimization problem. We further derive an analytic solution of the selection probability, which can be estimated easily. Our algorithm relies on the simple intuition that choosing the variable updates according to their marginal probabilities enhances the mixing time of the Markov chain. Finally, we validate the effectiveness of the proposed Gibbs sampler by conducting a set of experiments on real-world applications.
Analysis of child development facts and myths using text mining techniques and classification models
Tajrian, Mehedi, Rahman, Azizur, Kabir, Muhammad Ashad, Islam, Md Rafiqul
The rapid dissemination of misinformation on the internet complicates the decision-making process for individuals seeking reliable information, particularly parents researching child development topics. This misinformation can lead to adverse consequences, such as inappropriate treatment of children based on myths. While previous research has utilized text-mining techniques to predict child abuse cases, there has been a gap in the analysis of child development myths and facts. This study addresses this gap by applying text mining techniques and classification models to distinguish between myths and facts about child development, leveraging newly gathered data from publicly available websites. The research methodology involved several stages. First, text mining techniques were employed to pre-process the data, ensuring enhanced accuracy. Subsequently, the structured data was analysed using six robust Machine Learning (ML) classifiers and one Deep Learning (DL) model, with two feature extraction techniques applied to assess their performance across three different training-testing splits. To ensure the reliability of the results, cross-validation was performed using both k-fold and leave-one-out methods. Among the classification models tested, Logistic Regression (LR) demonstrated the highest accuracy, achieving a 90% accuracy with the Bag-of-Words (BoW) feature extraction technique. LR stands out for its exceptional speed and efficiency, maintaining low testing time per statement (0.97 microseconds). These findings suggest that LR, when combined with BoW, is effective in accurately classifying child development information, thus providing a valuable tool for combating misinformation and assisting parents in making informed decisions.
Can a Bayesian Oracle Prevent Harm from an Agent?
Bengio, Yoshua, Cohen, Michael K., Malkin, Nikolay, MacDermott, Matt, Fornasiere, Damiano, Greiner, Pietro, Kaddar, Younesse
Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.
A Markovian Model for Learning-to-Optimize
We present a probabilistic model for stochastic iterative algorithms with the use case of optimization algorithms in mind. Based on this model, we present PAC-Bayesian generalization bounds for functions that are defined on the trajectory of the learned algorithm, for example, the expected (non-asymptotic) convergence rate and the expected time to reach the stopping criterion. Thus, not only does this model allow for learning stochastic algorithms based on their empirical performance, it also yields results about their actual convergence rate and their actual convergence time. We stress that, since the model is valid in a more general setting than learning-to-optimize, it is of interest for other fields of application, too. Finally, we conduct five practically relevant experiments, showing the validity of our claims.
Inference Plans for Hybrid Particle Filtering
Cheng, Ellie Y., Atkinson, Eric, Baudart, Guillaume, Mandel, Louis, Carbin, Michael
Advanced probabilistic programming languages (PPLs) use hybrid inference systems to combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition random variables within the program into variables that are encoded symbolically and variables that are encoded with sampled values, and the heuristics are not necessarily aligned with the performance evaluation metrics used by the developer. In this work, we present inference plans, a programming interface that enables developers to control the partitioning of random variables during hybrid particle filtering. We further present Siren, a new PPL that enables developers to use annotations to specify inference plans the inference system must implement. To assist developers with statically reasoning about whether an inference plan can be implemented, we present an abstract-interpretation-based static analysis for Siren for determining inference plan satisfiability. We prove the analysis is sound with respect to Siren's semantics. Our evaluation applies inference plans to three different hybrid particle filtering algorithms on a suite of benchmarks and shows that the control provided by inference plans enables speed ups of 1.76x on average and up to 206x to reach target accuracy, compared to the inference plans implemented by default heuristics; the results also show that inference plans improve accuracy by 1.83x on average and up to 595x with less or equal runtime, compared to the default inference plans. We further show that the static analysis is precise in practice, identifying all satisfiable inference plans in 27 out of the 33 benchmark-algorithm combinations.