Bayesian Learning
Predicting Human Depression with Hybrid Data Acquisition utilizing Physical Activity Sensing and Social Media Feeds
Uddin, Mohammad Helal, Baidya, Sabur
Mental disorders including depression, anxiety, and other neurological disorders pose a significant global challenge, particularly among individuals exhibiting social avoidance tendencies. This study proposes a hybrid approach by leveraging smartphone sensor data measuring daily physical activities and analyzing their social media (Twitter) interactions for evaluating an individual's depression level. Using CNN-based deep learning models and Naive Bayes classification, we identify human physical activities accurately and also classify the user sentiments. A total of 33 participants were recruited for data acquisition, and nine relevant features were extracted from the physical activities and analyzed with their weekly depression scores, evaluated using the Geriatric Depression Scale (GDS) questionnaire. Of the nine features, six are derived from physical activities, achieving an activity recognition accuracy of 95%, while three features stem from sentiment analysis of Twitter activities, yielding a sentiment analysis accuracy of 95.6%. Notably, several physical activity features exhibited significant correlations with the severity of depression symptoms. For classifying the depression severity, a support vector machine (SVM)-based algorithm is employed that demonstrated a very high accuracy of 94%, outperforming alternative models, e.g., the multilayer perceptron (MLP) and k-nearest neighbor. It is a simple approach yet highly effective in the long run for monitoring depression without breaching personal privacy.
Self-orthogonalizing attractor neural networks emerging from the free energy principle
Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural extension to conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.
Optimizing Data Augmentation through Bayesian Model Selection
Matymov, Madi, Tran, Ba-Hien, Kampffmeyer, Michael, Heinonen, Markus, Filippone, Maurizio
Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, when deciding on DA strategies it is critical to choose parameters carefully, and this can be a daunting task which is traditionally left to trial-and-error or expensive optimization based on validation performance. In this paper, we counter these limitations by proposing a novel framework for optimizing DA. In particular, we take a probabilistic view of DA, which leads to the interpretation of augmentation parameters as model (hyper)-parameters, and the optimization of the marginal likelihood with respect to these parameters as a Bayesian model selection problem. Due to its intractability, we derive a tractable Evidence Lower BOund (ELBO), which allows us to optimize augmentation parameters jointly with model parameters. We provide extensive theoretical results on variational approximation quality, generalization guarantees, invariance properties, and connections to empirical Bayes. Through experiments on computer vision tasks, we show that our approach improves calibration and yields robust performance over fixed or no augmentation. Our work provides a rigorous foundation for optimizing DA through Bayesian principles with significant potential for robust machine learning.
Credal Prediction based on Relative Likelihood
Lรถhr, Timo, Hofman, Paul, Mohr, Felix, Hรผllermeier, Eyke
Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner's epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.
Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks
Hofman, Paul, Sale, Yusuf, Hรผllermeier, Eyke
We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one-loss consistently outperforms other uncertainty measures.
Handling bounded response in high dimensions: a Horseshoe prior Bayesian Beta regression approach
Bounded continuous responses -- such as proportions -- arise frequently in diverse scientific fields including climatology, biostatistics, and finance. Beta regression is a widely adopted framework for modeling such data, due to the flexibility of the Beta distribution over the unit interval. While Bayesian extensions of Beta regression have shown promise, existing methods are limited to low-dimensional settings and lack theoretical guarantees. In this work, we propose a novel Bayesian approach for high-dimensional sparse Beta regression framework that employs a tempered posterior. Our method incorporates the Horseshoe prior for effective shrinkage and variable selection. Most notable, we propose a novel Gibbs sampling algorithm using Pรณlya-Gamma augmentation for efficient inference in Beta regression model. We also provide the first theoretical results establishing posterior consistency and convergence rates for Bayesian Beta regression. Through extensive simulation studies in both low- and high-dimensional scenarios, we demonstrate that our approach outperforms existing alternatives, offering improved estimation accuracy and model interpretability. Our method is implemented in the R package ``betaregbayes" available on Github.
Principled Out-of-Distribution Generalization via Simplicity
Ge, Jiawei, Wang, Amanda, Tang, Shange, Jin, Chi
Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models -- including many with undesirable behavior on OOD inputs -- the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data. Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
Revisiting Bayesian Model Averaging in the Era of Foundation Models
We revisit the classical, full-fledged Bayesian model averaging (BMA) paradigm to ensemble pre-trained and/or lightly-finetuned foundation models to enhance the classification performance on image and text data. To make BMA tractable under foundation models, we introduce trainable linear classifiers that take frozen features from the pre-trained foundation models as inputs. The model posteriors over the linear classifiers tell us which linear heads and frozen features are better suited for a given dataset, resulting in a principled model ensembling method. Furthermore, we propose a computationally cheaper, optimizable model averaging scheme (OMA). In OMA, we directly optimize the model ensemble weights, just like those weights based on model posterior distributions in BMA, by reducing the amount of surprise (expected entropy of the predictions) we get from predictions of ensembled models. With the rapid development of foundation models, these approaches will enable the incorporation of future, possibly significantly better foundation models to enhance the performance of challenging classification tasks.
Identifying Causal Direction via Variational Bayesian Compression
Tran, Quang-Duy, Duong, Bao, Nguyen, Phuoc, Nguyen, Thin
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To overcome these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. Consequently, we can enhance the model fitness while promoting the succinctness of the codelengths, while avoiding the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, surpassing the overall performance of related complexity-based and structural causal model regression-based approaches.
A Provable Approach for End-to-End Safe Reinforcement Learning
Wachi, Akifumi, Miyaguchi, Kohei, Tanabe, Takumi, Sato, Rei, Akimoto, Youhei
A longstanding goal in safe reinforcement learning (RL) is a method to ensure the safety of a policy throughout the entire process, from learning to operation. However, existing safe RL paradigms inherently struggle to achieve this objective. We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment to address this challenge. Our proposed method learns a policy offline using return-conditioned supervised learning and then deploys the resulting policy while cautiously optimizing a limited set of parameters, known as target returns, using Gaussian processes (GPs). Theoretically, we justify the use of GPs by analyzing the mathematical relationship between target and actual returns. We then prove that PLS finds near-optimal target returns while guaranteeing safety with high probability. Empirically, we demonstrate that PLS outperforms baselines both in safety and reward performance, thereby achieving the longstanding goal to obtain high rewards while ensuring the safety of a policy throughout the lifetime from learning to operation.