Bayesian Learning
Structure learning with Temporal Gaussian Mixture for model-based Reinforcement Learning
Champion, Théophile, Grześ, Marek, Bowman, Howard
Model-based reinforcement learning refers to a set of approaches capable of sample-efficient decision making, which create an explicit model of the environment. This model can subsequently be used for learning optimal policies. In this paper, we propose a temporal Gaussian Mixture Model composed of a perception model and a transition model. The perception model extracts discrete (latent) states from continuous observations using a variational Gaussian mixture likelihood. Importantly, our model constantly monitors the collected data searching for new Gaussian components, i.e., the perception model performs a form of structure learning (Smith et al., 2020; Friston et al., 2018; Neacsu et al., 2022) as it learns the number of Gaussian components in the mixture. Additionally, the transition model learns the temporal transition between consecutive time steps by taking advantage of the Dirichlet-categorical conjugacy. Both the perception and transition models are able to forget part of the data points, while integrating the information they provide within the prior, which ensure fast variational inference. Finally, decision making is performed with a variant of Q-learning which is able to learn Q-values from beliefs over states. Empirically, we have demonstrated the model's ability to learn the structure of several mazes: the model discovered the number of states and the transition probabilities between these states. Moreover, using its learned Q-values, the agent was able to successfully navigate from the starting position to the maze's exit.
BONE: a unifying framework for Bayesian online learning in non-stationary environments
Duran-Martin, Gerardo, Sánchez-Betancourt, Leandro, Shestopaloff, Alexander Y., Murphy, Kevin
We propose a unifying framework for methods that perform Bayesian online learning in non-stationary environments. We call the framework BONE, which stands for (B)ayesian (O)nline learning in (N)on-stationary (E)nvironments. BONE provides a common structure to tackle a variety of problems, including online continual learning, prequential forecasting, and contextual bandits. The framework requires specifying three modelling choices: (i) a model for measurements (e.g., a neural network), (ii) an auxiliary process to model non-stationarity (e.g., the time since the last changepoint), and (iii) a conditional prior over model parameters (e.g., a multivariate Gaussian). The framework also requires two algorithmic choices, which we use to carry out approximate inference under this framework: (i) an algorithm to estimate beliefs (posterior distribution) about the model parameters given the auxiliary variable, and (ii) an algorithm to estimate beliefs about the auxiliary variable. We show how this modularity allows us to write many different existing methods as instances of BONE; we also use this framework to propose a new method. We then experimentally compare existing methods with our proposed new method on several datasets; we provide insights into the situations that make one method more suitable than another for a given task.
Making Sigmoid-MSE Great Again: Output Reset Challenges Softmax Cross-Entropy in Neural Network Classification
Tyagi, Kanishka, Rane, Chinmay, Vaidya, Ketaki, Challgundla, Jeshwanth, Auddy, Soumitro Swapan, Manry, Michael
This study presents a comparative analysis of two objective functions, Mean Squared Error (MSE) and Softmax Cross-Entropy (SCE) for neural network classification tasks. While SCE combined with softmax activation is the conventional choice for transforming network outputs into class probabilities, we explore an alternative approach using MSE with sigmoid activation. We introduce the Output Reset algorithm, which reduces inconsistent errors and enhances classifier robustness. Through extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST), we demonstrate that MSE with sigmoid activation achieves comparable accuracy and convergence rates to SCE, while exhibiting superior performance in scenarios with noisy data. Our findings indicate that MSE, despite its traditional association with regression tasks, serves as a viable alternative for classification problems, challenging conventional wisdom about neural network training strategies.
Emotion Detection in Reddit: Comparative Study of Machine Learning and Deep Learning Techniques
Emotion detection is pivotal in human communication, as it significantly influences behavior, relationships, and decision-making processes. This study concentrates on text-based emotion detection by leveraging the GoEmotions dataset, which annotates Reddit comments with 27 distinct emotions. These emotions are subsequently mapped to Ekman's 6 basic categories: joy, anger, fear, sadness, disgust, and surprise. We employed a range of models for this task, including 6 machine learning models, 3 ensemble models, and Long Short-Term Memory (LSTM) model to determine the optimal model for emotion detection. Results indicate that the Stacking classifier outperforms other models in accuracy and performance. Finally, the Stacking classifier is deployed via a Streamlit web application, underscoring its potential for real-world applications in text-based emotion analysis. Keywords: Text Based Emotion Detection, Machine Learning, Ensemble Learning, Deep Learning, GoEmotions, EmoBERTa, Streamlit Introduction Emotions are complex, subjective experiences, often linked to psychological states such as mood, temperament, and personality. These experiences influence human behavior, impacting decision-making, reactions to stimuli, and interpersonal interactions. In the contemporary world, where mental health disorders such as stress, anxiety, and depression are increasingly prevalent, understanding emotions is more important than ever (Maruf et al., 2024).
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
Helli, Kai, Schnurr, David, Hollmann, Noah, Müller, Samuel, Hutter, Frank
While most ML models expect independent and identically distributed data, this assumption is often violated in real-world scenarios due to distribution shifts, resulting in the degradation of machine learning model performance. Until now, no tabular method has consistently outperformed classical supervised learning, which ignores these shifts. To address temporal distribution shifts, we present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network that learns the learning algorithm itself: it accepts the entire training dataset as input and makes predictions on the test set in a single forward pass. Specifically, it learns to approximate Bayesian inference on synthetic datasets drawn from a prior that specifies the model's inductive bias. This prior is based on structural causal models (SCM), which gradually shift over time. To model shifts of these causal models, we use a secondary SCM, that specifies changes in the primary model parameters. The resulting Drift-Resilient TabPFN can be applied to unseen data, runs in seconds on small to moderately sized datasets and needs no hyperparameter tuning. Comprehensive evaluations across 18 synthetic and real-world datasets demonstrate large performance improvements over a wide range of baselines, such as XGB, CatBoost, TabPFN, and applicable methods featured in the Wild-Time benchmark. Compared to the strongest baselines, it improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration. This approach could serve as significant groundwork for further research on out-of-distribution prediction.
Model Inversion Attacks: A Survey of Approaches and Countermeasures
Zhou, Zhanke, Zhu, Jianing, Yu, Fengfei, Li, Xuan, Peng, Xiong, Liu, Tongliang, Han, Bo
The success of deep neural networks has driven numerous research studies and applications from Euclidean to non-Euclidean data. However, there are increasing concerns about privacy leakage, as these networks rely on processing private data. Recently, a new type of privacy attack, the model inversion attacks (MIAs), aims to extract sensitive features of private data for training by abusing access to a well-trained model. The effectiveness of MIAs has been demonstrated in various domains, including images, texts, and graphs. These attacks highlight the vulnerability of neural networks and raise awareness about the risk of privacy leakage within the research community. Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs across different domains. This survey aims to summarize up-to-date MIA methods in both attacks and defenses, highlighting their contributions and limitations, underlying modeling principles, optimization challenges, and future directions. We hope this survey bridges the gap in the literature and facilitates future research in this critical area. Besides, we are maintaining a repository to keep track of relevant research at https://github.com/AndrewZhou924/Awesome-model-inversion-attack.
Continuous Bayesian Model Selection for Multivariate Causal Discovery
Dhir, Anish, Sedgwick, Ruby, Kori, Avinash, Glocker, Ben, van der Wilk, Mark
Current causal discovery approaches require restrictive model assumptions or assume access to interventional data to ensure structure identifiability. These assumptions often do not hold in real-world applications leading to a loss of guarantees and poor accuracy in practice. Recent work has shown that, in the bivariate case, Bayesian model selection can greatly improve accuracy by exchanging restrictive modelling for more flexible assumptions, at the cost of a small probability of error. We extend the Bayesian model selection approach to the important multivariate setting by making the large discrete selection problem scalable through a continuous relaxation. We demonstrate how for our choice of Bayesian non-parametric model, the Causal Gaussian Process Conditional Density Estimator (CGP-CDE), an adjacency matrix can be constructed from the model hyperparameters. This adjacency matrix is then optimised using the marginal likelihood and an acyclicity regulariser, outputting the maximum a posteriori causal graph. We demonstrate the competitiveness of our approach on both synthetic and real-world datasets, showing it is possible to perform multivariate causal discovery without infeasible assumptions using Bayesian model selection.
Machine learning-enabled velocity model building with uncertainty quantification
Orozco, Rafael, Erdinc, Huseyin Tuna, Zeng, Yunlin, Louboutin, Mathias, Herrmann, Felix J.
Accurately characterizing migration velocity models is crucial for a wide range of geophysical applications, from hydrocarbon exploration to monitoring of CO2 sequestration projects. Traditional velocity model building methods such as Full-Waveform Inversion (FWI) are powerful but often struggle with the inherent complexities of the inverse problem, including noise, limited bandwidth, receiver aperture and computational constraints. To address these challenges, we propose a scalable methodology that integrates generative modeling, in the form of Diffusion networks, with physics-informed summary statistics, making it suitable for complicated imaging problems including field datasets. By defining these summary statistics in terms of subsurface-offset image volumes for poor initial velocity models, our approach allows for computationally efficient generation of Bayesian posterior samples for migration velocity models that offer a useful assessment of uncertainty. To validate our approach, we introduce a battery of tests that measure the quality of the inferred velocity models, as well as the quality of the inferred uncertainties. With modern synthetic datasets, we reconfirm gains from using subsurface-image gathers as the conditioning observable. For complex velocity model building involving salt, we propose a new iterative workflow that refines amortized posterior approximations with salt flooding and demonstrate how the uncertainty in the velocity model can be propagated to the final product reverse time migrated images. Finally, we present a proof of concept on field datasets to show that our method can scale to industry-sized problems.
Modeling human decomposition: a Bayesian approach
Smith, D. Hudson, Nisbet, Noah, Ehrett, Carl, Tica, Cristina I., Atwell, Madeline M., Weisensee, Katherine E.
Environmental and individualistic variables affect the rate of human decomposition in complex ways. These effects complicate the estimation of the postmortem interval (PMI) based on observed decomposition characteristics. In this work, we develop a generative probabilistic model for decomposing human remains based on PMI and a wide range of environmental and individualistic variables. This model explicitly represents the effect of each variable, including PMI, on the appearance of each decomposition characteristic, allowing for direct interpretation of model effects and enabling the use of the model for PMI inference and optimal experimental design. In addition, the probabilistic nature of the model allows for the integration of expert knowledge in the form of prior distributions. We fit this model to a diverse set of 2,529 cases from the GeoFOR dataset. We demonstrate that the model accurately predicts 24 decomposition characteristics with an ROC AUC score of 0.85. Using Bayesian inference techniques, we invert the decomposition model to predict PMI as a function of the observed decomposition characteristics and environmental and individualistic variables, producing an R-squared measure of 71%. Finally, we demonstrate how to use the fitted model to design future experiments that maximize the expected amount of new information about the mechanisms of decomposition using the Expected Information Gain formalism.
SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation
Khodak, Mikhail, Mackey, Lester, Chouldechova, Alexandra, Dudík, Miroslav
Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems. A key challenge is that evaluation data is scarce, and subpopulations arising from intersections of attributes (e.g., race, sex, age) are often tiny. Today, it is common for multiple clients to procure the same AI model from a model developer, and the task of disaggregated evaluation is faced by each customer individually. This gives rise to what we call the multi-task disaggregated evaluation problem, wherein multiple clients seek to conduct a disaggregated evaluation of a given model in their own data setting (task). In this work we develop a disaggregated evaluation method called SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models. SureMap's efficiency gains come from (1) transforming the problem into structured simultaneous Gaussian mean estimation and (2) incorporating external data, e.g., from the AI system creator or from their other clients. Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE). We evaluate SureMap on disaggregated evaluation tasks in multiple domains, observing significant accuracy improvements over several strong competitors.