Bayesian Learning
Reviews: Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling
This paper proposes a new parallel approximate sampler for high-dimensional Gaussian distributions. The algorithm is a special case of a larger class of iterative samplers based on a transition equation (2) and matrix splitting that is analysed in [9]. The algorithm is similar to the Hogwild sampler in terms of the update formula and the way the bias is analysed, but it is more flexible in the sense that there is a scalar parameter to trade off the bias and variance of the proposed sampler. I appreciate the detailed introduction to the mathematical background of the family of sampling algorithms and related work. The paper is also easy to follow, and the merit of the proposed algorithm is easy to understand. The illustration of the decomposition of the variance and bias in Figure 1 gives a clear explanation of the role of \eta.
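To make the matrix-splitting idea concrete, here is a minimal sketch of a synchronous, Hogwild-style Gibbs update on a small Gaussian. It is not the Clone MCMC algorithm from the paper and has no \eta parameter. It assumes a precision matrix J and potential vector h (density proportional to exp(-x'Jx/2 + h'x)), splits J with M = diag(J), and resamples every coordinate at once from its conditional given the previous state. Rather than drawing samples, it propagates the chain's mean and covariance exactly. The function names and the 2x2 example are illustrative assumptions.

```python
def matmul(A, B):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def stationary_moments(J, h, iters=200):
    """Mean and covariance reached by the synchronous update
        x_new = D^{-1} (h - (J - D) x) + D^{-1/2} z,   z ~ N(0, I),
    where D = diag(J). Assumes the iteration converges, for example
    when J is strictly diagonally dominant.
    """
    n = len(J)
    # T = -D^{-1} (J - D): linear part of the update.
    T = [[0.0 if i == j else -J[i][j] / J[i][i] for j in range(n)]
         for i in range(n)]
    b = [h[i] / J[i][i] for i in range(n)]
    # Noise covariance D^{-1}: the conditional variance of each coordinate.
    Q = [[1.0 / J[i][i] if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    mean = [0.0] * n
    cov = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        mean = [sum(T[i][k] * mean[k] for k in range(n)) + b[i]
                for i in range(n)]
        TCT = matmul(matmul(T, cov), transpose(T))
        cov = [[TCT[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return mean, cov


# Illustrative 2x2 target: exact mean is (4/3, -2/3),
# exact covariance is [[4/3, -2/3], [-2/3, 4/3]].
J = [[1.0, 0.5], [0.5, 1.0]]
h = [1.0, 0.0]
chain_mean, chain_cov = stationary_moments(J, h)
```

In this toy case the chain's mean matches the target and the variances happen to match too, but the chain's off-diagonal covariance is 0 instead of -2/3. That gap is the kind of bias a tunable splitting is meant to reduce.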
Reviews: Nonparametric learning from Bayesian models with randomized objective functions
The idea: You want to do Bayesian inference on a parameter theta, with prior pi(theta) and parametric likelihood f_theta, but you're not sure if the likelihood is correctly specified. So put a nonparametric prior on the sampling distribution: a mixture of Dirichlet processes centered at f_theta with mixing distribution pi(theta). The concentration parameter of the DP provides a sliding scale between vanilla Bayesian inference (total confidence in the parametric model) and Bayesian bootstrap (no confidence at all, use the empirical distribution). This is a simple idea, but the paper presents it lucidly and compellingly, beginning with a diverse list of potential applications: the method may be viewed as regularization of a nonparametric Bayesian model towards a parametric one; as robustification of a parametric Bayesian model to misspecification; as a means of correcting a variational approximation; or as nonparametric decision theory, when the log-likelihood is swapped out for an arbitrary utility function. As for implementation, the procedure requires (1) sampling from the parametric Bayesian posterior distribution and (2) performing a p-dimensional maximization, where p is the dimension of theta.
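The two implementation steps can be sketched for a toy case. The sketch below assumes a Normal location model with known variance and a conjugate Normal prior, so that step (1) is a closed-form posterior draw and step (2), the maximization, reduces to a weighted mean. The pseudo-sample truncation `n_pseudo`, the Dirichlet weighting scheme, and all names are assumptions of this sketch, not details taken from the review.

```python
import random


def posterior_bootstrap_mean(data, c, n_draws, prior_mean=0.0, prior_var=100.0,
                             noise_var=1.0, n_pseudo=50, seed=0):
    """Draw location parameters by a weighted-likelihood bootstrap.

    c is the concentration parameter: c = 0 gives the Bayesian bootstrap
    (observed data only); larger c puts more weight on pseudo-samples
    generated from the parametric model.
    """
    rng = random.Random(seed)
    n = len(data)
    # Conjugate Normal posterior for theta under the parametric model.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    draws = []
    for _ in range(n_draws):
        points = list(data)
        alphas = [1.0] * n
        if c > 0:
            # Step (1): sample theta from the parametric posterior,
            # then generate pseudo-observations from f_theta.
            theta = rng.gauss(post_mean, post_var ** 0.5)
            points += [rng.gauss(theta, noise_var ** 0.5)
                       for _ in range(n_pseudo)]
            alphas += [c / n_pseudo] * n_pseudo
        # Dirichlet weights via normalized Gamma draws.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        # Step (2): maximize the weighted Normal log-likelihood, which for a
        # location parameter with known variance is the weighted mean.
        draws.append(sum(g * x for g, x in zip(gammas, points)) / total)
    return draws


data = [0.2, 1.1, -0.4, 0.9]
bayesian_bootstrap_draws = posterior_bootstrap_mean(data, c=0.0, n_draws=20)
regularized_draws = posterior_bootstrap_mean(data, c=5.0, n_draws=20)
```

With `c=0.0` every draw is a convex combination of the observed points, so it stays within the range of the data. For models without a closed-form maximizer, step (2) would need a numerical optimizer over the p dimensions of theta.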
Reviews: Generalizing Tree Probability Estimation via Bayesian Networks
In this paper the authors propose an efficient method for tree probability estimation (given a collection of trees) that relies on the description of trees as subsplit Bayesian networks. Through this representation, the authors relax the classic conditional clade distribution - which assumes that given their parent, sister clades are independent - and assume instead that given their parent subsplit, sister subsplits are independent, thus allowing more dependence structure on sister clades. The authors first present a simple maximum likelihood estimation algorithm for rooted trees, and then propose two alternatives to generalize their work to unrooted trees. They finally illustrate their method on both simulated and real-data experiments. I think this paper is very well written; in particular, I greatly appreciated the Background and SBN description sections, which use a simple though not trivial example to introduce new notions and provide useful insights into the assumptions.
A New Architecture for Neural Enhanced Multiobject Tracking
Wei, Shaoxiu, Liang, Mingchao, Meyer, Florian
Multiobject tracking (MOT) is an important task in robotics, autonomous driving, and maritime surveillance. Traditional work on MOT is model-based and aims to establish algorithms in the framework of sequential Bayesian estimation. More recent methods are fully data-driven and rely on the training of neural networks. The two approaches have demonstrated advantages in certain scenarios. In particular, in problems where plenty of labeled data for the training of neural networks is available, data-driven MOT tends to have advantages compared to traditional methods. A natural question is whether a general and efficient framework can integrate the two approaches. This paper advances a recently introduced hybrid model-based and data-driven method called neural-enhanced belief propagation (NEBP). Compared to existing work on NEBP for MOT, it introduces a novel neural architecture that can improve data association and new object initialization, two critical aspects of MOT. The proposed tracking method is leading the nuScenes LiDAR-only tracking challenge at the time of submission of this paper.
Robust Domain Generalisation with Causal Invariant Bayesian Neural Networks
Gendron, Gaël, Witbrock, Michael, Dobbie, Gillian
Deep neural networks can obtain impressive performance on various tasks under the assumption that their training domain is identical to their target domain. Performance can drop dramatically when this assumption does not hold. One explanation for this discrepancy is the presence of spurious domain-specific correlations in the training data that the network exploits. Causal mechanisms, on the other hand, can be made invariant under distribution changes as they allow disentangling the factors of distribution underlying the data generation. Yet, learning causal mechanisms to improve out-of-distribution generalisation remains an under-explored area. We propose a Bayesian neural architecture that disentangles the learning of the data distribution from the inference process mechanisms. We show theoretically and experimentally that our model approximates reasoning under causal interventions. We demonstrate the performance of our method, outperforming point-estimate counterparts, on out-of-distribution image recognition tasks where the data distribution acts as a strong adversarial confounder.
Predicting Battery Capacity Fade Using Probabilistic Machine Learning Models With and Without Pre-Trained Priors
Kenney, Michael J., Malollari, Katerina G., Kalinin, Sergei V., Ziatdinov, Maxim
Lithium-ion batteries are a key energy storage technology driving revolutions in mobile electronics, electric vehicles and renewable energy storage. Capacity retention is a vital performance measure that is frequently utilized to assess whether these batteries have approached their end-of-life. Machine learning (ML) offers a powerful tool for predicting capacity degradation based on past data, and, potentially, prior physical knowledge, but the degree to which an ML prediction can be trusted is of significant practical importance in situations where consequential decisions must be made based on battery state of health. This study explores the efficacy of fully Bayesian machine learning in forecasting battery health with the quantification of uncertainty in its predictions. Specifically, we implemented three probabilistic ML approaches and evaluated the accuracy of their predictions and uncertainty estimates: a standard Gaussian process (GP), a structured Gaussian process (sGP), and a fully Bayesian neural network (BNN). In typical applications of GP and sGP, their hyperparameters are learned from a single sample while, in contrast, BNNs are typically pre-trained on an existing dataset to learn the weight distributions before being used for inference. This difference in methodology gives the BNN an advantage in learning global trends in a dataset and makes BNNs a good choice when training data is available. However, we show that pre-training can also be leveraged for GP and sGP approaches to learn the prior distributions of the hyperparameters and that in the case of the pre-trained sGP, similar accuracy and improved uncertainty estimation compared to the BNN can be achieved. This approach offers a framework for a broad range of probabilistic machine learning scenarios where past data is available and can be used to learn priors for (hyper)parameters of probabilistic ML models.
Harnessing the Power of Noise: A Survey of Techniques and Applications
Abdolazimi, Reyhaneh, Jin, Shengmin, Varshney, Pramod K., Zafarani, Reza
In computer science and across various engineering fields, noise is often considered a nuisance. It distorts details and makes data less accurate. In the past, the goal has often been to eliminate noise in order to make systems more reliable and accurate. But views on noise are changing. New findings suggest that noise can actually enhance and advance technologies in many areas, making us see it not just as a disruption but as a way to improve system performance. Thus, once unwanted and hard to control, noise now appears to be a key player in improving the performance of complex information processing systems [22]. This phenomenon is often known as Stochastic Resonance, which helps clear up signals, improve image quality, and strengthen models in machine learning [7, 22, 101]. This duality of noise -- both a problem and a benefit -- highlights the tricky role of noise while optimizing advanced neural networks and machine learning models.
A Comparative Study of Hybrid Models in Health Misinformation Text Classification
Sikosana, Mkululi, Ajao, Oluwaseun, Maudsley-Barton, Sean
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs), aiming to develop more effective tools for countering the spread of health misinformation during the pandemic. The study trained and tested various ML classifiers (Naive Bayes, SVM, Random Forest, etc.), DL models (CNN, LSTM, hybrid CNN+LSTM), and pretrained language models (DistilBERT, RoBERTa) on the "COVID19-FNIR DATASET". These models were evaluated for accuracy, F1 score, recall, precision, and ROC, and used preprocessing techniques like stemming and lemmatization. The results showed SVM performed well, achieving a 94.41% F1-score. DL models with Word2Vec embeddings exceeded 98% in all performance metrics (accuracy, F1 score, recall, precision, and ROC). The CNN+LSTM hybrid models also exceeded 98% across performance metrics, outperforming pretrained models like DistilBERT and RoBERTa. Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs. The findings highlight the importance of advanced neural network approaches and large-scale pretraining in misinformation detection. Future research should optimize these models for various misinformation types and adapt to changing OSNs, aiding in combating health misinformation.
Compositional Risk Minimization
Mahajan, Divyat, Pezeshki, Mohammad, Mitliagkas, Ioannis, Ahuja, Kartik, Vincent, Pascal
In this work, we tackle a challenging and extreme form of subpopulation shift, which is termed compositional shift. Under compositional shifts, some combinations of attributes are totally absent from the training distribution but present in the test distribution. We model the data with flexible additive energy distributions, where each energy term represents an attribute, and derive a simple alternative to empirical risk minimization termed compositional risk minimization (CRM). We first train an additive energy classifier to predict the multiple attributes and then adjust this classifier to tackle compositional shifts. We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirm the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts.
Accelerated Preference Optimization for Large Language Model Alignment
He, Jiafan, Yuan, Huizhuo, Gu, Quanquan
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences. Direct Preference Optimization (DPO), one of the most popular approaches, formulates RLHF as a policy optimization problem without explicitly estimating the reward function. It overcomes the stability and efficiency issues of two-step approaches, which typically involve first estimating the reward function and then optimizing the policy via proximal policy optimization (PPO). Since RLHF is essentially an optimization problem, and it is well-known that momentum techniques can accelerate optimization both theoretically and empirically, a natural question arises: Can RLHF be accelerated by momentum? This paper answers this question in the affirmative. In detail, we first show that the iterative preference optimization method can be viewed as a proximal point method. Based on this observation, we propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms and employs Nesterov's momentum technique to speed up the alignment of LLMs. Theoretically, we demonstrate that APO can achieve a faster convergence rate than the standard iterative preference optimization methods, including DPO and Self-Play Preference Optimization (SPPO). Empirically, we show the superiority of APO over DPO, iterative DPO, and other strong baselines for RLHF on the AlpacaEval 2.0 benchmark.
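The interplay between a proximal point method and momentum can be illustrated on a toy problem. The sketch below is only an analogy, not the APO algorithm: it minimizes a one-dimensional quadratic f(x) = a x^2 / 2 with exact proximal steps, and optionally extrapolates between steps in the style of Nesterov momentum. The function name, the step size `lam`, and the momentum coefficient `alpha` are illustrative assumptions.

```python
def proximal_point(x0, curvature=1.0, lam=1.0 / 9.0, alpha=0.0, steps=20):
    """Proximal point iteration on f(x) = curvature * x**2 / 2.

    Each step solves argmin_x f(x) + (x - y)**2 / (2 * lam), whose closed
    form is y / (1 + lam * curvature). With alpha > 0 the next proximal
    centre y is extrapolated along the direction of the last move.
    """
    x_prev = x0
    x = x0
    for _ in range(steps):
        # Extrapolation (momentum) step; alpha = 0 recovers the plain method.
        y = x + alpha * (x - x_prev)
        x_prev = x
        # Exact proximal step for the quadratic objective.
        x = y / (1.0 + lam * curvature)
    return x


plain = proximal_point(1.0, alpha=0.0)
accelerated = proximal_point(1.0, alpha=0.5)
```

With these settings each plain step shrinks the iterate by a factor of about 0.9, while the extrapolated iteration contracts at a rate of roughly 0.75 per step, so `accelerated` ends closer to the minimizer at 0 than `plain` after the same number of steps.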