Bayesian Learning
Bayesian posterior approximation with stochastic ensembles
Balabanov, Oleksandr, Mehlig, Bernhard, Linander, Hampus
We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. The stochastic ensembles are formulated as families of distributions and trained to approximate the Bayesian posterior with variational inference. We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and a novel nonparametric version of dropout and evaluate them on a toy problem and CIFAR image classification. For both tasks, we test the quality of the posteriors directly against Hamiltonian Monte Carlo simulations. Our results show that …
To further reduce the computational effort in evaluating the approximate posterior, stochastic methods such as Monte Carlo dropout [47] and DropConnect [51] inference have also been used extensively [13, 14, 40]. They benefit from computationally cheaper inference by virtue of sampling stochastically from a single model. Formulated as a variational approximation to the posterior, dropout samples from a family of parameter distributions where parameters can be randomly set to zero. Although this particular family of distributions might seem unnatural [11], it turns out that the stochastic property can help to find more robust regions of the parameter space, a fact well-known from a long history of using dropout as a regularization method.
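At prediction time, a stochastic ensemble of this kind approximates the posterior predictive by averaging over both ensemble members and stochastic (dropout) forward passes. The sketch below shows only that averaging step for Monte Carlo dropout, assuming small NumPy MLPs whose random weights stand in for trained members; the variational training itself is not shown, and the helper names (`init_mlp`, `forward_dropout`, `stochastic_ensemble_predict`) are hypothetical.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a one-hidden-layer MLP (stand-in for a trained member)."""
    return {
        "W1": rng.normal(0, 1 / np.sqrt(d_in), (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 1 / np.sqrt(d_hidden), (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_dropout(params, x, p_drop, rng):
    """One stochastic forward pass with Monte Carlo dropout on the hidden units."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    mask = rng.random(h.shape) >= p_drop  # keep each unit with probability 1 - p_drop
    h = h * mask / (1.0 - p_drop)         # inverted-dropout rescaling
    return softmax(h @ params["W2"] + params["b2"])

def stochastic_ensemble_predict(members, x, n_samples, p_drop, rng):
    """Uniform mixture over ensemble members and dropout samples."""
    probs = [forward_dropout(m, x, p_drop, rng)
             for m in members for _ in range(n_samples)]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
members = [init_mlp(rng, d_in=4, d_hidden=16, d_out=3) for _ in range(5)]
x = rng.normal(size=(2, 4))
p = stochastic_ensemble_predict(members, x, n_samples=20, p_drop=0.2, rng=rng)
print(p)
```

Setting `n_samples=1` with dropout disabled recovers a plain deep ensemble, while a single member recovers ordinary MC dropout; the stochastic ensemble interpolates between the two.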
A unified recipe for deriving (time-uniform) PAC-Bayes bounds
Chugg, Ben, Wang, Hongjian, Ramdas, Aaditya
We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
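Two of the four ingredients listed above, a nonnegative supermartingale and Ville's inequality, can be checked numerically on a toy case. The sketch below assumes i.i.d. standard normal increments and uses the martingale M_t = exp(λ S_t − λ² t / 2), for which Ville's inequality gives P(∃ t: M_t ≥ 1/δ) ≤ δ; equivalently, S_t stays below log(1/δ)/λ + λt/2 for all t simultaneously with probability at least 1 − δ. This is only an illustration of the time-uniform mechanism, not the paper's PAC-Bayes construction (no mixture over posteriors, no Donsker-Varadhan step), and λ, δ and the horizon are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, horizon = 4000, 1000
lam, delta = 0.5, 0.05

# i.i.d. standard normal increments; S_t is the running sum.
increments = rng.standard_normal((n_paths, horizon))
S = np.cumsum(increments, axis=1)
t = np.arange(1, horizon + 1)

# M_t = exp(lam * S_t - lam^2 * t / 2) is a nonnegative martingale with M_0 = 1.
# Work in log space to avoid overflow.
log_M = lam * S - 0.5 * lam**2 * t

# Ville's inequality: P(exists t: M_t >= 1/delta) <= delta.
frac_crossed = (log_M >= np.log(1 / delta)).any(axis=1).mean()

# The same event written as a time-uniform boundary on S_t.
boundary = np.log(1 / delta) / lam + lam * t / 2
violations = (S >= boundary).any(axis=1).mean()

print(f"paths crossing 1/delta: {frac_crossed:.4f} (Ville bound: {delta})")
```

Because the bound holds uniformly over t, one may stop at any data-dependent time without invalidating it, which is the anytime-valid property the abstract refers to.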
SwitchTab: Switched Autoencoders Are Effective Tabular Learners
Wu, Jing, Chen, Suiyao, Zhao, Qi, Sergazinov, Renat, Li, Chen, Liu, Shengjie, Zhao, Chongchao, Xie, Tianpei, Guo, Hanqing, Ji, Cheng, Cociorva, Daniel, Brunzel, Hakan
Self-supervised representation learning methods have achieved significant success in computer vision and natural language processing, where data samples exhibit explicit spatial or semantic dependencies. However, applying these methods to tabular data is challenging due to the less pronounced dependencies among data samples. In this paper, we address this limitation by introducing SwitchTab, a novel self-supervised method specifically designed to capture latent dependencies in tabular data. SwitchTab leverages an asymmetric encoder-decoder framework to decouple mutual and salient features among data pairs, resulting in more representative embeddings. These embeddings, in turn, contribute to better decision boundaries and lead to improved results in downstream tasks. To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results showcase superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be utilized as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression, XGBoost, etc.). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.
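The idea of decoupling "mutual" and "salient" features for a pair of samples can be illustrated with a forward pass that reconstructs each sample both from its own features and with the mutual parts switched between the pair. The sketch below is our reading of the abstract, not the authors' architecture: it assumes one shared encoder, two linear projectors (mutual and salient), a decoder over the concatenated features, and a mean-squared reconstruction loss, with random NumPy weights in place of trained ones. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lat, d_feat = 8, 16, 4

# Hypothetical modules with random weights; a real model would train these end to end.
W_enc = rng.normal(0, 0.3, (d_in, d_lat))
W_mut = rng.normal(0, 0.3, (d_lat, d_feat))   # projector to mutual features
W_sal = rng.normal(0, 0.3, (d_lat, d_feat))   # projector to salient features
W_dec = rng.normal(0, 0.3, (2 * d_feat, d_in))

def encode(x):
    """Return (mutual, salient) feature blocks for a batch of rows."""
    z = np.maximum(0.0, x @ W_enc)
    return z @ W_mut, z @ W_sal

def decode(mutual, salient):
    return np.concatenate([mutual, salient], axis=-1) @ W_dec

def switched_reconstruction_loss(x1, x2):
    m1, s1 = encode(x1)
    m2, s2 = encode(x2)
    # Ordinary reconstructions use each sample's own features ...
    recon = (decode(m1, s1), decode(m2, s2))
    # ... switched ones swap the mutual parts, which pushes the mutual features
    # to carry only information shared across the pair.
    switched = (decode(m2, s1), decode(m1, s2))
    outputs = recon + switched
    targets = (x1, x2, x1, x2)
    return np.mean([np.mean((o - t) ** 2) for o, t in zip(outputs, targets)])

x1, x2 = rng.normal(size=(32, d_in)), rng.normal(size=(32, d_in))
loss = switched_reconstruction_loss(x1, x2)
print(loss)
```

Under this reading, the "plug-and-play" salient embeddings would be the second output of `encode`, which could be appended to the raw features before fitting a conventional classifier.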
On the hierarchical Bayesian modelling of frequency response functions
Dardeno, T. A., Worden, K., Dervilis, N., Mills, R. S., Bull, L. A.
For situations that may benefit from information sharing among datasets, e.g., population-based structural health monitoring (SHM) of similar structures, the hierarchical Bayesian approach provides a useful modelling structure. Hierarchical Bayesian models learn statistical distributions at the population (or parent) and the domain levels simultaneously, to bolster statistical strength among the parameters. As a result, variance is reduced among the parameter estimates, particularly when data are limited. In this paper, a combined probabilistic frequency response function (FRF) model is developed for a small population of nominally-identical helicopter blades, using a hierarchical Bayesian structure, to support information transfer when data are sparse. The modelling approach is also demonstrated in a traditional SHM context, for a single helicopter blade exposed to varying temperatures, to show how the inclusion of physics-based knowledge can improve generalisation beyond the training data when data are scarce. These models address critical challenges in SHM by accommodating benign variations that present as differences in the underlying dynamics, while also considering (and utilising) the similarities among the domains.
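The variance-reduction claim can be made concrete with the simplest hierarchical model: a normal-normal model with known hyperparameters, where each domain parameter θ_j has a closed-form posterior. This is a generic partial-pooling sketch, not the FRF model from the paper; the population values, group sizes and data are synthetic assumptions chosen to show that the shrinkage benefit is largest for the domain with the least data.

```python
import numpy as np

def partial_pooling(domain_data, mu, tau, sigma):
    """Posterior (mean, variance) of each theta_j under
    y_ij ~ N(theta_j, sigma^2) and theta_j ~ N(mu, tau^2), with mu, tau, sigma known."""
    results = []
    for y in domain_data:
        n = len(y)
        precision = n / sigma**2 + 1 / tau**2            # data precision + prior precision
        mean = (y.sum() / sigma**2 + mu / tau**2) / precision
        results.append((mean, 1 / precision))
    return results

rng = np.random.default_rng(1)
mu, tau, sigma = 10.0, 0.5, 1.0            # assumed population-level values
true_theta = rng.normal(mu, tau, size=3)   # three synthetic domains
sizes = [2, 10, 100]                       # deliberately unequal amounts of data
data = [rng.normal(th, sigma, size=n) for th, n in zip(true_theta, sizes)]

res = partial_pooling(data, mu, tau, sigma)
for y, (m, v) in zip(data, res):
    no_pool_var = sigma**2 / len(y)        # variance of the per-domain sample mean
    print(f"n={len(y):3d}  pooled mean={m:.2f}  var={v:.4f}  (no pooling: {no_pool_var:.4f})")
```

With these values the posterior variance for the two-sample domain is 1/6 against 1/2 without pooling, whereas the hundred-sample domain barely changes; in a full model μ and τ would themselves be inferred from the population.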
SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. To address this, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance relative to SEM for linear models, and its superiority over SEM when dealing with non-linear relationships. We provide open-source code and a tutorial notebook with example usage, highlighting the method's ease of use.
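The core idea, replacing the linear structural equation with a flexible learner and then computing effects by plugging interventions into the fitted equation, can be sketched on a synthetic path model Z → X, Z → Y, X → Y with a nonlinear X → Y link. For brevity this uses a *discrete* Super Learner (pick the candidate with the lowest cross-validated error) over two polynomial regressions, rather than a weighted Super Learner ensemble; it is not the authors' released code, and every name and data-generating choice is an assumption.

```python
import numpy as np

def features(x, z, degree):
    cols = [np.ones_like(x), x, z]
    if degree == 2:
        cols += [x**2, z**2, x * z]
    return np.column_stack(cols)

def fit(x, z, y, degree):
    """Least-squares fit of the structural equation for Y; returns a predictor."""
    beta, *_ = np.linalg.lstsq(features(x, z, degree), y, rcond=None)
    return lambda xs, zs: features(xs, zs, degree) @ beta

def cv_mse(x, z, y, degree, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(x[train], z[train], y[train], degree)
        errs.append(np.mean((model(x[fold], z[fold]) - y[fold]) ** 2))
    return np.mean(errs)

def discrete_super_learner(x, z, y, degrees=(1, 2)):
    """Select the candidate learner with the lowest cross-validated risk."""
    risks = {d: cv_mse(x, z, y, d) for d in degrees}
    best = min(risks, key=risks.get)
    return best, fit(x, z, y, best)

def average_effect(model, z, x0=0.0, x1=1.0):
    """Plug-in (g-computation) estimate of E[Y | do(X=x1)] - E[Y | do(X=x0)]."""
    return np.mean(model(np.full_like(z, x1), z) - model(np.full_like(z, x0), z))

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                        # confounder
x = 0.8 * z + rng.normal(size=n)              # treatment depends on Z
y = x**2 + z + rng.normal(scale=0.5, size=n)  # true effect of do(X=1) vs do(X=0) is 1

linear_effect = average_effect(fit(x, z, y, degree=1), z)
best_degree, sl_model = discrete_super_learner(x, z, y)
sl_effect = average_effect(sl_model, z)
print(best_degree, round(linear_effect, 2), round(sl_effect, 2))
```

Here the linear (SEM-style) equation estimates an effect near zero because the linear projection of X² onto X and Z has no X component, while cross-validation selects the quadratic learner and recovers an effect close to 1.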
PAC-Bayesian Domain Adaptation Bounds for Multi-view learning
Hennequin, Mehdi, Benabdeslem, Khalid, Elghazel, Haytham
This paper presents a series of new results for domain adaptation in the multi-view learning setting. The incorporation of multiple views into domain adaptation has received little attention in previous studies. We therefore propose an analysis of generalization bounds based on PAC-Bayesian theory that brings together these two paradigms, which are currently treated separately. First, building on previous work by Germain et al. [7,8], we adapt the distance between distributions that Germain et al. proposed for domain adaptation to the multi-view learning setting, thereby introducing a novel distance tailored to multi-view domain adaptation. Then, we give PAC-Bayesian bounds for estimating the introduced divergence. Finally, we compare the new bounds with those of previous studies.
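The multi-view distance is this paper's own contribution and is not reproduced here. As a point of reference, the single-view quantity it builds on is, as we understand Germain et al.'s formulation, the domain disagreement dis_ρ(S, T) = |E_{h,h'∼ρ}[R_S(h, h') − R_T(h, h')]|, where R_D(h, h') is the probability that h and h' disagree on inputs drawn from D. The sketch below estimates it empirically for a finite posterior ρ (assumed uniform) over hypothetical 1-D threshold classifiers on synthetic data.

```python
import numpy as np

def disagreement_matrix(preds):
    """preds: (n_hypotheses, n_samples) labels. Entry (i, j) is the empirical
    disagreement R(h_i, h_j), the fraction of samples on which h_i and h_j differ."""
    return (preds[:, None, :] != preds[None, :, :]).mean(axis=2)

def domain_disagreement(rho, preds_source, preds_target):
    """|E_{h,h' ~ rho}[R_S(h,h') - R_T(h,h')]| for a finite posterior rho."""
    diff = disagreement_matrix(preds_source) - disagreement_matrix(preds_target)
    return abs(rho @ diff @ rho)

rng = np.random.default_rng(0)
thresholds = np.linspace(-1, 1, 11)                  # hypotheses h_t(x) = sign(x - t)
rho = np.full(len(thresholds), 1 / len(thresholds))  # uniform posterior (assumed)

def predict(x):
    return np.sign(x[None, :] - thresholds[:, None])

x_source = rng.normal(0.0, 1.0, size=500)
x_target = rng.normal(0.0, 1.0, size=500)    # same distribution as the source
x_shifted = rng.normal(1.5, 1.0, size=500)   # covariate-shifted target

same = domain_disagreement(rho, predict(x_source), predict(x_target))
shifted = domain_disagreement(rho, predict(x_source), predict(x_shifted))
print(f"same domain: {same:.3f}  shifted domain: {shifted:.3f}")
```

The divergence only needs unlabeled inputs from each domain, which is why it suits domain adaptation, where target labels are typically unavailable.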
Data-driven Modeling and Inference for Bayesian Gaussian Process ODEs via Double Normalizing Flows
Xu, Jian, Du, Shian, Yang, Junmei, Ding, Xinghao, Paisley, John, Zeng, Delu
Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems, referred to as GPODEs, which are characterized by a probabilistic ODE. Bayesian inference for these models has been extensively studied and applied in tasks such as time series prediction. However, GPODE research has commonly relied on standard GPs with basic kernels, such as the squared exponential kernel, which limits the model's ability to represent complex scenarios. To address this limitation, we introduce normalizing flows to reparameterize the ODE vector field, resulting in a data-driven prior distribution, thereby increasing flexibility and expressive power. We develop a data-driven variational learning algorithm that utilizes analytically tractable probability density functions of normalizing flows, enabling simultaneous learning and inference of unknown continuous dynamics. We also apply normalizing flows to the posterior inference of GPODEs to relax the strong mean-field assumption in posterior inference. By applying normalizing flows in both these ways, our model improves accuracy and uncertainty estimates for Bayesian Gaussian Process ODEs. We validate the effectiveness of our approach on simulated dynamical systems and real-world human motion data, including time series prediction and missing data recovery tasks. Experimental results show that our proposed method effectively captures model uncertainty while improving accuracy.
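The "analytically tractable probability density functions of normalizing flows" rest on the change-of-variables formula, log p(f(z)) = log p(z) − log|det ∂f/∂z|. The sketch below shows this for a single planar flow in the style of Rezende and Mohamed, f(z) = z + u tanh(wᵀz + b), whose Jacobian determinant is 1 + (1 − tanh²(wᵀz + b)) wᵀu. It is a generic flow chosen for illustration, not the architecture used in the paper, and includes a finite-difference check of the analytic log-determinant.

```python
import numpy as np

def make_invertible(u, w):
    """Reparameterize u so that w.u > -1, which keeps the tanh planar flow invertible."""
    wu = w @ u
    m = -1.0 + np.log1p(np.exp(wu))
    return u + (m - wu) * w / (w @ w)

def planar_flow(z, u, w, b):
    """Apply f(z) = z + u * tanh(w.z + b) row-wise; also return log|det J_f| per row."""
    h = np.tanh(z @ w + b)                  # (n,)
    f = z + np.outer(h, u)                  # (n, d)
    psi = (1.0 - h**2)[:, None] * w         # (n, d)
    log_det = np.log(np.abs(1.0 + psi @ u)) # (n,)
    return f, log_det

def flow_log_density(z, u, w, b):
    """Density of x = f(z) for z ~ N(0, I), via the change-of-variables formula."""
    d = z.shape[1]
    log_base = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    x, log_det = planar_flow(z, u, w, b)
    return x, log_base - log_det

rng = np.random.default_rng(0)
d = 3
w = rng.normal(size=d)
u = make_invertible(rng.normal(size=d), w)
b = 0.1

z = rng.normal(size=(5, d))
x, log_px = flow_log_density(z, u, w, b)

# Finite-difference Jacobian at one point, to check the analytic log-determinant.
z0, eps = z[:1], 1e-6
J = np.zeros((d, d))
for k in range(d):
    dz = np.zeros((1, d))
    dz[0, k] = eps
    J[:, k] = (planar_flow(z0 + dz, u, w, b)[0] - planar_flow(z0 - dz, u, w, b)[0])[0] / (2 * eps)
fd_log_det = np.log(abs(np.linalg.det(J)))
print(fd_log_det, planar_flow(z0, u, w, b)[1][0])
```

Stacking several such layers gives a more expressive density while each log-determinant stays O(d), which is what keeps the variational objective tractable.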
PAC-Bayes-Chernoff bounds for unbounded losses
Casado, Ioar, Ortega, Luis A., Masegosa, Andrés R., Pérez, Aritz
We present a new high-probability PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayes version of the Chernoff bound. The proof technique relies on uniformly bounding the tail of a certain random variable based on the Cramér transform of the loss. We highlight two applications of our main result. First, we show that our bound solves the open problem of optimizing the free parameter in many PAC-Bayes bounds. Second, we show that our approach allows working with flexible assumptions on the loss function, resulting in novel bounds that generalize previous ones and can be minimized to obtain Gibbs-like posteriors.
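For reference, the classical (non-PAC-Bayes) quantities involved are the Cramér transform I(a) = sup_{λ ≥ 0} [λa − Λ(λ)], where Λ is the log moment-generating function of the centred loss, and the Chernoff bound P(mean of n losses − E[loss] ≥ a) ≤ exp(−n I(a)). The sketch below computes I(a) by grid search for a Gaussian loss, where the closed form a²/(2σ²) is known, and compares the resulting bound with a Monte Carlo tail estimate. The loss distribution, n, a and the λ grid are illustrative assumptions; the paper's PAC-Bayes version is not implemented here.

```python
import numpy as np

def cramer_transform(log_mgf, a, lambdas):
    """Numerical Cramér transform: max over a grid of lam >= 0 of lam * a - Lambda(lam)."""
    return np.max(lambdas * a - log_mgf(lambdas))

sigma = 2.0

def gaussian_log_mgf(lam):
    """Log-MGF of a centred N(0, sigma^2) loss."""
    return 0.5 * sigma**2 * lam**2

lambdas = np.linspace(0.0, 5.0, 50001)    # grid step 1e-4 includes lam* = a / sigma^2
n, a = 100, 0.5
I_a = cramer_transform(gaussian_log_mgf, a, lambdas)
chernoff_bound = np.exp(-n * I_a)

# Monte Carlo estimate of the same tail probability for comparison.
rng = np.random.default_rng(0)
losses = rng.normal(0.0, sigma, size=(20000, n))
empirical_tail = np.mean(losses.mean(axis=1) >= a)
print(f"I(a)={I_a:.5f} (closed form {a**2 / (2 * sigma**2):.5f}), "
      f"bound={chernoff_bound:.4f}, empirical={empirical_tail:.4f}")
```

The bound only needs Λ, so heavier-tailed losses are handled by swapping in their log-MGF, provided it is finite for some λ > 0.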
Whole-examination AI estimation of fetal biometrics from 20-week ultrasound scans
Venturini, Lorenzo, Budd, Samuel, Farruggia, Alfonso, Wright, Robert, Matthew, Jacqueline, Day, Thomas G., Kainz, Bernhard, Razavi, Reza, Hajnal, Jo V.
The current approach to fetal anomaly screening is based on biometric measurements derived from individually selected ultrasound images. In this paper, we introduce a paradigm shift that attains human-level performance in biometric measurement by aggregating automatically extracted biometrics from every frame across an entire scan, with no need for operator intervention. We use a convolutional neural network to classify each frame of an ultrasound video recording. We then measure fetal biometrics in every frame where appropriate anatomy is visible. We use a Bayesian method to estimate the true value of each biometric from a large number of measurements and probabilistically reject outliers. We performed a retrospective experiment on 1457 recordings (comprising 48 million frames) of 20-week ultrasound scans, estimated fetal biometrics in those scans and compared our estimates to the measurements sonographers took during the scan. Our method achieves human-level performance in estimating fetal biometrics and estimates well-calibrated credible intervals in which the true biometric value is expected to lie.
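The abstract does not specify the Bayesian model, so the following is only one plausible way to "estimate the true value of each biometric from a large number of measurements and probabilistically reject outliers": treat each per-frame measurement as either an inlier N(θ, σ²) or an outlier drawn uniformly over a plausible range, compute the posterior over θ on a grid with a flat prior, and read off a credible interval and per-measurement outlier probabilities. The noise level, outlier fraction, range and data below are synthetic assumptions in arbitrary units.

```python
import numpy as np

def robust_grid_posterior(y, grid, sigma, outlier_frac, lo, hi):
    """Grid posterior over the true value theta (flat prior). Each measurement is an
    inlier N(theta, sigma^2) with prob 1 - outlier_frac, else Uniform(lo, hi)."""
    inlier = np.exp(-0.5 * ((y[None, :] - grid[:, None]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    outlier = 1.0 / (hi - lo)
    mix = (1 - outlier_frac) * inlier + outlier_frac * outlier   # (grid, measurements)
    log_post = np.log(mix).sum(axis=1)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior probability that each measurement is an outlier.
    p_out = (post[:, None] * (outlier_frac * outlier / mix)).sum(axis=0)
    return post, p_out

def credible_interval(grid, post, level=0.95):
    """Equal-tailed credible interval from a normalized grid posterior."""
    cdf = np.cumsum(post)
    return (grid[np.searchsorted(cdf, (1 - level) / 2)],
            grid[np.searchsorted(cdf, 1 - (1 - level) / 2)])

rng = np.random.default_rng(0)
true_value = 180.0                                          # synthetic, arbitrary units
y = np.concatenate([rng.normal(true_value, 3.0, size=200),  # frames with a good measurement
                    rng.uniform(100.0, 300.0, size=40)])    # spurious measurements
grid = np.linspace(100.0, 300.0, 4001)
post, p_out = robust_grid_posterior(y, grid, sigma=3.0, outlier_frac=0.2, lo=100.0, hi=300.0)
estimate = np.sum(grid * post)
ci = credible_interval(grid, post)
print(f"estimate={estimate:.2f}, 95% interval=({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the uniform component keeps every measurement's likelihood bounded away from zero, gross outliers are down-weighted automatically instead of being removed by a hand-tuned threshold.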
AI Alignment: A Comprehensive Survey
Ji, Jiaming, Qiu, Tianyi, Chen, Boyuan, Zhang, Borong, Lou, Hantao, Wang, Kaile, Duan, Yawen, He, Zhonghao, Zhou, Jiayi, Zhang, Zhaowei, Zeng, Fanzhi, Ng, Kwan Yee, Dai, Juntao, Pan, Xuehai, O'Gara, Aidan, Lei, Yingshan, Xu, Hua, Tse, Brian, Fu, Jie, McAleer, Stephen, Yang, Yaodong, Wang, Yizhou, Zhu, Song-Chun, Guo, Yike, Gao, Wen
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update the website (www.alignmentsurvey.com) which features tutorials, collections of papers, blog posts, and other resources.