 Bayesian Inference


A probabilistic analysis of selected notions of iterated conditioning under coherence

arXiv.org Artificial Intelligence

It is well known that basic conditionals satisfy some desirable basic logical and probabilistic properties, such as the compound probability theorem, but checking the validity of these becomes trickier when we switch to compound and iterated conditionals. We consider de Finetti's notion of conditional as a three-valued object and as a conditional random quantity in the betting framework. We recall the notions of conjunction and disjunction among conditionals in selected trivalent logics. First, in the framework of specific three-valued logics we analyze the notions of iterated conditioning introduced by Cooper-Calabrese, de Finetti and Farrell, respectively. We show that the compound probability theorem and other basic properties are not preserved by these objects, by also computing some probability propagation rules. Then, for each trivalent logic we introduce an iterated conditional as a suitable random quantity which satisfies the compound prevision theorem and some of the desirable properties. We also check the validity of two generalized versions of Bayes' Rule for iterated conditionals. We study the p-validity of generalized versions of Modus Ponens and two-premise centering for iterated conditionals. Finally, we observe that all the basic properties are satisfied only by the iterated conditional mainly developed in recent papers by Gilio and Sanfilippo in the setting of conditional random quantities.
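For a basic (non-iterated) conditional, the two readings mentioned above can be illustrated on a small finite example. The following is a minimal sketch, not taken from the paper: the die, the events `A` and `B`, and all function names are hypothetical. It evaluates B|A as a three-valued object (true, false, void), then as a random quantity that takes the value P(B|A) when A is false, and checks the compound probability theorem P(A and B) = P(B|A) P(A).

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
outcomes = range(1, 7)
prob = {w: Fraction(1, 6) for w in outcomes}

A = {w for w in outcomes if w % 2 == 0}  # "the roll is even"
B = {w for w in outcomes if w >= 4}      # "the roll is at least 4"


def P(event):
    """Probability of an event given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))


def truth_value(w):
    """Three-valued reading of B|A: void when the antecedent A is false."""
    if w not in A:
        return "void"
    return "true" if w in B else "false"


# Conditional probability P(B|A), well defined here because P(A) > 0.
p = P(A & B) / P(A)


def random_quantity(w):
    """Betting reading of B|A: 1 if A and B, 0 if A and not B, p if not A."""
    if w not in A:
        return p
    return Fraction(1) if w in B else Fraction(0)


# Prevision (expected value) of the conditional random quantity.
prevision = sum((prob[w] * random_quantity(w) for w in outcomes), Fraction(0))

print(p, prevision, P(A & B) == p * P(A))
```

In this example the prevision of the random quantity coincides with P(B|A), which is the coherence property that the three-valued reading alone does not express.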


Bayesian Prompt Learning for Image-Language Model Generalization

arXiv.org Artificial Intelligence

Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generalizability to prompts unseen during training. By leveraging the regularization ability of Bayesian methods, we frame prompt learning from the Bayesian perspective and formulate it as a variational inference problem. Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts. Our framework is implemented by modeling the input prompt space in a probabilistic manner, as an a priori distribution which makes our proposal compatible with prompt learning approaches that are unconditional or conditional on the image. We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space, prevents learning spurious features, and exploits transferable invariant features. This results in better generalization of unseen prompts, even across different datasets and domains. Code available at: https://github.com/saic-fi/Bayesian-Prompt-Learning


Deep Generative Modeling-based Data Augmentation with Demonstration using the BFBT Benchmark Void Fraction Datasets

arXiv.org Artificial Intelligence

Deep learning (DL) has achieved remarkable successes in many disciplines such as computer vision and natural language processing due to the availability of "big data". However, such success cannot be easily replicated in many nuclear engineering problems because of the limited amount of training data, especially when the data comes from high-cost experiments. To overcome such a data scarcity issue, this paper explores the applications of deep generative models (DGMs) that have been widely used for image data generation to scientific data augmentation. DGMs, such as generative adversarial networks (GANs), normalizing flows (NFs), variational autoencoders (VAEs), and conditional VAEs (CVAEs), can be trained to learn the underlying probabilistic distribution of the training dataset. Once trained, they can be used to generate synthetic data that are similar to the training data and significantly expand the dataset size. By employing DGMs to augment TRACE simulated data of the steady-state void fractions based on the NUPEC Boiling Water Reactor Full-size Fine-mesh Bundle Test (BFBT) benchmark, this study demonstrates that VAEs, CVAEs, and GANs have comparable generative performance with similar errors in the synthetic data, with CVAEs achieving the smallest errors. The findings show that DGMs have a great potential to augment scientific data in nuclear engineering, which proves effective for expanding the training dataset and enabling other DL models to be trained more accurately.


Modeling Random Networks with Heterogeneous Reciprocity

arXiv.org Artificial Intelligence

Reciprocity, or the tendency of individuals to mirror behavior, is a key measure that describes information exchange in a social network. Users in social networks tend to engage in different levels of reciprocal behavior. Differences in such behavior may indicate the existence of communities that reciprocate links at varying rates. In this paper, we develop methodology to model the diverse reciprocal behavior in growing social networks. In particular, we present a preferential attachment model with heterogeneous reciprocity that imitates the attraction users have for popular users, plus the heterogeneous nature by which they reciprocate links. We compare Bayesian and frequentist model fitting techniques for large networks, as well as computationally efficient variational alternatives. We consider both the case where the number of communities is known and the case where it is unknown. We apply the presented methods to the analysis of a Facebook wallpost network where users have non-uniform reciprocal behavior patterns. The fitted model captures the heavy-tailed nature of the empirical degree distributions in the Facebook data and identifies multiple groups of users that differ in their tendency to reply to and receive responses to wallposts.
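The general idea of preferential attachment with group-dependent reciprocity can be sketched with a toy simulation. This is an illustrative assumption, not the paper's model: the growth rule, the offset `delta`, the parameter names, and the choice that the target's group decides whether a link is reciprocated are all hypothetical.

```python
import random


def simulate(n_steps, rho, group_probs, delta=1.0, seed=0):
    """Toy directed preferential attachment with group-dependent reciprocity.

    rho[g]         : probability that a node in group g reciprocates a link
    group_probs[g] : probability that a node belongs to group g
    delta          : positive offset so that nodes with in-degree 0 can be chosen
    """
    rng = random.Random(seed)
    n_groups = len(group_probs)
    # Start from a single node with no edges.
    groups = [rng.choices(range(n_groups), weights=group_probs)[0]]
    in_degree = [0]
    edges = []
    for _ in range(n_steps):
        new = len(groups)
        # Choose a target with probability proportional to in-degree + delta.
        weights = [d + delta for d in in_degree]
        target = rng.choices(range(new), weights=weights)[0]
        groups.append(rng.choices(range(n_groups), weights=group_probs)[0])
        in_degree.append(0)
        edges.append((new, target))
        in_degree[target] += 1
        # The target reciprocates at a rate that depends on its group.
        if rng.random() < rho[groups[target]]:
            edges.append((target, new))
            in_degree[new] += 1
    return edges, groups


edges, groups = simulate(500, rho=[0.1, 0.8], group_probs=[0.7, 0.3])
print(len(edges), len(groups))
```

With all reciprocity rates set to 0 the sketch produces exactly one edge per step, and with all rates set to 1 it produces two, which gives a quick sanity check of the reciprocation step.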


Semi-Implicit Variational Inference via Score Matching

arXiv.org Artificial Intelligence

Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, due to the intractable densities of variational distributions, current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs for unbiased ELBOs for training. In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching. Leveraging the hierarchical structure of semi-implicit variational families, the score matching objective allows a minimax formulation where the intractable variational densities can be naturally handled with denoising score matching. We show that SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods in a variety of Bayesian inference tasks.


Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling

arXiv.org Machine Learning

This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and shown to converge in an accelerated manner for any target that is $\kappa$-strongly log-concave (i.e., requiring in the order of $\sqrt{\kappa}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13, 2 (2020), pp. 905-935] which is only provably accelerated for Gaussian targets and has bias. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $\kappa$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step which maximizes the convergence speed. The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors.
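The claim about Gaussian targets can be checked by hand in one dimension. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it assumes a target N(0, sigma2) with potential U(x) = x^2 / (2 sigma2), an Euler-Maruyama step x' = x - h U'(x) + sqrt(2h) xi, and an implicit midpoint step of the form x' = x - h U'((x + x') / 2) + sqrt(2h) xi. Both steps reduce to a linear recursion x' = alpha x + beta xi, whose stationary variance is beta^2 / (1 - alpha^2). The function names are hypothetical.

```python
import math


def stationary_variance(alpha, noise_std):
    """Stationary variance of x' = alpha * x + noise_std * xi, with xi ~ N(0, 1)."""
    if abs(alpha) >= 1:
        raise ValueError("the recursion is not stable for |alpha| >= 1")
    return noise_std ** 2 / (1 - alpha ** 2)


def euler_maruyama_coeffs(h, sigma2):
    """Explicit step: x' = (1 - h / sigma2) * x + sqrt(2h) * xi."""
    return 1 - h / sigma2, math.sqrt(2 * h)


def implicit_midpoint_coeffs(h, sigma2):
    """Implicit midpoint step solved for x': x' = ((1 - a) x + sqrt(2h) xi) / (1 + a)."""
    a = h / (2 * sigma2)
    return (1 - a) / (1 + a), math.sqrt(2 * h) / (1 + a)


sigma2, h = 1.0, 0.5
var_em = stationary_variance(*euler_maruyama_coeffs(h, sigma2))
var_mid = stationary_variance(*implicit_midpoint_coeffs(h, sigma2))
print(var_em, var_mid)  # about 1.333 for the explicit step, 1.0 for the midpoint step
```

Under these assumptions the explicit step inflates the variance to sigma2 / (1 - h / (2 sigma2)), while the midpoint step recovers sigma2 exactly for any step size h > 0.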


Hybrid Models for Mixed Variables in Bayesian Optimization

arXiv.org Artificial Intelligence

This paper presents a new type of hybrid model for Bayesian optimization (BO) adept at managing mixed variables, encompassing both quantitative (continuous and integer) and qualitative (categorical) types. Our proposed new hybrid models merge Monte Carlo Tree Search structure (MCTS) for categorical variables with Gaussian Processes (GP) for continuous ones. To address efficiency in the search phase, we juxtapose the original (frequentist) upper confidence bound tree search (UCTS) and the Bayesian Dirichlet search strategies, showcasing the tree architecture's integration into Bayesian optimization. Central to our innovation in the surrogate modeling phase is online kernel selection for mixed-variable BO. Our innovations, including dynamic kernel selection, unique UCTS (hybridM) and Bayesian update strategies (hybridD), position our hybrid models as an advancement in mixed-variable surrogate models. Numerical experiments underscore the hybrid models' superiority, highlighting their potential in Bayesian optimization.

Keywords: Gaussian processes, Monte Carlo tree search, categorical variables, online kernel selection.

The discussion of different types of encodings can be found in Cerda et al. (2018).

1 Introduction

Our motivating problem is to optimize a "black-box" function with "mixed" variables, lacking an analytic expression. "Mixed" signifies the function's input variables comprise both continuous (quantitative) and categorical (qualitative) variables, common in machine learning and scientific computing tasks like performance tuning of mathematical libraries and application codes at runtime and compile-time (Balaprakash et al., 2018). Bayesian optimization (BO) with Gaussian process (GP) surrogate models is a prevalent method for optimizing noisy, expensive black-box functions, primarily designed for continuous-variable functions (Shahriari et al., 2016; Sid-Lakhdar et al., 2020).

Extending BO to mixed-variable functions presents theoretical and computational challenges due to variable type differences (Table 1). Continuous variables have uncountably many values with magnitudes and intrinsic ordering, allowing natural gradient definition. In contrast, categorical variables, having finitely many values without intrinsic ordering or magnitude, require encoding in the GP context, potentially inducing discontinuity and degrading GP performance (Luo et al., 2021). The empirical rule of thumb for handling an integer variable (Karlsson et al., 2020) is to treat it as a categorical variable if the number of integer values (i.e., number of categorical values) is small, or as a continuous variable with embedding (a.k.a.


Spectral information criterion for automatic elbow detection

arXiv.org Artificial Intelligence

We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that often is much smaller than the total number of possible models. The elements of this subset are elbows of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two experiments involving real datasets. They are all real-world applications such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
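For reference, the two classical criteria named above can be computed as follows. This is a minimal sketch of the standard AIC and BIC definitions only. It does not implement SIC, and the log-likelihood values, the sample size `n`, and the function names are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L, with k free parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L, with n observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for models with k = 1..4 parameters.
# The fit improves sharply from k = 1 to k = 2 and then flattens (an "elbow").
log_likelihoods = {1: -120.0, 2: -100.0, 3: -99.5, 4: -99.4}
n = 50

best_aic = min(log_likelihoods, key=lambda k: aic(log_likelihoods[k], k))
best_bic = min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, n))
print(best_aic, best_bic)
```

Both criteria penalize the number of parameters linearly, so on this hypothetical curve they select the model at the elbow, k = 2.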


Modeling Edge Features with Deep Bayesian Graph Networks

arXiv.org Artificial Intelligence

We propose an extension of the Contextual Graph Markov Model, a deep and probabilistic machine learning model for graphs, to model the distribution of edge features. Our approach is architectural, as we introduce an additional Bayesian network mapping edge features into discrete states to be used by the original model. In doing so, we are also able to build richer graph representations even in the absence of edge features, which is confirmed by the performance improvements on standard graph classification benchmarks. Moreover, we successfully test our proposal in a graph regression scenario where edge features are of fundamental importance, and we show that the learned edge representation provides substantial performance improvements against the original model on three link prediction tasks. By keeping the computational complexity linear in the number of edges, the proposed model is amenable to large-scale graph processing.


A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction

arXiv.org Artificial Intelligence

Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images. This task requires significant data acquisition to predict both visible and occluded portions of the shape. Furthermore, learning-based methods face the difficulty of creating a comprehensive training dataset for all possible classes. To this end, we propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes. Variational Priors represent abstract shapes and combat forgetting, whereas saliency maps preserve object attributes with less memory usage. This is vital due to resource constraints in storing extensive training data. Additionally, we introduce saliency map-based experience replay to capture global and distinct object features. Thorough experiments show competitive results compared to established methods, both quantitatively and qualitatively.