AITopics

2404.18154

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Artificial IntelligenceApr-27-2024

Implicit Generative Prior for Bayesian Neural Networks

Liu, Yijia, Wang, Xiao

Predictive uncertainty quantification is crucial for reliable decision-making in various applied domains. Bayesian neural networks offer a powerful framework for this task. However, defining meaningful priors and ensuring computational efficiency remain significant challenges, especially for complex real-world applications. This paper addresses these challenges by proposing a novel neural adaptive empirical Bayes (NA-EB) framework. NA-EB leverages a class of implicit generative priors derived from low-dimensional distributions. This allows for efficient handling of complex data structures and effective capture of underlying relationships in real-world datasets. The proposed NA-EB framework combines variational inference with a gradient ascent algorithm. This enables simultaneous hyperparameter selection and approximation of the posterior distribution, leading to improved computational efficiency. We establish the theoretical foundation of the framework through posterior and classification consistency. We demonstrate the practical applications of our framework through extensive evaluations on a variety of tasks, including the two-spiral problem, regression, 10 UCI datasets, and image classification tasks on both MNIST and CIFAR-10 datasets. The results of our experiments highlight the superiority of our proposed framework over existing methods, such as sparse variational Bayesian and generative models, in terms of prediction accuracy and uncertainty quantification.

dataset, na-eb, neural network, (16 more...)

2404.18008

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceApr-26-2024

Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

Shaj, Vaisakh

Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and propose two new probabilistic formalisms namely Hidden-Parameter SSMs and Multi-Time Scale SSMs to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discuss how these formalisms are in line with related neuroscience literature on Bayesian brain hypothesis and predicitive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.

arrow denote deterministic transformation, corresponding time update equation, hidden parameter state space model, (14 more...)

2404.16078

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)
Transportation (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(2 more...)

Feng, Yunzhen, Rudner, Tim G. J., Tsilivis, Nikolaos, Kempe, Julia

Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

arXiv.org Machine LearningApr-26-2024

Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (bnns) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of bnns, we investigate whether it is possible to successfully break state-of-the-art bnn inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that bnns trained with state-of-the-art approximate inference methods, and even bnns trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of bnns and conclusively demonstrate that bnns and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.

neural network, robustness, selective accuracy, (13 more...)

2404.1964

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)
(14 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Chen, Yujie, Bhadra, Anindya, Chakraborty, Antik

Likelihood Based Inference in Fully and Partially Observed Exponential Family Graphical Models with Intractable Normalizing Constants

arXiv.org Machine LearningApr-26-2024

Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling to learn latent representations in modern multivariate data sets with complex dependency structures. Among these, the exponential family graphical models are especially popular, given their fairly well-understood statistical properties and computational scalability to high-dimensional data based on pseudo-likelihood methods. These models have been successfully applied in many fields, such as the Ising model in statistical physics and count graphical models in genomics. Another strand of models allows some nodes to be latent, so as to allow the marginal distribution of the observable nodes to depart from exponential family to capture more complex dependence. These approaches form the basis of generative models in artificial intelligence, such as the Boltzmann machines and their restricted versions. A fundamental barrier to likelihood-based (i.e., both maximum likelihood and fully Bayesian) inference in both fully and partially observed cases is the intractability of the likelihood. The usual workaround is via adopting pseudo-likelihood based approaches, following the pioneering work of Besag (1974). The goal of this paper is to demonstrate that full likelihood based analysis of these models is feasible in a computationally efficient manner. The chief innovation lies in using a technique of Geyer (1991) to estimate the intractable normalizing constant, as well as its gradient, for intractable graphical models. Extensive numerical results, supporting theory and comparisons with pseudo-likelihood based approaches demonstrate the applicability of the proposed method.

intractable normalizing constant, likelihood, observed exponential family graphical model, (1 more...)

2404.17763

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.53)

Dupuis, Benjamin, Viallard, Paul, Deligiannidis, George, Simsekli, Umut

Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

arXiv.org Machine LearningApr-26-2024

We propose data-dependent uniform generalization bounds by approaching the problem from a PAC-Bayesian perspective. We first apply the PAC-Bayesian framework on `random sets' in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set after observing the training data. This approach allows us to prove data-dependent bounds, which can be applicable in numerous contexts. To highlight the power of our approach, we consider two main applications. First, we propose a PAC-Bayesian formulation of the recently developed fractal-dimension-based generalization bounds. The derived results are shown to be tighter and they unify the existing results around one simple proof technique. Second, we prove uniform bounds over the trajectories of continuous Langevin dynamics and stochastic gradient Langevin dynamics. These results provide novel information about the generalization properties of noisy algorithms.

assumption, generalization, hypothesis, (13 more...)

2404.17442

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Schwendinger, Benjamin, Schwendinger, Florian, Vana-Gür, Laura

Automated Model Selection for Generalized Linear Models

arXiv.org Machine LearningApr-25-2024

In this paper, we show how mixed-integer conic optimization can be used to combine feature subset selection with holistic generalized linear models to fully automate the model selection process. Concretely, we directly optimize for the Akaike and Bayesian information criteria while imposing constraints designed to deal with multicollinearity in the feature selection task. Specifically, we propose a novel pairwise correlation constraint that combines the sign coherence constraint with ideas from classical statistical models like Ridge regression and the OSCAR model.

constraint, regression, selection, (14 more...)

2404.1656

Country:

Europe > Austria > Vienna (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Wang, Chenyang, Yang, Yun

Estimating the Number of Components in Finite Mixture Models via Variational Approximation

arXiv.org Machine LearningApr-25-2024

This work introduces a new method for selecting the number of components in finite mixture models (FMMs) using variational Bayes, inspired by the large-sample properties of the Evidence Lower Bound (ELBO) derived from mean-field (MF) variational approximation. Specifically, we establish matching upper and lower bounds for the ELBO without assuming conjugate priors, suggesting the consistency of model selection for FMMs based on maximizing the ELBO. As a by-product of our proof, we demonstrate that the MF approximation inherits the stable behavior (benefited from model singularity) of the posterior distribution, which tends to eliminate the extra components under model misspecification where the number of mixture components is over-specified. This stable behavior also leads to the $n^{-1/2}$ convergence rate for parameter estimation, up to a logarithmic factor, under this model overspecification. Empirical experiments are conducted to validate our theoretical findings and compare with other state-of-the-art methods for selecting the number of components in FMMs.

approximation, elbo, mixture model, (16 more...)

2404.16746

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Modeling & Simulation (0.93)

arXiv.org Artificial IntelligenceApr-24-2024

FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification

Chen, Hui, Liu, Hengyu, Wu, Zhangkai, Fan, Xuhui, Cao, Longbing

While deep neural networks (DNNs) based personalized federated learning (PFL) is demanding for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) suffer from efficient systematic uncertainty quantification. The Bayesian DNNs-based PFL is usually questioned of either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNNs-based subnetwork inference PFL framework. FedSI is simple and scalable by leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on three different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.

dataset, posterior distribution, representation layer, (16 more...)

2404.15657

Country:

Europe > Denmark > North Jutland > Aalborg (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Rawlekar, Samyak, Bhatnagar, Shubhang, Srinivasulu, Vishnuvardhan Pogunulu, Ahuja, Narendra

Improving Multi-label Recognition using Class Co-Occurrence Probabilities

arXiv.org Artificial IntelligenceApr-24-2024

Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.

conditional probability, dataset, recognition, (11 more...)

2404.16193

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine (0.68)
Food & Agriculture (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.58)