AITopics

2411.00465

Country:

Asia > China (0.04)
Europe > Portugal (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)
Overview > Innovation (0.48)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Wyrwal, Kacper, Krause, Andreas, Borovitskiy, Viacheslav

Residual Deep Gaussian Processes on Manifolds

We propose practical deep Gaussian process models on Riemannian manifolds, similar in spirit to residual neural networks. With manifold-to-manifold hidden layers and an arbitrary last layer, they can model manifold- and scalar-valued functions, as well as vector fields. We target data inherently supported on manifolds, which is too complex for shallow Gaussian processes thereon. For example, while the latter perform well on high-altitude wind data, they struggle with the more intricate, nonstationary patterns at low altitudes. Our models significantly improve performance in these settings, enhancing prediction quality and uncertainty calibration, and remain robust to overfitting, reverting to shallow models when additional complexity is unneeded. We further showcase our models on Bayesian optimisation problems on manifolds, using stylised examples motivated by robotics, and obtain substantial improvements in later stages of the optimisation process. Finally, we show our models to have potential for speeding up inference for non-manifold data, when, and if, it can be mapped to a proxy manifold well enough.

Gaussian processes (GPs) are a widely adopted model class for learning functions within the Bayesian framework (Rasmussen and Williams, 2006). They offer accurate uncertainty estimates and perform well even when data is scarce. Consequently, GPs have found success in decision-making tasks, where well-calibrated uncertainty is key, including Bayesian optimisation (Snoek et al., 2012), active (Krause et al., 2008) and reinforcement (Kamthe and Deisenroth, 2018) learning. In recent years, substantial work went into developing the analogs of practical GP models on various non-Euclidean domains (Borovitskiy et al., 2021; 2023; 2020; Fichera et al., 2023).

artificial intelligence, bayesian inference, machine learning, (18 more...)

2411.00161

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Inclusive KL Minimization: A Wasserstein-Fisher-Rao Gradient Flow Perspective

Zhu, Jia-Jie

Otto's (2001) Wasserstein gradient flow of the exclusive KL divergence functional provides a powerful and mathematically principled perspective for analyzing learning and inference algorithms. In contrast, algorithms for the inclusive KL inference, i.e., minimizing $ \mathrm{KL}(\pi \| \mu) $ with respect to $ \mu $ for some target $ \pi $, are rarely analyzed using tools from mathematical analysis. This paper shows that a general-purpose approximate inclusive KL inference paradigm can be constructed using the theory of gradient flows derived from PDE analysis. We uncover that several existing learning algorithms can be viewed as particular realizations of the inclusive KL inference paradigm. For example, existing sampling algorithms such as Arbel et al. (2019) and Korba et al. (2021) can be viewed in a unified manner as inclusive-KL inference with approximate gradient estimators. Finally, we provide the theoretical foundation for the Wasserstein-Fisher-Rao gradient flows for minimizing the inclusive KL divergence.
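For reference, the two divergences contrasted in this abstract can be written out from the standard definition of the KL divergence. The density notation below is an assumption made for readability (both $\pi$ and $\mu$ are taken to admit densities); it is not quoted from the paper.

```latex
\begin{align}
\text{exclusive KL:}\quad
  \mathrm{KL}(\mu \,\|\, \pi) &= \int \mu(x)\,\log\frac{\mu(x)}{\pi(x)}\,\mathrm{d}x, \\
\text{inclusive KL:}\quad
  \mathrm{KL}(\pi \,\|\, \mu) &= \int \pi(x)\,\log\frac{\pi(x)}{\mu(x)}\,\mathrm{d}x.
\end{align}
```

In both cases the minimization is over $\mu$. The difference is which measure the expectation is taken under: the exclusive form averages under the approximation $\mu$, while the inclusive form averages under the target $\pi$.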

artificial intelligence, bayesian inference, machine learning, (17 more...)

2411.00214

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Kaplanoglou, Pantelis I., Diamantaras, Konstantinos

Learning local discrete features in explainable-by-design convolutional neural networks

arXiv.org Artificial Intelligence, Oct-31-2024

Our proposed framework attempts to break the trade-off between performance and explainability by introducing an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism. The ExplaiNet model consists of the predictor, which is a high-accuracy CNN with residual or dense skip connections, and the explainer probabilistic graph that expresses the spatial interactions of the network neurons. The value on each graph node is a local discrete feature (LDF) vector, a patch descriptor that represents the indices of antagonistic neurons ordered by the strength of their activations, which are learned with gradient descent. Using LDFs as sequences, we can increase the conciseness of explanations by repurposing EXTREME, an EM-based sequence motif discovery method that is typically used in molecular biology. Having a discrete feature motif matrix for each of the intermediate image representations, instead of a continuous activation tensor, allows us to leverage the inherent explainability of Bayesian networks. By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels and attribute the model's output to global motifs. Moreover, experiments on various tiny image benchmark datasets confirm that our predictor ensures the same level of performance as the baseline architecture for a given count of parameters and/or layers. Our novel method shows promise to exceed this performance while providing an additional stream of explanations. In the solved MNIST classification task, it reaches performance comparable to the state of the art for single models, using a standard training setup and 0.75 million parameters.

artificial intelligence, experiment, machine learning, (17 more...)

2411.00139

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Greece (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Efficient Model Compression for Bayesian Neural Networks

Saha, Diptarka, Liu, Zihe, Liang, Feng

Model compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages, including lower computation cost, deployability to devices of limited storage and memory, and resistance to adversarial attacks. This may be achieved via weight pruning or fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node that typically gets lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data and find evidence of better generalizability of the pruned model in all our experiments.
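As a rough illustration of pruning by posterior inclusion probability, the sketch below keeps only the hidden nodes whose inclusion probability exceeds a threshold. The function names, the 0.5 threshold, and all numbers are hypothetical choices made for this example; the abstract does not specify the paper's actual pruning rule or how the probabilities are estimated.

```python
from typing import List, Sequence


def select_nodes(inclusion_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of nodes whose posterior inclusion probability exceeds the threshold."""
    return [i for i, p in enumerate(inclusion_probs) if p > threshold]


def prune_layer(weights: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the rows of a (nodes x inputs) weight matrix listed in `keep`."""
    return [list(weights[i]) for i in keep]


# Hypothetical inclusion probabilities for four hidden nodes
# (assumed to come from an already-trained variational posterior).
probs = [0.97, 0.08, 0.61, 0.02]
# Hypothetical weight matrix: one row per hidden node, two inputs each.
W = [[0.5, -1.2], [0.01, 0.03], [0.9, 0.4], [-0.02, 0.0]]

kept = select_nodes(probs)
W_pruned = prune_layer(W, kept)
print(kept)      # [0, 2]
print(W_pruned)  # [[0.5, -1.2], [0.9, 0.4]]
```

The same selection step applied to input features instead of hidden nodes would give a simple form of feature selection.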

artificial intelligence, bayesian inference, machine learning, (16 more...)

2411.00273

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.46)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Zhu, Yilun, Zhang, Jianxin, Gangrade, Aditya, Scott, Clayton

Label Noise: Ignorance Is Bliss

We establish a new theoretical framework for learning under multi-class, instance-dependent label noise. This framework casts learning with label noise as a form of domain adaptation, in particular, domain adaptation under posterior drift. We introduce the concept of relative signal strength (RSS), a pointwise measure that quantifies the transferability from noisy to clean posterior. Using RSS, we establish nearly matching upper and lower bounds on the excess risk. Our theoretical findings support the simple Noise Ignorant Empirical Risk Minimization (NI-ERM) principle, which minimizes empirical risk while ignoring label noise. Finally, we translate this theoretical insight into practice: by using NI-ERM to fit a linear classifier on top of a self-supervised feature extractor, we achieve state-of-the-art performance on the CIFAR-N data challenge.

classifier, label noise, signal strength, (14 more...)

2411.00079

Country:

North America > United States > Michigan (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Government > Regional Government (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

arXiv.org Artificial Intelligence, Oct-31-2024

Bayesian-guided Label Mapping for Visual Reprogramming

Cai, Chengyi, Ye, Zesheng, Feng, Lei, Qi, Jianzhong, Liu, Feng

Visual reprogramming (VR) leverages the intrinsic capabilities of pretrained vision models by adapting their input or output interfaces to solve downstream tasks whose labels (i.e., downstream labels) might be totally different from the labels associated with the pretrained models (i.e., pretrained labels). When adapting the output interface, label mapping methods transform the pretrained labels to downstream labels by establishing a gradient-free one-to-one correspondence between the two sets of labels. However, in this paper, we reveal that one-to-one mappings may overlook the complex relationship between pretrained and downstream labels. Motivated by this observation, we propose a Bayesian-guided Label Mapping (BLM) method. BLM constructs an iteratively-updated probabilistic label mapping matrix, with each element quantifying a pairwise relationship between pretrained and downstream labels. The assignment of values to the constructed matrix is guided by Bayesian conditional probability, considering the joint distribution of the downstream labels and the labels predicted by the pretrained model on downstream samples. Experiments conducted on both pretrained vision models (e.g., ResNeXt) and vision-language models (e.g., CLIP) demonstrate the superior performance of BLM over existing label mapping methods. The success of BLM also offers a probabilistic lens through which to understand and analyze the effectiveness of VR. Our code is available at https://github.com/tmlr-group/BayesianLM.
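The idea of a probabilistic label mapping matrix can be sketched with simple co-occurrence counts. The example below estimates P(downstream label | pretrained label) from the pretrained model's predictions on labelled downstream samples, then uses the matrix to turn a pretrained-label probability vector into downstream-label scores. This is an illustrative assumption about what such a matrix might look like, not the BLM algorithm from the paper; the function names, the additive smoothing, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def label_mapping_matrix(
    pretrained_preds: Sequence[int],
    downstream_labels: Sequence[int],
    n_pretrained: int,
    n_downstream: int,
    smoothing: float = 1.0,
) -> List[List[float]]:
    """Estimate P(downstream label | pretrained label) from co-occurrence counts."""
    # Start every cell at `smoothing` so that unseen pairs keep non-zero probability.
    counts = [[smoothing] * n_downstream for _ in range(n_pretrained)]
    for p, d in zip(pretrained_preds, downstream_labels):
        counts[p][d] += 1.0
    # Normalize each row so it sums to one.
    return [[c / sum(row) for c in row] for row in counts]


def map_probs(pretrained_probs: Sequence[float], mapping: Sequence[Sequence[float]]) -> List[float]:
    """Mix the rows of the mapping matrix using the pretrained-label probabilities."""
    n_downstream = len(mapping[0])
    return [
        sum(pretrained_probs[p] * mapping[p][d] for p in range(len(mapping)))
        for d in range(n_downstream)
    ]


# Toy data: pretrained-label predictions and true downstream labels for six samples.
preds = [0, 0, 1, 2, 2, 2]
labels = [0, 0, 1, 1, 1, 0]
M = label_mapping_matrix(preds, labels, n_pretrained=3, n_downstream=2)
print(M[0])                            # [0.75, 0.25]
print(map_probs([1.0, 0.0, 0.0], M))   # [0.75, 0.25]
```

Unlike a one-to-one mapping, every pretrained label contributes to every downstream label in proportion to how often the two co-occur.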

downstream task, pretrained label, pretrained model, (14 more...)

2410.24018

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Transportation (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(3 more...)

Polanska, Alicja, Wouters, Thibeau, Pang, Peter T. H., Wong, Kaze K. W., McEwen, Jason D.

Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows

arXiv.org Artificial Intelligence, Oct-31-2024

We present an accelerated pipeline, based on high-performance computing techniques and normalizing flows, for joint Bayesian parameter estimation and model selection and demonstrate its efficiency in gravitational wave astrophysics. We integrate the Jim inference toolkit, a normalizing flow-enhanced Markov chain Monte Carlo (MCMC) sampler, with the learned harmonic mean estimator. Our Bayesian evidence estimates run on 1 GPU are consistent with traditional nested sampling techniques run on 16 CPU cores, while reducing the computation time by factors of 5× and 15× for 4-dimensional and 11-dimensional gravitational wave inference problems, respectively. Our code is available in well-tested and thoroughly documented open-source packages, ensuring accessibility and reproducibility for the wider research community.
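The identity behind harmonic-mean-type evidence estimators can be stated briefly. The notation below is an assumption introduced here, not taken from the abstract: $\mathcal{L}$ is the likelihood, $\pi$ the prior, $z = \int \mathcal{L}(\theta)\,\pi(\theta)\,\mathrm{d}\theta$ the evidence, and $\varphi$ any normalized density whose support lies within that of the posterior.

```latex
\begin{align}
\mathbb{E}_{p(\theta \mid y)}\!\left[\frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\right]
  &= \int \frac{\varphi(\theta)}{\mathcal{L}(\theta)\,\pi(\theta)}\,
     \frac{\mathcal{L}(\theta)\,\pi(\theta)}{z}\,\mathrm{d}\theta
   = \frac{1}{z}\int \varphi(\theta)\,\mathrm{d}\theta
   = \frac{1}{z}, \\
\frac{1}{z} &\approx \frac{1}{N}\sum_{i=1}^{N}
     \frac{\varphi(\theta_i)}{\mathcal{L}(\theta_i)\,\pi(\theta_i)},
   \qquad \theta_i \sim p(\theta \mid y).
\end{align}
```

Because the average runs over posterior samples, the evidence can be estimated from the output of an MCMC sampler. Choosing $\varphi = \pi$ gives the original harmonic mean estimator; how $\varphi$ is learned in this pipeline is described in the paper, not here.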

bayesian evidence, inference, parameter estimation, (9 more...)

2410.21076

Country:

North America > United States > Massachusetts > Middlesex County > Newton (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Cai, Diana, Modi, Chirag, Margossian, Charles C., Gower, Robert M., Blei, David M., Saul, Lawrence K.

EigenVI: score-based variational inference with orthogonal function expansions

We develop EigenVI, an eigenvalue-based approach for black-box variational inference (BBVI). EigenVI constructs its variational approximations from orthogonal function expansions. For distributions over $\mathbb{R}^D$, the lowest order term in these expansions provides a Gaussian variational approximation, while higher-order terms provide a systematic way to model non-Gaussianity. These approximations are flexible enough to model complex distributions (multimodal, asymmetric), but they are simple enough that one can calculate their low-order moments and draw samples from them. EigenVI can also model other types of random variables (e.g., nonnegative, bounded) by constructing variational approximations from different families of orthogonal functions. Within these families, EigenVI computes the variational approximation that best matches the score function of the target distribution by minimizing a stochastic estimate of the Fisher divergence. Notably, this optimization reduces to solving a minimum eigenvalue problem, so that EigenVI effectively sidesteps the iterative gradient-based optimizations that are required for many other BBVI algorithms. (Gradient-based methods can be sensitive to learning rates, termination criteria, and other tunable hyperparameters.) We use EigenVI to approximate a variety of target distributions, including a benchmark suite of Bayesian models from posteriordb. On these distributions, we find that EigenVI is more accurate than existing methods for Gaussian BBVI.

basis function, eigenvi, expansion, (13 more...)

2410.24054

Country:

Asia > Middle East > Jordan (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Artificial Intelligence, Oct-30-2024

Full-waveform earthquake source inversion using simulation-based inference

Saoulis, A. A., Piras, D., Mancini, A. Spurio, Joachimi, B., Ferreira, A. M. G.

This paper presents a novel framework for full-waveform seismic source inversion using simulation-based inference (SBI). Traditional probabilistic approaches often rely on simplifying assumptions about data errors, which we show can lead to inaccurate uncertainty quantification. SBI addresses this limitation by building an empirical probabilistic model of the data errors using machine learning models, known as neural density estimators, which can then be integrated into the Bayesian inference framework. We apply the SBI framework to point-source moment tensor inversions as well as joint moment tensor and time-location inversions. We construct a range of synthetic examples to explore the quality of the SBI solutions, as well as to compare the SBI results with standard Gaussian likelihood-based Bayesian inversions. We then demonstrate that under real seismic noise, common Gaussian likelihood assumptions for treating full-waveform data yield overconfident posterior distributions that underestimate the moment tensor component uncertainties by up to a factor of 3. We contrast this with SBI, which produces well-calibrated posteriors that generally agree with the true seismic source parameters, and offers an order-of-magnitude reduction in the number of simulations required to perform inference compared to standard Monte Carlo techniques. Finally, we apply our methodology to a pair of moderate-magnitude earthquakes in the North Atlantic. We utilise seismic waveforms recorded by the recent UPFLOW ocean bottom seismometer array as well as by regional land stations in the Azores, comparing full moment tensor and source-time location posteriors between SBI and a Gaussian likelihood approach. We find that our adaptation of SBI can be directly applied to real earthquake sources to efficiently produce high-quality posterior distributions that significantly improve upon Gaussian likelihood approaches.

artificial intelligence, data quality, machine learning, (17 more...)

2410.23238

Country:

Europe > United Kingdom (0.28)
Europe > Switzerland > Geneva > Geneva (0.14)
North America > United States > California (0.14)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(3 more...)