AITopics

Over the past few years, several approaches based on score-based diffusion have been proposed to sample from probability distributions without access to exact samples, relying solely on evaluations of unnormalized densities. In practice, the performance of these methods depends heavily on key hyperparameters that require ground truth samples to be tuned accurately. Our work aims to highlight and address this fundamental issue, focusing in particular on multimodal distributions, which pose significant challenges for existing sampling methods. Building on existing approaches, we introduce Learned Reference-based Diffusion Sampler (LRDS), a methodology specifically designed to leverage prior knowledge on the location of the target modes in order to bypass the obstacle of hyperparameter tuning. LRDS proceeds in two steps by (i) learning a reference diffusion model on samples located in high-density space regions and tailored for multimodality, and (ii) using this reference model to foster the training of a diffusion-based sampler. We experimentally demonstrate that LRDS best exploits prior knowledge on the target distribution compared to competing algorithms on a variety of challenging distributions.

We consider the problem of sampling from a probability density π known up to a normalizing constant. In particular, we are interested in sampling from multimodal distributions, i.e., distributions whose density admits multiple local maxima, called modes. Finding the modes of such distributions is a notoriously hard problem, yet, perhaps surprisingly, even if the location of the modes is known, sampling from π remains very challenging (Noé et al., 2019; Pompe et al., 2020; Grenioux et al., 2023). In this work, we aim to address this specific issue and will assume that we have access to the location of the modes as prior information on π. However, we do not assume a priori access to ground truth samples from π.

Annealed MCMC. Markov Chain Monte Carlo (MCMC) samplers are among the most popular approaches for sampling. In particular, gradient-based methods based on discretizations of Langevin or Hamiltonian dynamics (Roberts & Tweedie, 1996; Neal, 2012; Hoffman & Gelman, 2014) are guaranteed to be efficient for high-dimensional target distributions that are log-concave or satisfy functional inequalities (Dalalyan, 2017; Durmus & Moulines, 2017).
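As a rough illustration of the gradient-based samplers mentioned above, the following is a minimal sketch of an unadjusted Langevin discretization on a one-dimensional log-concave target. It is not the LRDS method. The Gaussian target, the step size, the burn-in length, and the names `grad_log_density` and `unadjusted_langevin` are all assumptions chosen for illustration. Only the gradient of the log-density is used, so the normalizing constant is never needed.

```python
import math
import random


def grad_log_density(x, mean=2.0, std=0.5):
    """Score of the unnormalized density exp(-(x - mean)^2 / (2 std^2)).

    Hypothetical target chosen for illustration; it is log-concave.
    """
    return -(x - mean) / (std ** 2)


def unadjusted_langevin(score, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of Langevin dynamics.

    No Metropolis correction is applied, so the chain carries a small
    discretization bias that shrinks with the step size.
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


rng = random.Random(0)
samples = unadjusted_langevin(grad_log_density, x0=0.0, step=0.01,
                              n_steps=20000, rng=rng)
kept = samples[2000:]  # discard an (assumed) burn-in period
sample_mean = sum(kept) / len(kept)
```

On a multimodal target the same chain would typically stay near the mode closest to `x0`, which is the difficulty the text describes.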

artificial intelligence, machine learning, target distribution, (21 more...)

2410.19449

Country:

Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

arXiv.org Artificial Intelligence Oct-25-2024

Robust Time Series Causal Discovery for Agent-Based Model Validation

Yu, Gene, Guo, Ce, Luk, Wayne

Agent-Based Model (ABM) validation is crucial because it helps ensure the reliability of simulations, and causal discovery has become a powerful tool in this context. However, current causal discovery methods often face accuracy and robustness challenges when applied to complex and noisy time series data, which are typical in ABM scenarios. This study addresses these issues by proposing a Robust Cross-Validation (RCV) approach to enhance causal structure learning for ABM validation. We develop RCV-VarLiNGAM and RCV-PCMCI, novel extensions of two prominent causal discovery algorithms. These aim to better reduce the impact of noise and give more reliable causal relation results, even with high-dimensional, time-dependent data. The proposed approach is then integrated into an enhanced ABM validation framework, which is designed to handle diverse data and model structures. The approach is evaluated using synthetic datasets and a complex simulated fMRI dataset. The results demonstrate greater reliability in causal structure identification. The study examines how various characteristics of datasets affect the performance of established causal discovery methods. These characteristics include linearity, noise distribution, stationarity, and causal structure density. This analysis is then extended to the RCV method to see how it compares in these different situations. This examination helps confirm whether the results are consistent with existing literature and also reveals the strengths and weaknesses of the novel approaches. By tackling key methodological challenges, the study aims to enhance ABM validation through a more resilient valuation framework. These improvements increase the reliability of model-driven decision-making processes in complex systems analysis.

artificial intelligence, causal discovery method, machine learning, (13 more...)

2410.19412

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Banking & Finance (1.00)
Health & Medicine > Health Care Technology (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Bhuckory, Radhika, Krishnamachari, Bhaskar

The Signaler-Responder Game: Learning to Communicate using Thompson Sampling

arXiv.org Artificial Intelligence Oct-25-2024

We are interested in studying how heterogeneous agents can learn to communicate and cooperate with each other without being explicitly pre-programmed to do so. Motivated by this goal, we present and analyze a distributed solution to a two-player signaler-responder game which is defined as follows. The signaler agent has a random, exogenous need and can choose from four different strategies: never signal, always signal, signal when need, and signal when no need. The responder agent can choose to either ignore or respond to the signal. We define a reward to both agents when they cooperate to satisfy the signaler's need, and costs associated with communication, response and unmet needs. We identify pure Nash equilibria of the game and the conditions under which they occur. As a solution for this game, we propose two new distributed Bayesian learning algorithms, one for each agent, based on the classic Thompson Sampling policy for multi-armed bandits. These algorithms allow both agents to update beliefs about both the exogenous need and the behavior of the other agent and optimize their own expected reward. We show that by using these policies, the agents are able to intelligently adapt their strategies over multiple iterations to attain efficient, reward-maximizing equilibria under different settings, communicating and cooperating when it is rewarding to do so, and not communicating or cooperating when it is too expensive.
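For readers unfamiliar with the classic Thompson Sampling policy that the abstract builds on, here is a minimal sketch for a single-agent Bernoulli multi-armed bandit. It is not the paper's two-agent signaler-responder algorithm. The arm probabilities, the uniform Beta(1, 1) prior, and the function name `thompson_sampling` are assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, rng):
    """Beta-Bernoulli Thompson Sampling; returns per-arm pull counts."""
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        draws = [rng.betavariate(successes[a], failures[a])
                 for a in range(n_arms)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: draws[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Hypothetical arm success probabilities, unknown to the learner.
pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000,
                          rng=random.Random(0))
```

Sampling from the posterior, rather than using its mean, is what drives exploration: arms with few observations have wide posteriors and are occasionally drawn high enough to be tried.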

artificial intelligence, bayesian inference, machine learning, (17 more...)

2410.19962

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.69)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Baptista, Ricardo, Brennan, Michael, Marzouk, Youssef

Dimension reduction via score ratio matching

Gradient-based dimension reduction decreases the cost of Bayesian inference and probabilistic modeling by identifying maximally informative (and informed) low-dimensional projections of the data and parameters, allowing high-dimensional problems to be reformulated as cheaper low-dimensional problems. A broad family of such techniques identify these projections and provide error bounds on the resulting posterior approximations, via eigendecompositions of certain diagnostic matrices. Yet these matrices require gradients or even Hessians of the log-likelihood, excluding the purely data-driven setting and many problems of simulation-based inference. We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable. Specifically, we formulate an objective function to directly learn the score ratio function needed to compute the diagnostic matrices, propose a tailored parameterization for the score ratio network, and introduce regularization methods that capitalize on the hypothesized low-dimensional structure. We also introduce a novel algorithm to iteratively identify the low-dimensional reduced basis vectors more accurately with limited data based on eigenvalue deflation methods. We show that our approach outperforms standard score-matching for problems with low-dimensional structure, and demonstrate its effectiveness for PDE-constrained Bayesian inverse problems and conditional generative modeling.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2410.1999

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Energy > Power Industry (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Harvey, Ethan, Petrov, Mikhail, Hughes, Michael C.

Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective

A number of popular transfer learning methods rely on grid search to select regularization hyperparameters that control over-fitting. This grid search requirement has several key disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the size of available data for model training, and requires practitioners to specify candidate values. In this paper, we propose an alternative to grid search: directly learning regularization hyperparameters on the full training set via model selection techniques based on the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we specifically recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior while remaining a valid bound on the evidence for Bayesian model selection. Our proposed technique overcomes all three disadvantages of grid search. We demonstrate effectiveness on image classification tasks on several datasets, yielding heldout accuracy comparable to existing approaches with far less compute time.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2410.19675

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial Intelligence Oct-25-2024

FeBiM: Efficient and Compact Bayesian Inference Engine Empowered with Ferroelectric In-Memory Computing

Li, Chao, Xu, Zhicheng, Wen, Bo, Mao, Ruibin, Li, Can, Kämpfe, Thomas, Ni, Kai, Yin, Xunzhao

In scenarios with limited training data or where explainability is crucial, conventional neural network-based machine learning models often face challenges. In contrast, Bayesian inference-based algorithms excel in providing interpretable predictions and reliable uncertainty estimation in these scenarios. While many state-of-the-art in-memory computing (IMC) architectures leverage emerging non-volatile memory (NVM) technologies to offer unparalleled computing capacity and energy efficiency for neural network workloads, their application in Bayesian inference is limited. This is because the core operations in Bayesian inference differ significantly from the multiplication-accumulation (MAC) operations common in neural networks, rendering them generally unsuitable for direct implementation in most existing IMC designs. In this paper, we propose FeBiM, an efficient and compact Bayesian inference engine powered by multi-bit ferroelectric field-effect transistor (FeFET)-based IMC. FeBiM effectively encodes the trained probabilities of a Bayesian inference model within a compact FeFET-based crossbar. It maps quantized logarithmic probabilities to discrete FeFET states. As a result, the accumulated outputs of the crossbar naturally represent the posterior probabilities, i.e., the Bayesian inference model's output given a set of observations. This approach enables efficient in-memory Bayesian inference without the need for additional calculation circuitry. As the first FeFET-based in-memory Bayesian inference engine, FeBiM achieves an impressive storage density of 26.32 Mb/mm$^{2}$ and a computing efficiency of 581.40 TOPS/W in a representative Bayesian classification task. These results demonstrate 10.7$\times$/43.4$\times$ improvement in compactness/efficiency compared to the state-of-the-art hardware implementation of Bayesian inference.
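The idea of storing quantized log-probabilities and accumulating them can be sketched in software as follows. This is only an illustration of why summation suffices for Bayesian classification (products of probabilities become sums of logarithms). The uniform quantization grid, the clipping floor, the toy model, and the names `quantize_log_prob` and `classify` are assumptions, not FeBiM's actual device mapping.

```python
import math


def quantize_log_prob(p, n_levels=8, floor=-6.0):
    """Map log(p) to one of n_levels evenly spaced integer states.

    Assumes p > 0. Log-probabilities below `floor` are clipped to it.
    """
    log_p = max(math.log(p), floor)
    step = -floor / (n_levels - 1)
    return round((log_p - floor) / step)  # integer in 0 .. n_levels - 1


def classify(observation, priors, likelihoods, n_levels=8):
    """Accumulate quantized log-probability states per class.

    The class with the largest accumulated score has the largest
    (approximate) unnormalized posterior.
    """
    scores = {}
    for label, prior in priors.items():
        total = quantize_log_prob(prior, n_levels)
        for feature, value in enumerate(observation):
            p = likelihoods[label][feature][value]
            total += quantize_log_prob(p, n_levels)
        scores[label] = total
    return max(scores, key=scores.get), scores


# Hypothetical two-class model with two binary features:
# likelihoods[label][feature][value] = P(feature = value | label).
priors = {"a": 0.5, "b": 0.5}
likelihoods = {
    "a": [[0.9, 0.1], [0.8, 0.2]],
    "b": [[0.2, 0.8], [0.3, 0.7]],
}
label_00, _ = classify((0, 0), priors, likelihoods)
label_11, _ = classify((1, 1), priors, likelihoods)
```

Because only the ranking of the accumulated scores matters for classification, no normalization step is required in this sketch.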

artificial intelligence, bayesian inference, machine learning, (16 more...)

doi: 10.1145/3649329.3656538

2410.19356

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Zhejiang Province (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Alrawajfeh, Talal, Jälkö, Joonas, Honkela, Antti

Noise-Aware Differentially Private Variational Inference

Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. While several noise-aware approaches have been proposed which integrate DP perturbation into the inference, they are limited to specific types of simple probabilistic models. In this work, we propose a novel method for noise-aware approximate Bayesian inference based on stochastic gradient variational inference which can also be applied to high-dimensional and non-conjugate models. We also propose a more accurate evaluation method for noise-aware posteriors. Empirically, our inference method has similar performance to existing methods in the domain where they are applicable. Outside this domain, we obtain accurate coverages on high-dimensional Bayesian linear regression and well-calibrated predictive probabilities on Bayesian logistic regression with the UCI Adult dataset.
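To make "DP perturbation" of a stochastic gradient concrete, the sketch below clips each per-example gradient and adds Gaussian noise to the sum, in the style of the widely used Gaussian mechanism for private gradient methods. That the paper uses exactly this mechanism is an assumption. The function names, the clipping bound, and the noise multiplier are hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_gradient_sum(per_example_grads, max_norm, noise_multiplier, rng):
    """Sum clipped per-example gradients and add Gaussian noise.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it
    is calibrated to the sensitivity enforced by clipping.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
noiseless = private_gradient_sum(grads, max_norm=1.0,
                                 noise_multiplier=0.0, rng=random.Random(0))
noisy = private_gradient_sum(grads, max_norm=1.0,
                             noise_multiplier=1.0, rng=random.Random(0))
```

A noise-aware method, as described in the abstract, would treat the added Gaussian noise as part of the model rather than ignoring it during inference.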

artificial intelligence, bayesian inference, machine learning, (15 more...)

2410.19371

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Chen, Tianyu, Bansal, Vansh, Scott, James G.

Conditional diffusions for neural posterior estimation

arXiv.org Machine Learning Oct-24-2024

Neural posterior estimation (NPE), a simulation-based computational approach for Bayesian inference, has shown great success in situations where posteriors are intractable or likelihood functions are treated as "black boxes." Existing NPE methods typically rely on normalizing flows, which transform a base distribution into a complex posterior by composing many simple, invertible transformations. But flow-based models, while state of the art for NPE, are known to suffer from several limitations, including training instability and sharp trade-offs between representational power and computational cost. In this work, we demonstrate the effectiveness of conditional diffusions as an alternative to normalizing flows for NPE. Conditional diffusions address many of the challenges faced by flow-based methods. Our results show that, across a highly varied suite of benchmarking problems for NPE architectures, diffusions offer improved stability, superior accuracy, and faster training times, even with simpler, shallower models. These gains persist across a variety of different encoder or "summary network" architectures, as well as in situations where no summary network is required.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2410.19105

Country:

North America > United States > Minnesota (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Nadew, Yididiya Y., Fan, Xuhui, Quinn, Christopher J.

Learning Coupled Subspaces for Multi-Condition Spike Data

arXiv.org Artificial Intelligence Oct-24-2024

In neuroscience, researchers typically conduct experiments under multiple conditions to acquire neural responses in the form of high-dimensional spike train datasets. Analysing high-dimensional spike data is a challenging statistical problem. To this end, Gaussian process factor analysis (GPFA), a popular class of latent variable models, has been proposed. GPFA extracts smooth, low-dimensional latent trajectories underlying high-dimensional spike train datasets. However, such analyses are often done separately for each experimental condition, contrary to the nature of neural datasets, which contain recordings under multiple experimental conditions. Exploiting the parametric nature of these conditions, we propose a multi-condition GPFA model and inference procedure to learn the underlying latent structure in the corresponding datasets in a sample-efficient manner. In particular, we propose a non-parametric Bayesian approach to learn a smooth tuning function over the experiment condition space. Our approach not only boosts model accuracy and is faster, but also improves model interpretability compared to approaches that separately fit models for each experimental condition.

artificial intelligence, exp, machine learning, (14 more...)

2410.19153

Country:

North America > United States > Iowa > Story County > Ames (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Alsafadi, Farah, Yaseen, Mahmoud, Wu, Xu

An Investigation on Machine Learning Predictive Accuracy Improvement and Uncertainty Reduction using VAE-based Data Augmentation

arXiv.org Artificial Intelligence Oct-24-2024

A unique challenge in nuclear engineering is data scarcity, because experimentation on nuclear systems is usually more expensive and time-consuming than in most other disciplines. Large amounts of data may be available for certain parts, such as pipes, pumps and turbines, due to large networks of sensors, but not for many others, such as critical heat flux in thermal-hydraulics experiments or qualification data for advanced materials like molten salts and multi-principal element alloys. Particularly concerning is the lack of data for advanced reactor design and safety analysis, which raises challenges for utilizing ML in licensing analyses of advanced nuclear reactors. In these cases, we need to move beyond the "throw more data at the problem and re-train" approach, which is the common solution in areas such as computer vision and natural language processing that have access to "big data". One potential way to address the data scarcity issue is data augmentation using deep generative learning. Deep generative learning is an unsupervised ML technique that aims at discovering and learning the regularities or patterns in existing data using deep generative models (DGMs), in order to generate new samples that plausibly could have been drawn from the real dataset. DGMs are typically neural networks (NNs) trained to learn or approximate the underlying distribution of the training data. This enables them to generate synthetic samples that closely match the distribution of the original training data. By employing DGMs for data augmentation, one can significantly expand the training dataset for ML models to achieve better performance in other tasks, such as data-driven predictive ML models. Data augmentation with DGMs is still a relatively new research area in nuclear engineering, but has been studied for a few years in computer vision and natural language processing for datasets involving images, audio, video, spoken words, etc.

artificial intelligence, machine learning, prediction, (16 more...)

2410.19063

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)