AITopics

2504.17921

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Government > Regional Government (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Sensing and Signal Processing > Image Processing (0.92)

Yrjänäinen, Väinö, Boström, Isac, Magnusson, Måns, Jonasson, Johan

Posterior Sampling of Probabilistic Word Embeddings

arXiv.org Artificial Intelligence Aug-5-2025

Quantifying uncertainty in word embeddings is crucial for reliable inference from textual data. However, existing Bayesian methods such as Hamiltonian Monte Carlo (HMC) and mean-field variational inference (MFVI) are either computationally infeasible for large data or rely on restrictive assumptions. We propose a scalable Gibbs sampler using Polya-Gamma augmentation as well as Laplace approximation and compare them with MFVI and HMC for word embeddings. In addition, we address non-identifiability in word embeddings. Our Gibbs sampler and HMC correctly estimate uncertainties, while MFVI does not, and Laplace approximation only does so on large sample sizes, as expected. Applying the Gibbs sampler to the US Congress and the Movielens datasets, we demonstrate the feasibility on larger real data. Finally, as a result of having draws from the full posterior, we show that the posterior mean of word embeddings improves over maximum a posteriori (MAP) estimates in terms of hold-out likelihood, especially for smaller sampling sizes, further strengthening the need for posterior sampling of word embeddings.

artificial intelligence, machine learning, natural language, (18 more...)

2508.02337

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Das, Abhinav, Schlüter, Stephan

Regime-Aware Conditional Neural Processes with Multi-Criteria Decision Support for Operational Electricity Price Forecasting

The energy market has undergone significant structural change in the past decade. The global drive for decarbonization is encouraging the use of renewable energy sources, thus affecting the traditional supply-demand patterns, which were historically dominated by fossil fuels such as coal, oil, and natural gas [18]. The growing integration of renewable energy sources into the power supply increases uncertainty in the electricity market because sources such as wind and sunshine are intermittent [57]. The volatility of these generation sources causes large price shocks and regime changes that compromise financial stability as well as investment strategies in the power market [58]. Particularly in countries such as Germany, where the larger share of electricity is produced from renewable energy sources [37], levels of sunlight and wind affect electricity generation and thus prices. This introduces, in addition to the physical problem of balancing the grid, non-stationarity into most price models, which makes their predictions less reliable. Accurate electricity price forecasting is crucial for efficient resource planning, financial risk management, and market stabilization, especially with increasing renewable energy penetration; it enables utilities, businesses, and governments to optimize planning and policy while matching demand and supply. Building an adequate prediction model that is relatively simple and understandable, yet reflects the complexity of the market and all the factors influencing it, is not straightforward. Authors have broadly used three types of model for prediction: statistical (probability-based) models [12], machine learning and deep learning models [42], and mixed models [30]. Precise forecasting allows market participants to make sound financial decisions.

artificial intelligence, machine learning, regime, (21 more...)

2508.0004

Country:

Europe (1.00)
North America > United States > New York (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Energy > Power Industry (1.00)
Energy > Oil & Gas > Trading (0.67)
Energy > Renewable > Solar (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Vishnu, Adit, Shastry, Abhay, Kashyap, Dhruva, Bhattacharyya, Chiranjib

DO-EM: Density Operator Expectation Maximization

Density operators, quantum generalizations of probability distributions, are gaining prominence in machine learning due to their foundational role in quantum computing. Generative modeling based on density operator models (DOMs) is an emerging field, but existing training algorithms, such as those for the Quantum Boltzmann Machine, do not scale to real-world data, such as the MNIST dataset. The Expectation-Maximization algorithm has played a fundamental role in enabling scalable training of probabilistic latent variable models on real-world datasets. In this paper, we develop an Expectation-Maximization framework to learn latent variable models defined through DOMs on classical hardware, with resources comparable to those used for probabilistic models, while scaling to real-world data. However, designing such an algorithm is nontrivial due to the absence of a well-defined quantum analogue to conditional probability, which complicates the Expectation step. To overcome this, we reformulate the Expectation step as a quantum information projection (QIP) problem and show that the Petz Recovery Map provides a solution under sufficient conditions. Using this formulation, we introduce the Density Operator Expectation Maximization (DO-EM) algorithm, an iterative Minorant-Maximization procedure that optimizes a quantum evidence lower bound. We show that the DO-EM algorithm ensures non-decreasing log-likelihood across iterations for a broad class of models. Finally, we present Quantum Interleaved Deep Boltzmann Machines (QiDBMs), a DOM that can be trained with the same resources as a DBM. When trained with DO-EM under Contrastive Divergence, a QiDBM outperforms larger classical DBMs in image generation on the MNIST dataset, achieving a 40-60% reduction in the Fréchet Inception Distance.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2507.22786

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Da Costa, Nathaël, Pförtner, Marvin, Cockayne, Jon

Constructive Disintegration and Conditional Modes

Conditioning, the central operation in Bayesian statistics, is formalised by the notion of disintegration of measures. However, due to the implicit nature of their definition, constructing disintegrations is often difficult. A folklore result in machine learning conflates the construction of a disintegration with the restriction of probability density functions onto the subset of events that are consistent with a given observation. We provide a comprehensive set of mathematical tools which can be used to construct disintegrations and apply these to find densities of disintegrations on differentiable manifolds. Using our results, we provide a disturbingly simple example in which the restricted density and the disintegration density drastically disagree. Motivated by applications in approximate Bayesian inference and Bayesian inverse problems, we further study the modes of disintegrations. We show that the recently introduced notion of a "conditional mode" does not coincide in general with the modes of the conditional measure obtained through disintegration, but rather the modes of the restricted measure. We also discuss the implications of the discrepancy between the two measures in practice, advocating for the utility of both approaches depending on the modelling context.

artificial intelligence, disintegration, machine learning, (16 more...)

2508.00617

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New York (0.04)
North America > Panama (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Hundrieser, Shayan, Manole, Tudor, Litskevich, Danila, Munk, Axel

Local Poisson Deconvolution for Discrete Signals

We analyze the statistical problem of recovering an atomic signal, modeled as a discrete uniform distribution $μ$, from a binned Poisson convolution model. This question is motivated, among others, by super-resolution laser microscopy applications, where precise estimation of $μ$ provides insights into spatial formations of cellular protein assemblies. Our main results quantify the local minimax risk of estimating $μ$ for a broad class of smooth convolution kernels. This local perspective enables us to sharply quantify optimal estimation rates as a function of the clustering structure of the underlying signal. Moreover, our results are expressed under a multiscale loss function, which reveals that different parts of the underlying signal can be recovered at different rates depending on their local geometry. Overall, these results paint an optimistic perspective on the Poisson deconvolution problem, showing that accurate recovery is achievable under a much broader class of signals than suggested by existing global minimax analyses. Beyond Poisson deconvolution, our results also allow us to establish the local minimax rate of parameter estimation in Gaussian mixture models with uniform weights. We apply our methods to experimental super-resolution microscopy data to identify the location and configuration of individual DNA origamis. In addition, we complement our findings with numerical experiments on runtime and statistical recovery that showcase the practical performance of our estimators and their trade-offs.

artificial intelligence, estimator, machine learning, (20 more...)

2508.00824

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Ma, Yaxin, Colburn, Benjamin, Principe, Jose C.

A Simple and Effective Method for Uncertainty Quantification and OOD Detection

arXiv.org Artificial Intelligence Aug-4-2025

Bayesian neural networks and deep ensemble methods have been proposed for uncertainty quantification; however, they are computationally intensive and require large storage. Using a single deterministic model avoids both costs. We propose an effective method based on feature space density to quantify uncertainty for distributional shifts and out-of-distribution (OOD) detection. Specifically, we leverage the information potential field derived from kernel density estimation to approximate the feature space density of the training set. By comparing this density with the feature space representation of test samples, we can effectively determine whether a distributional shift has occurred. Experiments were conducted on 2D synthetic datasets (Two Moons and Three Spirals) as well as an OOD detection task (CIFAR-10 vs. SVHN). The results demonstrate that our method outperforms baseline models.
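
The density comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `information_potential`, the bandwidth `sigma`, the 5th-percentile threshold, and the synthetic 2D Gaussian "features" are all assumptions made for illustration. The sketch scores each test point by its mean Gaussian kernel value against the training features (an unnormalised kernel density estimate) and flags low-scoring points as shifted.

```python
import numpy as np

def information_potential(train_feats, test_feats, sigma=0.5):
    """Mean Gaussian kernel value between each test point and all training points.

    This is an unnormalised kernel density estimate: higher values mean the
    test point lies in a denser region of the training features.
    (Hypothetical helper; name and default bandwidth are illustrative.)
    """
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2)).mean(axis=1)

rng = np.random.default_rng(0)
# Synthetic stand-ins for feature vectors (assumed data, not from the paper).
train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
in_dist = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
shifted = rng.normal(loc=6.0, scale=1.0, size=(100, 2))  # simulated shift

score_in = information_potential(train, in_dist)
score_out = information_potential(train, shifted)

# Flag a test point as shifted when its score falls below the 5th percentile
# of the scores that the training points receive themselves (assumed rule).
threshold = np.percentile(information_potential(train, train), 5)
flagged_out = score_out < threshold

print("mean score, in-distribution:", score_in.mean())
print("mean score, shifted:", score_out.mean())
print("fraction of shifted points flagged:", flagged_out.mean())
```

In a real pipeline the inputs would be activations from a trained network rather than raw samples, and the bandwidth would need tuning to the scale of those features.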

artificial intelligence, bayesian inference, machine learning, (14 more...)

2508.00754

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Artificial Intelligence Aug-4-2025

CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization

Jiang, Yuning, Oo, Nay, Meng, Qiaoran, Lin, Lu, Niyato, Dusit, Xiong, Zehui, Lim, Hoon Wei, Sikdar, Biplab

Modern cyber attacks unfold through multiple stages, requiring defenders to dynamically prioritize mitigations under uncertainty. While game-theoretic models capture attacker-defender interactions, existing approaches often rely on static assumptions and lack integration with real-time threat intelligence, limiting their adaptability. This paper presents CyGATE, a game-theoretic framework modeling attacker-defender interactions, using large language models (LLMs) with retrieval-augmented generation (RAG) to enhance tactic selection and patch prioritization. Applied to a two-agent scenario, CyGATE frames cyber conflicts as a partially observable stochastic game (POSG) across Cyber Kill Chain stages. Both agents use belief states to navigate uncertainty, with the attacker adapting tactics and the defender re-prioritizing patches based on evolving risks and observed adversary behavior. The framework's flexible architecture enables extension to multi-agent scenarios involving coordinated attackers, collaborative defenders, or complex enterprise environments with multiple stakeholders.

The evolving cybersecurity landscape presents increasingly sophisticated threats that necessitate adaptive, proactive defense strategies. Patch management, a cornerstone of cyber defense, requires intelligent prioritization of vulnerabilities under resource constraints such as maintenance windows and operational cost [1][2]. However, traditional scoring systems like the Common Vulnerability Scoring System (CVSS) [3] fail to capture the evolving nature of cyber threats, where attackers adapt their strategies based on defender actions. Game theory provides a structured framework for modeling attacker-defender interactions [4], with chained or multistage games particularly suited to representing complex attack progressions along the Cyber Kill Chain (CKC) [5][6][7]. These models allow defenders to reason about long-term risks and preempt cascading compromises.
Despite these advancements, existing models remain constrained by fixed strategies, static payoff structures, and minimal integration of threat intelligence, failing to dynamically prioritize vulnerabilities based on evolving exploitation trends [8]. Traditional game-theoretical approaches typically use predefined rules to analyze strategies, hence are limited in dynamic cyber environments where adversaries continuously adapt, operate under uncertainty, and employ unpredictable tactics [9].

large language model, machine learning, natural language, (22 more...)

2508.00478

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.91)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Wycoff, Nathan, Arab, Ali, Singh, Lisa O.

Formal Bayesian Transfer Learning via the Total Risk Prior

arXiv.org Machine Learning Aug-1-2025

In analyses with severe data limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2507.23768

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.93)

arXiv.org Machine Learning Aug-1-2025

Overcoming error-in-variable problem in data-driven model discovery by orthogonal distance regression

Fung, Lloyd

Despite the recent proliferation of machine learning methods like SINDy that promise automatic discovery of governing equations from time-series data, there remain significant challenges to discovering models from noisy datasets. One reason is that the linear regression underlying these methods assumes that all noise resides in the training target (the regressand), which is the time derivative, whereas the measurement noise is in the states (the regressors). Recent methods like modified-SINDy and DySMHO address this error-in-variable problem by leveraging information from the model's temporal evolution, but they also impose the equation as a hard constraint, which effectively assumes no error in the regressand. Without relaxation, this hard constraint prevents assimilation of data longer than the Lyapunov time. Instead, the fulfilment of the model equation should be treated as a soft constraint to account for the small yet critical error introduced by numerical truncation. The uncertainties in both the regressor and the regressand invite the use of orthogonal distance regression (ODR). By incorporating ODR with the Bayesian framework for model selection, we introduce a novel method for model discovery, termed ODR-BINDy, and assess its performance against current SINDy variants using the Lorenz63, Rössler, and van der Pol systems as case studies. Our findings indicate that ODR-BINDy consistently outperforms all existing methods in recovering the correct model from sparse and noisy datasets. For instance, our ODR-BINDy method reliably recovers the Lorenz63 equation from data with noise contamination levels of up to 30%.
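
The error-in-variable problem named in this abstract can be seen in a toy straight-line fit. The sketch below is not ODR-BINDy; it only contrasts ordinary least squares with orthogonal distance regression for a line, assuming equal noise variance in both variables, where the ODR fit has a closed form through the singular value decomposition. The helper names `ols_line` and `odr_line` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ols_line(x, y):
    """Ordinary least squares: assumes all noise is in y (the regressand)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def odr_line(x, y):
    """Orthogonal distance regression for a straight line.

    Minimises perpendicular distances to the line, which is appropriate when
    x and y carry noise of equal variance (assumed here). The line direction
    is the leading right singular vector of the centred data.
    """
    x_mean, y_mean = x.mean(), y.mean()
    centred = np.column_stack([x - x_mean, y - y_mean])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx
    return slope, y_mean - slope * x_mean

rng = np.random.default_rng(0)
n = 5000
true_slope, true_intercept = 2.0, 1.0

# Noise-free quantities, then independent noise added to BOTH variables.
x_true = rng.normal(0.0, 2.0, size=n)
y_true = true_slope * x_true + true_intercept
x_obs = x_true + rng.normal(0.0, 1.0, size=n)  # noise in the regressor
y_obs = y_true + rng.normal(0.0, 1.0, size=n)  # noise in the regressand

ols_slope, ols_intercept = ols_line(x_obs, y_obs)
odr_slope, odr_intercept = odr_line(x_obs, y_obs)

print("true slope:", true_slope)
print("OLS slope (biased toward zero by regressor noise):", ols_slope)
print("ODR slope:", odr_slope)
```

Least squares underestimates the slope because noise in the regressor inflates its variance, while the orthogonal fit stays close to the true value under the equal-variance assumption. Nonlinear models and unequal noise levels need an iterative ODR solver instead of this closed form.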

artificial intelligence, bayesian inference, machine learning, (19 more...)

2507.23426

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)