AITopics

2510.22835

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.88)
Information Technology > Networks (0.61)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Jain, Anchit, Bates, Stephen

Frequentist Validity of Epistemic Uncertainty Estimators

arXiv.org Machine LearningOct-28-2025

Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure requires access to the posterior distribution of the model parameters, which is challenging to compute. In view of this, we introduce a frequentist measure of epistemic uncertainty based on the bootstrap. Our main theoretical contribution is a novel asymptotic expansion that reveals that our proposed (frequentist) measure and the (Bayesian) mutual information are asymptotically equivalent. This provides frequentist interpretations to mutual information and new computational strategies for approximating it. Moreover, we link our proposed approach to the widely-used heuristic approach of deep ensembles, giving added perspective on their practical success.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2510.22063

Country: North America > United States (0.48)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

arXiv.org Machine LearningOct-28-2025

Differentiable Constraint-Based Causal Discovery

Zhou, Jincheng, Wang, Mengbo, He, Anqi, Zhou, Yumeng, Olya, Hessam, Kocaoglu, Murat, Ribeiro, Bruno

Causal discovery from observational data is a fundamental task in artificial intelligence, with far-reaching implications for decision-making, predictions, and interventions. Despite significant advances, existing methods can be broadly categorized as constraint-based or score-based approaches. Constraint-based methods offer rigorous causal discovery but are often hindered by small sample sizes, while score-based methods provide flexible optimization but typically forgo explicit conditional independence testing. This work explores a third avenue: developing differentiable $d$-separation scores, obtained through a percolation theory using soft logic. This enables the implementation of a new type of causal discovery method: gradient-based optimization of conditional independence constraints. Empirical evaluations demonstrate the robust performance of our approach in low-sample regimes, surpassing traditional constraint-based and score-based baselines on a real-world dataset. Code and data of the proposed method are publicly available at https://github$.$com/PurdueMINDS/DAGPA.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2510.22031

Country: North America > United States (0.92)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Boeken, Philip, Zoeter, Onno, Mooij, Joris M.

Conditional Forecasts and Proper Scoring Rules for Reliable and Accurate Performative Predictions

arXiv.org Machine LearningOct-27-2025

Performative predictions are forecasts which influence the outcomes they aim to predict, undermining the existence of correct forecasts and standard methods of elicitation and estimation. We show that conditioning forecasts on covariates that separate them from the outcome renders the target distribution forecast-invariant, guaranteeing well-posedness of the forecasting problem. However, even under this condition, classical proper scoring rules fail to elicit correct forecasts. We prove a general impossibility result and identify two solutions: (i) in decision-theoretic settings, elicitation of correct and incentive-compatible forecasts is possible if forecasts are separating; (ii) scoring with unbiased estimates of the divergence between the forecast and the induced distribution of the target variable yields correct forecasts. Applying these insights to parameter estimation, conditional forecasts and proper scoring rules enable performatively stable estimation of performatively correct parameters, resolving the issues raised by Perdomo et al. (2020). Our results expose fundamental limits of classical forecast evaluation and offer new tools for reliable and accurate forecasting in performative settings.

artificial intelligence, bayesian inference, machine learning, (19 more...)

2510.21335

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Medvedeva, Arina, Balzani, Edoardo, Williams, Alex H, Keeley, Stephen L

Scalable inference of functional neural connectivity at submillisecond timescales

arXiv.org Machine LearningOct-27-2025

The Poisson Generalized Linear Model (GLM) is a foundational tool for analyzing neural spike train data. However, standard implementations rely on discretizing spike times into binned count data, limiting temporal resolution and scalability. Here, we develop Monte Carlo (MC) methods and polynomial approximations (PA) to the continuous-time analog of these models, and show them to be advantageous over their discrete-time counterparts. Further, we propose using a set of exponentially scaled Laguerre polynomials as an orthogonal temporal basis, which improves filter identification and yields closed-form integral solutions under the polynomial approximation. Applied to both synthetic and real spike-time data from rodent hippocampus, our methods demonstrate superior accuracy and scalability compared to traditional binned GLMs, enabling functional connectivity inference in large-scale neural recordings that are temporally precise on the order of synaptic dynamical timescales and in agreement with known anatomical properties of hippocampal subregions. We provide open-source implementations of both MC and PA estimators, optimized for GPU acceleration, to facilitate adoption in the neuroscience community.

approximation, artificial intelligence, machine learning, (16 more...)

2510.20966

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Large Language Bayes

Domke, Justin

Many domain experts do not have the time or expertise to write formal Bayesian models. This paper takes an informal problem description as input, and combines a large language model and a probabilistic programming language to define a joint distribution over formal models, latent variables, and data. A posterior over latent variables follows by conditioning on observed data and integrating over formal models. This presents a challenging inference problem. We suggest an inference recipe that amounts to generating many formal models from the large language model, performing approximate inference on each, and then doing a weighted average. This is justified and analyzed as a combination of self-normalized importance sampling, MCMC, and importance-weighted variational inference. Experimentally, this produces sensible predictions from only data and an informal problem description, without the need to specify a formal model.

large language model, machine learning, posterior, (21 more...)

2504.14025

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
(2 more...)

LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought

Yan, Cheng, Mohr, Felix, Viering, Tom

Sample-wise learning curves plot performance versus training set size. They are useful for studying scaling laws and speeding up hyperparameter tuning and model selection. Learning curves are often assumed to be well-behaved: monotone (i.e. improving with more data) and convex. By constructing the Learning Curves Database 1.1 (LCDB 1.1), a large-scale database with high-resolution learning curves including more modern learners (CatBoost, TabNet, RealMLP and TabPFN), we show that learning curves are less often well-behaved than previously thought. Using statistically rigorous methods, we observe significant ill-behavior in approximately 15% of the learning curves, almost twice as much as in previous estimates. We also identify which learners are to blame and show that specific learners are more ill-behaved than others. Additionally, we demonstrate that different feature scalings rarely resolve ill-behavior. We evaluate the impact of ill-behavior on downstream tasks, such as learning curve fitting and model selection, and find it poses significant challenges, underscoring the relevance and potential of LCDB 1.1 as a challenging benchmark for future research.

artificial intelligence, lcdb 1, machine learning, (17 more...)

2505.15657

Country:

Europe (0.46)
North America (0.28)
Asia (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
(2 more...)

Wang, Junda, Hu, Minghui, Li, Ning, Al-Ali, Abdulaziz, Suganthan, Ponnuthurai Nagaratnam

Randomized Neural Network with Adaptive Forward Regularization for Online Task-free Class Incremental Learning

Randomized Neural Network with Adaptive Forward Regularization for Online Task-free Class Incremental Learning Junda Wang, Minghui Hu, Ning Li, Abdulaziz Al-Ali, Ponnuthurai Nagarat-nam Suganthan To better acclimate OTCIL scenarios, forward knowledge is exploited to reduce regret and deliver efficient decision-making for ensemble Randomized NN learning in long task streams. This framework realizes one-pass incremental updates with less loss and superiority over ridge. Based on the framework, edR VFL-kF algorithm with adjustable forward regularization is derived, effectively avoiding previous replay and catastrophic forgetting. To overcome the intractable tuning and distribution drifting of -kF, we further propose edRVFL-kF-Bayes with ks synchronously self-adapted based on Bayesian learning in non-i.i.d OTCIL streams. Extensive experiments were conducted on image datasets and the results were analyzed from multiple views (including 6 metrics, dynamic behaviors, and ablation tests), revealing the outstanding performance of edRVFL-kF-Bayes and robustness even with a large PTM. Abstract Class incremental learning (CIL) requires an agent to learn distinct tasks consecutively with knowledge retention against forgetting. Problems impeding the practical applications of CIL methods are twofold: (1) non-i.i.d batch streams and no boundary prompts to update, known as the harsher online task-free CIL (OTCIL) scenario; (2) CIL methods suffer from memory loss in learning long task streams, as shown in Figure 1 (a). To achieve efficient decision-making and decrease cumulative regrets during the OTCIL process, a randomized neural network (Randomized NN) with forward regularization (-F) is proposed to resist forgetting and enhance learning performance. This work was supported by the National Natural Science Foundation of China under Grant 62273230 and 62203302, and the State Scholarship Fund of China Scholarship Council under Grant 202206230182. This paper was submitted to an Elsevier journal in Feb. 2025. Based on this framework, we derive the algorithm of the ensemble deep random vector functional link network (edR VFL) with adjustable forward regularization (-kF), where k mediates the intensity of the intervention. Moreover, to curb unstable penalties caused by non-i.i.d and mitigate intractable tuning of -kF in OTCIL, we improve it to the plug-and-play edR VFL-kF-Bayes, enabling all hard ks in multiple sub-learners to be self-adaptively determined based on Bayesian learning. Experiments were conducted on 2 image datasets including 6 metrics, dynamic performance, ablation tests, and compatibility, which distinctly validates the efficacy of our OTCIL frameworks with -kF-Bayes and -kF styles.

artificial intelligence, bayesian inference, machine learning, (19 more...)

2510.21367

Country: Asia > China (0.54)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.54)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Information Theoretic Learning for Diffusion Models with Warm Start

Shen, Yirong, Gan, Lu, Ling, Cong

Generative models that maximize model likelihood have gained traction in many practical settings. Among them, perturbation based approaches underpin many strong likelihood estimation models, yet they often face slow convergence and limited theoretical understanding. In this paper, we derive a tighter likelihood bound for noise driven models to improve both the accuracy and efficiency of maximum likelihood learning. Our key insight extends the classical KL divergence Fisher information relationship to arbitrary noise perturbations, going beyond the Gaussian assumption and enabling structured noise distributions. This formulation allows flexible use of randomized noise distributions that naturally account for sensor artifacts, quantization effects, and data distribution smoothing, while remaining compatible with standard diffusion training. Treating the diffusion process as a Gaussian channel, we further express the mismatched entropy between data and model, showing that the proposed objective upper bounds the negative log-likelihood (NLL). In experiments, our models achieve competitive NLL on CIFAR-10 and SOTA results on ImageNet across multiple resolutions, all without data augmentation, and the framework extends naturally to discrete data.

artificial intelligence, machine learning, natural language, (17 more...)

2510.20903

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Knowledge Distillation of Uncertainty using Deep Latent Factor Model

Park, Sehyun, Lee, Jongjin, Shin, Yunseop, Ohn, Ilsang, Kim, Yongdai

Deep ensembles deliver state-of-the-art, reliable uncertainty quantification, but their heavy computational and memory requirements hinder their practical deployments to real applications such as on-device AI. Knowledge distillation compresses an ensemble into small student models, but existing techniques struggle to preserve uncertainty partly because reducing the size of DNNs typically results in variation reduction. To resolve this limitation, we introduce a new method of distribution distillation (i.e. compressing a teacher ensemble into a student distribution instead of a student ensemble) called Gaussian distillation, which estimates the distribution of a teacher ensemble through a special Gaussian process called the deep latent factor model (DLF) by treating each member of the teacher ensemble as a realization of a certain stochastic process. The mean and covariance functions in the DLF model are estimated stably by using the expectation-maximization (EM) algorithm. By using multiple benchmark datasets, we demonstrate that the proposed Gaussian distillation outperforms existing baselines. In addition, we illustrate that Gaussian distillation works well for fine-tuning of language models and distribution shift problems.

distillation, machine learning, natural language, (20 more...)

2510.1929

Genre: Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)