AITopics | arXiv.org Machine Learning

Collaborating Authors

arXiv.org Machine Learning

Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning

Gjølbye, Anders, Kargaard, Ida, Kargaard, Emma, Hansen, Lars Kai

arXiv.org Machine LearningJan-30-2026

Partial Least Squares (PLS) learns shared structure from paired data via the top singular vectors of the empirical cross-covariance (PLS-SVD), but multimodal datasets often have missing entries in both views. We study PLS-SVD under independent entry-wise missing-completely-at-random masking in a proportional high-dimensional spiked model. After appropriate normalization, the masked cross-covariance behaves like a spiked rectangular random matrix whose effective signal strength is attenuated by $\sqrtρ$, where $ρ$ is the joint entry retention probability. As a result, PLS-SVD exhibits a sharp BBP-type phase transition: below a critical signal-to-noise threshold the leading singular vectors are asymptotically uninformative, while above it they achieve nontrivial alignment with the latent shared directions, with closed-form asymptotic overlap formulas. Simulations and semi-synthetic multimodal experiments corroborate the predicted phase diagram and recovery curves across aspect ratios, signal strengths, and missingness levels.

artificial intelligence, machine learning, missing-data-induced phase transition, (12 more...)

arXiv.org Machine Learning

2601.21294

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.84)

Add feedback

Supervised Guidance Training for Infinite-Dimensional Diffusion Models

Baker, Elizabeth L., Denker, Alexander, Frellsen, Jes

arXiv.org Machine LearningJan-30-2026

Score-based diffusion models have recently been extended to infinite-dimensional function spaces, with uses such as inverse problems arising from partial differential equations. In the Bayesian formulation of inverse problems, the aim is to sample from a posterior distribution over functions obtained by conditioning a prior on noisy observations. While diffusion models provide expressive priors in function space, the theory of conditioning them to sample from the posterior remains open. We address this, assuming that either the prior lies in the Cameron-Martin space, or is absolutely continuous with respect to a Gaussian measure. We prove that the models can be conditioned using an infinite-dimensional extension of Doob's $h$-transform, and that the conditional score decomposes into an unconditional score and a guidance term. As the guidance term is intractable, we propose a simulation-free score matching objective (called Supervised Guidance Training) enabling efficient and stable posterior sampling. We illustrate the theory with numerical examples on Bayesian inverse problems in function spaces. In summary, our work offers the first function-space method for fine-tuning trained diffusion models to accurately sample from a posterior.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Machine Learning

2601.20756

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Signal from Structure: Exploiting Submodular Upper Bounds in Generative Flow Networks

Larouche, Alexandre, Durand, Audrey

arXiv.org Machine LearningJan-30-2026

Generative Flow Networks (GFlowNets; GFNs) are a class of generative models that learn to sample compositional objects proportionally to their a priori unknown value, their reward. We focus on the case where the reward has a specified, actionable structure, namely that it is submodular. We show submodularity can be harnessed to retrieve upper bounds on the reward of compositional objects that have not yet been observed. We provide in-depth analyses of the probability of such bounds occurring, as well as how many unobserved compositional objects can be covered by a bound. Following the Optimism in the Face of Uncertainty principle, we then introduce SUBo-GFN, which uses the submodular upper bounds to train a GFN. We show that SUBo-GFN generates orders of magnitude more training data than classical GFNs for the same number of queries to the reward function. We demonstrate the effectiveness of SUBo-GFN in terms of distribution matching and high-quality candidate generation on synthetic and real-world submodular tasks.

machine learning, natural language, trajectory, (13 more...)

arXiv.org Machine Learning

2601.21061

Country:

North America > United States (0.04)
North America > Canada > Quebec (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.34)

Add feedback

On sample complexity for covariance estimation via the unadjusted Langevin algorithm

Nakakita, Shogo

arXiv.org Machine LearningJan-30-2026

We establish sample complexity guarantees for estimating the covariance matrix of strongly log-concave smooth distributions using the unadjusted Langevin algorithm (ULA). We quantitatively compare our complexity estimates on single-chain ULA with embarrassingly parallel ULA and derive that the sample complexity of the single-chain approach is smaller than that of embarrassingly parallel ULA by a logarithmic factor in the dimension and the reciprocal of the prescribed precision, with the difference arising from effective bias reduction through burn-in. The key technical contribution is a concentration bound for the sample covariance matrix around its expectation, derived via a log-Sobolev inequality for the joint distribution of ULA iterates.

artificial intelligence, inequality, machine learning, (16 more...)

arXiv.org Machine Learning

2601.21717

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

Provably Reliable Classifier Guidance through Cross-entropy Error Control

Sahu, Sharan, Banerjee, Arisina, Wu, Yuchen

arXiv.org Machine LearningJan-30-2026

Classifier-guided diffusion models generate conditional samples by augmenting the reverse-time score with the gradient of a learned classifier, yet it remains unclear whether standard classifier training procedures yield effective diffusion guidance. We address this gap by showing that, under mild smoothness assumptions on the classifiers, controlling the cross-entropy error at each diffusion step also controls the error of the resulting guidance vectors: classifiers achieving conditional KL divergence $\varepsilon^2$ from the ground-truth conditional label probabilities induce guidance vectors with mean squared error $\widetilde{O}(d \varepsilon )$. Our result yields an upper bound on the sampling error under classifier guidance and bears resemblance to a reverse log-Sobolev-type inequality. Moreover, we show that the classifier smoothness assumption is essential, by constructing simple counterexamples demonstrating that, without it, control of the guidance vector can fail for almost all distributions. To our knowledge, our work establishes the first quantitative link between classifier training and guidance alignment, yielding both a theoretical foundation for classifier guidance and principled guidelines for classifier selection.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

2601.212

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(2 more...)

Add feedback

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

Min, Yizhou, Lu, Yizhou, Li, Lanqi, Zhang, Zhen, Teng, Jiaye

arXiv.org Machine LearningJan-30-2026

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed Prejudicial Trick (PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval, which is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provides extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric interval stability which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques.

data mining, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2601.21455

Country:

Asia > Singapore (0.04)
North America > United States > Tennessee (0.04)
Europe > France (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

A Theory of Universal Agnostic Learning

Hanneke, Steve, Moran, Shay

arXiv.org Machine LearningJan-30-2026

We provide a complete theory of optimal universal rates for binary classification in the agnostic setting. This extends the realizable-case theory of Bousquet, Hanneke, Moran, van Handel, and Yehudayoff (2021) by removing the realizability assumption on the distribution. We identify a fundamental tetrachotomy of optimal rates: for every concept class, the optimal universal rate of convergence of the excess error rate is one of $e^{-n}$, $e^{-o(n)}$, $o(n^{-1/2})$, or arbitrarily slow. We further identify simple combinatorial structures which determine which of these categories any given concept class falls into.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2601.20961

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > California (0.04)
(4 more...)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Factorizable joint shift revisited

Tasche, Dirk

arXiv.org Machine LearningJan-30-2026

Such failure can be caused by distribution shift (also known as dataset shift) between the training and test datasets. For this reason, distribution shift and domain adaptation (a notion comprising techniques for tackling distribution shift) has been a major research topic in machine learning for some time. This paper takes the perspective of Kouw and Loog (2021) and studies the case where feature observations from the test dataset are available for analysis but observations of labels are missing. Under these circumstances, without any assumptions on the nature of the distribution shift between the training and test datasets meaningful prediction of the labels in the test dataset or of their distribution is not feasible. See Kouw and Loog (2021) for a survey of approaches to domain adaptation and their related assumptions. Arguably, covariate shift (also known as population drift) and label shift (also known as prior probability shift or target shift) are the most popular specific distribution shift assumptions, both for their intuiveness as well as their computational manageability. However, exclusive covariate and label shift assumptions have been criticised for being insufficient for common domain adaptation tasks (e.g.

artificial intelligence, assumption 2, machine learning, (19 more...)

arXiv.org Machine Learning

2601.15036

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

Fan, Zhou, Wang, Leda

arXiv.org Machine LearningJan-30-2026

We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ increase proportionally, for any sub-linear batch size $κ\asymp n^α$ where $α\in [0,1)$, and for a commensurate ``critical'' scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics for SGD are the same for any batch size scaling $α\in [0,1)$, and that under a commensurate scaling of the learning rate, dynamics of SGD, SME, and gradient flow are mutually distinct, with those of SGD and SME coinciding in the special case of a linear model. We recover a known dynamical mean-field characterization of gradient flow in a limit of small learning rate, and of one-pass/online SGD in a limit of increasing sample size $n/d \to \infty$.

artificial intelligence, high-dimensional learning dynamic, machine learning, (14 more...)

arXiv.org Machine Learning

2601.21093

Country:

North America > United States (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Differential Dynamic Causal Nets: Model Construction, Identification and Group Comparisons

You, Kang, Green, Gary, Zhang, Jian

arXiv.org Machine LearningJan-30-2026

Pathophysiolpgical modelling of brain systems from microscale to macroscale remains difficult in group comparisons partly because of the infeasibility of modelling the interactions of thousands of neurons at the scales involved. Here, to address the challenge, we present a novel approach to construct differential causal networks directly from electroencephalogram (EEG) data. The proposed network is based on conditionally coupled neuronal circuits which describe the average behaviour of interacting neuron populations that contribute to observed EEG data. In the network, each node represents a parameterised local neural system while directed edges stand for node-wise connections with transmission parameters. The network is hierarchically structured in the sense that node and edge parameters are varying in subjects but follow a mixed-effects model. A novel evolutionary optimisation algorithm for parameter inference in the proposed method is developed using a loss function derived from Chen-Fliess expansions of stochastic differential equations. The method is demonstrated by application to the fitting of coupled Jansen-Rit local models. The performance of the proposed method is evaluated on both synthetic and real EEG data. In the real EEG data analysis, we track changes in the parameters that characterise dynamic causality within brains that demonstrate epileptic activity. We show evidence of network functional disruptions, due to imbalance of excitatory-inhibitory interneurons and altered epileptic brain connectivity, before and during seizure periods.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Machine Learning

2601.21478

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire (0.04)
Europe > United Kingdom > England > Kent > Canterbury (0.04)
Europe > France (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.94)
(2 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.67)
(2 more...)

Add feedback