AITopics

2605.22507

Country:

Europe (0.92)
North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

Tran, Hoang-Son, Gupta, Pranav, Bardenet, Rémi, Ghosh, Subhroshekhar

State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

arXiv.org Machine LearningMay-14-2026

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.

artificial intelligence, dpp, machine learning, (19 more...)

2605.13127

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Dance, Hugh, Xi, Johnny, Orbanz, Peter, Bloem-Reddy, Benjamin

Debiased Counterfactual Generation via Flow Matching from Observations

arXiv.org Machine LearningMay-11-2026

Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes which are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow-matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high-dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.

artificial intelligence, estimator, machine learning, (18 more...)

2605.07665

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Information Technology (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Halgryn, Leon, Langer, Sophie, Meylahn, Janusz M., Hahn, E. Moritz

Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under $τ$-Mixing

arXiv.org Machine LearningMay-8-2026

Finite-sample analyses of deep Q-learning typically treat replayed data as independent, even though it is sampled from temporally dependent state-action trajectories. We study the Deep Q-networks (DQN) algorithm under explicit dependence by modelling the minibatches used for updating the network as $τ$-mixing. We show that this assumption holds under certain dependence conditions on the underlying trajectories and the mechanism used to sample minibatches. Building on this observation, we extend statistical analyses of DQN with fully connected ReLU architectures to dependent data. We formulate each update as a nonparametric regression problem with $τ$-mixing observations and derive finite-sample risk bounds under this dependence structure. Our results show that temporal dependence leads to a degradation in the statistical rate by inducing an additional dimensionality penalty in the rate exponent, reflecting the reduced effective sample size of $τ$-mixing data. Moreover, we derive the sample complexity of DQN under $tau$-mixing from these risk bounds. Finally, we empirically demonstrate on standard Gymnasium environments that the independence assumption is systematically violated and that replay sampling yields approximately exponentially decaying correlations, supporting our theoretical framework.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2605.06373

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Neural Information Processing SystemsApr-30-2026, 08:25:47 GMT

f74054328beeb0c21a9b8e99da557f5a-Supplemental-Conference.pdf

artificial intelligence, machine learning, natural language, (17 more...)

Country: Europe > United Kingdom > England (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)
Questionnaire & Opinion Survey (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing SystemsApr-25-2026, 02:08:05 GMT

238c98450b1d9e8055f94d22f303bb57-Supplemental-Conference.pdf

artificial intelligence, experiment, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)

Tim Salimans, Durk P. Kingma

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Neural Information Processing SystemsApr-22-2026, 12:13:32 GMT

By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.

artificial intelligence, machine learning, normalization, (17 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningMar-30-2026

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Kim, Juno, Nichani, Eshaan, Wu, Denny, Bietti, Alberto, Lee, Jason D.

Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian inputs and outputs, which allows the number of stored associations to greatly exceed the embedding dimension. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power law frequency distribution. We show that the storage capacity of Muon significantly exceeds that of SGD, and moreover Muon saturates at a larger critical batch size. We further analyze the multi-step dynamics under a thresholded gradient approximation and show that Muon achieves a substantially faster initial recovery rate than SGD, while both methods eventually converge to the information-theoretic limit at comparable speeds. Experiments on synthetic tasks validate the predicted scaling laws. Our analysis provides a quantitative understanding of the signal amplification of Muon and lays the groundwork for establishing scaling laws across more practical language modeling tasks and optimizers.

logd, machine learning, natural language, (21 more...)

2603.26554

Country:

Europe > France (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Neural Information Processing SystemsMar-20-2026, 00:17:32 GMT

A Bayesian Approach to Data Point Selection

Data point selection (DPS) is becoming a critical topic in deep learning due to the ease of acquiring uncurated training data compared to the difficulty of obtaining curated or processed data. Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation, which is demanding in terms of memory and computation, and exhibits some theoretical defects regarding minibatches.Thus, we propose a novel Bayesian approach to DPS. We view the DPS problem as posterior inference in a novel Bayesian model where the posterior distributions of the instance-wise weights and the main neural network parameters are inferred under a reasonable prior and likelihood model.We employ stochastic gradient Langevin MCMC sampling to learn the main network and instance-wise weights jointly, ensuring convergence even with minibatches. Our update equation is comparable to the widely used SGD and much more efficient than existing BLO-based methods. Through controlled experiments in both the vision and language domains, we present the proof-of-concept. Additionally, we demonstrate that our method scales effectively to large language models and facilitates automated per-task optimization for instruction fine-tuning datasets.

Hyeonseob Nam, Hyo-Eun Kim

Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks

Neural Information Processing SystemsMar-14-2026, 07:51:45 GMT

Similarly,thetexture ofacloth mayconfuse classifying clothing categories (shirts vs. skirts), but it is crucial for classifying fashion attributes (spottedvs.

artificial intelligence, machine learning, normalization, (20 more...)

Country:

Asia > South Korea > Seoul > Seoul (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)