AITopics

2607.01487

Genre: Research Report (0.50)

Industry:

Law > Statutes (0.95)
Government (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Bay, Yong Yi, Yearick, Kathleen A.

When More Sampling Hurts: The Modal Ceiling and Correlation Ceiling of Test-Time Scaling

People overthink; language models over-sample, and the extra effort can talk both into a worse answer. Reasoning systems answer a hard question by sampling it many times (test-time scaling), and the more they draw, the more often a correct answer turns up somewhere, so coverage, the fraction of problems with at least one correct try, climbs and appears to be progress. But a deployed system must return one answer, and choosing it, not knowing which try is right, is selection; selection is capped, and past a point extra samples only make the model surer of a confident mistake, even as every draw adds cost. The gap between climbing coverage and stalled selection, the identifiability gap, is the answer a model can produce but not pick. So the real question is not whether to sample but how far, and the answer is: not far. For picking an answer, the vote has already settled within a few dozen draws, the modal ceiling; for scoring a benchmark, sooner still, the correlation ceiling. Beyond that, extra draws cost compute and add nothing, and can even make the answer worse. This paper turns the cutoff into a single number, the effective number of samples, that any sampling run already reveals. The bottleneck is recognizing a right answer, not generating one.
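The gap between coverage and selection described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's experiment: the two toy problems, their answer probabilities, and the function name `coverage_and_selection` are hypothetical, and plain majority voting stands in for whatever selection rule a real system uses.

```python
import random
from collections import Counter

# Hypothetical toy problems: each maps candidate answers to sampling probabilities.
# On the second problem the most likely (modal) answer is wrong.
PROBLEMS = [
    {"right": 0.6, "wrong": 0.4},
    {"right": 0.3, "wrong_a": 0.5, "wrong_b": 0.2},
]

def coverage_and_selection(n_samples, trials=200, seed=0):
    """Return (coverage, majority-vote accuracy), averaged over trials and problems."""
    rng = random.Random(seed)
    covered = selected = 0
    for _ in range(trials):
        for dist in PROBLEMS:
            draws = rng.choices(list(dist), weights=list(dist.values()), k=n_samples)
            # Coverage: did a correct answer turn up anywhere among the draws?
            covered += "right" in draws
            # Selection: is the single most frequent answer the correct one?
            selected += Counter(draws).most_common(1)[0][0] == "right"
    total = trials * len(PROBLEMS)
    return covered / total, selected / total

for n in (1, 4, 16, 64):
    cov, sel = coverage_and_selection(n)
    print(f"n={n:3d}  coverage={cov:.2f}  majority-vote accuracy={sel:.2f}")
```

In this toy setup coverage keeps rising with the number of draws, while majority-vote accuracy settles near the share of problems whose modal answer is correct (here, one of two). Extra samples on the second problem only make the vote more confident in the wrong answer.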

correlation, large language model, machine learning, (21 more...)

2606.28661

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Multi-Source Transfer Learning of Sparse Single-Index Models

Tian, Ye

Transfer learning leverages knowledge from related source domains to improve learning in a target domain. Recent theoretical advances cover a broad range of regression settings within (generalized) linear models. Despite their diversity, these methods share two common constraints: they assume a known link function or linear structure and require direct access to raw source data. To move beyond these constraints, we propose a source-data-free transfer learning framework based on the single-index model (SIM). Instead of requiring raw source data, our method transfers only summary statistics derived from a generalized Stein's lemma in a one-time communication. This design preserves privacy and avoids side effects caused by dissimilarities of unknown nonlinear link functions across domains. To capture flexible, unknown nonlinearity, we employ a multilayer perceptron guided by the pre-estimated index from the transferred statistics, which significantly mitigates overfitting. Extensive experiments on synthetic data and a real-world application demonstrate consistent improvements over existing (generalized) linear model-based approaches. The proposed framework thus offers a practical, privacy-preserving, and nonlinear-adaptive solution for transfer learning.
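To make the role of Stein's lemma concrete, the sketch below uses the classical first-order identity for standard Gaussian covariates, E[y x] = E[f'(x·β)] β, which recovers the index direction without knowing the link f. It is an illustration under stated assumptions, not the paper's estimator: the abstract refers to a generalized Stein's lemma, and the sparse index `beta`, the tanh link, the noise level, and the function name `estimate_index` are all hypothetical.

```python
import math
import random

def estimate_index(samples):
    """Average y * x over the samples and normalise the result.
    For standard Gaussian x this is proportional to beta by Stein's lemma,
    provided E[f'(x . beta)] is nonzero."""
    d = len(samples[0][0])
    moment = [0.0] * d
    for x, y in samples:
        for j in range(d):
            moment[j] += y * x[j]
    norm = math.sqrt(sum(m * m for m in moment))
    return [m / norm for m in moment]

rng = random.Random(0)
beta = [0.8, 0.6, 0.0, 0.0, 0.0]  # hypothetical sparse, unit-norm index
samples = []
for _ in range(5000):
    x = [rng.gauss(0.0, 1.0) for _ in beta]
    index = sum(b * xi for b, xi in zip(beta, x))
    # The estimator never sees this link function.
    y = math.tanh(index) + 0.1 * rng.gauss(0.0, 1.0)
    samples.append((x, y))

beta_hat = estimate_index(samples)
cosine = sum(b * bh for b, bh in zip(beta, beta_hat))
print("estimated direction:", [round(v, 2) for v in beta_hat])
print("cosine with true index:", round(cosine, 3))
```

The moment vector is a low-dimensional summary of the data, which is the kind of quantity a source domain could share in a single communication round instead of its raw samples.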

artificial intelligence, estimator, machine learning, (19 more...)

2606.29658

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Hegde, Disha, Cockayne, Jon, Oates, Chris J.

Extrapolating from Regularised Solutions for Solving Ill-Conditioned Linear Systems in Machine Learning

Rapid prototyping of algorithms is a critical step in modern machine learning. Most algorithms exploit linear algebra, creating a need for lightweight numerical routines which, while potentially sub-optimal for the task at hand, can be rapidly implemented. For the numerical solution of ill-conditioned linear systems of equations, the standard solution for prototyping is Tikhonov-regularised inversion using a nugget. However, selecting the size of the nugget is often difficult, and the use of data-adaptive procedures precludes automatic differentiation, introducing instabilities into end-to-end training. Further, while data-adaptive procedures perform multiple linear solves to select the size of the nugget, only the result of one such solve is returned, which we argue is wasteful. This paper aims to circumvent the above difficulties, presenting autonugget, a Python package for automatic and stable numerical solution of linear systems that is suitable for rapid prototyping and fully compatible with automatic differentiation using JAX. autonugget combines multiple linear solves using Richardson extrapolation to determine the solution of the ill-conditioned system, improving accuracy over approximations based on a single nugget.
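The idea of combining several nugget-regularised solves can be sketched in a few lines. This is not the autonugget API or its algorithm: it is a minimal two-point Richardson extrapolation in plain Python, assuming the regularised solution x(eps) of (A + eps I) x = b has an error that is first order in eps, so that 2 x(eps/2) - x(eps) cancels the leading term. The helper names `solve`, `nugget_solve`, and `richardson_solve` and the 2x2 test matrix are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def nugget_solve(A, b, eps):
    """Tikhonov-style solve: add the nugget eps to the diagonal of A."""
    n = len(b)
    A_reg = [[A[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A_reg, b)

def richardson_solve(A, b, eps):
    """Combine solves at eps and eps/2 to cancel the first-order error in eps."""
    x_full = nugget_solve(A, b, eps)
    x_half = nugget_solve(A, b, eps / 2)
    return [2 * h - f for h, f in zip(x_half, x_full)]

def error(x, x_true):
    return math.sqrt(sum((a - t) ** 2 for a, t in zip(x, x_true)))

# Hypothetical ill-conditioned system (eigenvalues 1.99 and 0.01).
A = [[1.0, 0.99], [0.99, 1.0]]
x_true = [1.0, -1.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

eps = 1e-3
err_full = error(nugget_solve(A, b, eps), x_true)
err_half = error(nugget_solve(A, b, eps / 2), x_true)
err_rich = error(richardson_solve(A, b, eps), x_true)
print("single nugget eps:     ", err_full)
print("single nugget eps / 2: ", err_half)
print("Richardson combination:", err_rich)
```

Both regularised solves are used in the final answer rather than one being discarded, and every step is a differentiable arithmetic operation, which is the property that matters for end-to-end training.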

artificial intelligence, learning research, machine learning, (14 more...)

2606.30328

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Schwank, Richard, Drton, Mathias

Non-parametric recovery of causal diffusion mechanisms from steady-state observations

We consider sparse multivariate stochastic systems that evolve in continuous time according to a causal mechanism and present methodology to recover the system's time-infinitesimal transition mechanism from mere cross-sectional data. This observational paradigm is motivated by applications such as gene expression analysis, where destructive experimental techniques may only allow recording data once over a cell's lifetime. Precisely, we assume the system follows a time-homogeneous diffusion process that has reached an equilibrium distribution at observation time. Further, we assume the causal mechanism is fully described by the diffusion drift, is acyclic, and its causal structure graph is known. In this setting, we prove that the full causal mechanism, i.e., the drift function, can be non-parametrically identified under a weak non-explosion criterion. We derive a non-parametric kernel estimator for this challenging inverse problem and prove its consistency. Moreover, we propose a cross-validation scheme for hyperparameter tuning, illustrate the behavior of our estimator in simulations, and we discuss connections with irreversible generative diffusion models and low-frequency sampled data.
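The observational setting, one cross-sectional snapshot of a diffusion at equilibrium, can be illustrated with a one-dimensional simulation. This sketch only generates such data and does not implement the paper's estimator. It assumes an Ornstein-Uhlenbeck process dX = -theta X dt + sigma dW, discretised with the Euler-Maruyama scheme; the function name `snapshot` and all parameter values are hypothetical.

```python
import math
import random

def snapshot(n_paths=2000, theta=1.0, sigma=1.0, dt=0.01, n_steps=500, seed=0):
    """Simulate independent paths and keep only each path's final state,
    mimicking a destructive measurement taken once per cell."""
    rng = random.Random(seed)
    noise_scale = sigma * math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            # Euler-Maruyama step: drift term plus scaled Gaussian increment.
            x += -theta * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals

data = snapshot()
mean = sum(data) / len(data)
variance = sum((v - mean) ** 2 for v in data) / len(data)
print("sample variance of the snapshot:", round(variance, 3))
print("stationary variance sigma^2 / (2 theta):", 1.0 ** 2 / (2 * 1.0))
```

The snapshot carries no time index, so any recovery of the drift has to rely on the shape of the equilibrium distribution alone, which is why identification in this setting is an inverse problem.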

artificial intelligence, equation, machine learning, (18 more...)

2606.30467

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing Systems, Jun-23-2026, 07:57:56 GMT

Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning

Membership inference attacks (MIAs) are used to test practical privacy of machine learning models. MIAs complement formal guarantees from differential privacy (DP) under a more realistic adversary model. We analyze MIA vulnerability of fine-tuned neural networks both empirically and theoretically, the latter using a simplified model of fine-tuning. We show that the vulnerability of non-DP models, measured as the attacker advantage at a fixed false positive rate, decreases according to a simple power law as the number of examples per class increases. A similar power law applies even for the most vulnerable points, but the dataset size needed to adequately protect them is very large.
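The vulnerability metric named above can be computed from attack scores as follows. This is a minimal sketch that assumes the common convention advantage = TPR - FPR, with the decision threshold chosen so the false positive rate on non-members does not exceed the target. The function name `advantage_at_fpr` and the integer scores are hypothetical and are not data from the paper.

```python
def advantage_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Return (advantage, tpr, fpr) for a threshold attack that flags a point
    as a member when its score is strictly above the threshold."""
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of non-members allowed above the threshold.
    k = int(target_fpr * len(ranked))
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    fpr = sum(s > threshold for s in nonmember_scores) / len(nonmember_scores)
    tpr = sum(s > threshold for s in member_scores) / len(member_scores)
    return tpr - fpr, tpr, fpr

# Hypothetical attack scores: members tend to score higher than non-members.
nonmembers = list(range(100))      # scores 0..99
members = list(range(50, 150))     # scores 50..149

adv, tpr, fpr = advantage_at_fpr(members, nonmembers, target_fpr=0.1)
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  advantage={adv:.2f}")
```

Fixing the false positive rate at a low value focuses the metric on confident attacks, which is why the most vulnerable points can dominate it even when average-case leakage is small.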

artificial intelligence, machine learning, vulnerability, (17 more...)

Country: Europe (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Jun-23-2026, 04:35:46 GMT

DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in handling multi-document question answering (Multi-doc QA) tasks. The first challenge is long-range dependency modeling, where LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly finetuning, so a universal and simple solution to these challenges is still lacking. To resolve these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS) containing two modules.

large language model, machine learning, natural language, (19 more...)

Country:

Europe (1.00)
North America > United States (0.67)
North America > Canada (0.46)
Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > Film (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Jun-23-2026, 03:59:58 GMT

On the Relation between Rectified Flows and Optimal Transport

This paper investigates the connections between rectified flows, flow matching, and optimal transport. Flow matching is a recent approach to learning generative models by estimating velocity fields that guide transformations from a source to a target distribution. Rectified flow matching aims to straighten the learned transport paths, yielding more direct flows between distributions. Our first contribution is a set of invariance properties of rectified flows and explicit velocity fields. We also provide explicit constructions and analysis in the Gaussian (not necessarily independent) and Gaussian mixture settings and study the relation to optimal transport. Our second contribution addresses recent claims suggesting that rectified flows, when constrained such that the learned velocity field is a gradient, can yield (asymptotically) solutions to optimal transport problems. We study the existence of solutions for this problem and demonstrate that they only relate to optimal transport under assumptions that are significantly stronger than those previously acknowledged. In particular, we present several counterexamples that invalidate earlier equivalence results in the literature, and we argue that enforcing a gradient constraint on rectified flows is, in general, not a reliable method for computing optimal transport maps.
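As a worked illustration of an explicit velocity field in the simplest Gaussian case, consider one-dimensional, independent endpoints. This derivation is a sketch under stated assumptions and is not taken from the paper: it assumes the standard linear interpolation path, and the symbols m, s, and sigma_t are introduced here for illustration.

```latex
% Assumptions: X_0 \sim \mathcal{N}(0, 1) and X_1 \sim \mathcal{N}(m, s^2) independent,
% with the linear interpolation X_t = (1 - t) X_0 + t X_1 for t \in [0, 1].
\begin{align*}
  \mathbb{E}[X_t] &= t m, \\
  \sigma_t^2 := \operatorname{Var}(X_t) &= (1 - t)^2 + t^2 s^2, \\
  \operatorname{Cov}(X_1 - X_0,\, X_t) &= t s^2 - (1 - t), \\
  v(x, t) := \mathbb{E}[X_1 - X_0 \mid X_t = x]
    &= m + \frac{t s^2 - (1 - t)}{\sigma_t^2}\,(x - t m).
\end{align*}
```

The last line is the usual conditional-mean formula for jointly Gaussian variables. As a sanity check, at t = 0 with m = 0 it gives v(x, 0) = -x, which matches E[X_1 - X_0 | X_0 = x] = -x.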

artificial intelligence, machine learning, natural language, (19 more...)

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.67)
(2 more...)

Neural Information Processing Systems, Jun-23-2026, 03:42:16 GMT

Robustness in Both Domains: CLIP Needs a Robust Text Encoder

Adversarial input attacks can cause a significant shift of CLIP embeddings. This can affect the downstream robustness of models incorporating CLIP in the pipeline, such as text-to-image generative models or large vision language models. While some efforts have been made towards making the CLIP image encoders robust, the robustness of text encoders remains unexplored. In this work, we fill this gap in the literature. We propose LEAF: an efficient adversarial finetuning method for the text domain, with the ability to scale to large CLIP models. Our models significantly improve the zero-shot adversarial accuracy in the text domain, while maintaining the vision performance provided by robust image encoders. When combined with text-to-image diffusion models, we can improve the generation quality under adversarial noise. In multimodal retrieval tasks, LEAF improves the recall under adversarial noise over standard CLIP models. Finally, we show that robust text encoders facilitate better reconstruction of input text from its embedding via direct optimization.

large language model, machine learning, natural language, (14 more...)

Country:

Europe > Switzerland (0.28)
North America > Canada (0.28)

Industry: Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Berthier, Raphaël, Pillaud-Vivien, Loucas

Incremental Learning in Mirror Flows

arXiv.org Machine Learning, Jun-23-2026

Neural networks trained with gradient descent often learn solutions of increasing complexity: the model first captures simple structure, then progressively incorporates finer details [AJB+17, KKN+19, ZSL25]. This incremental learning phenomenon, often visible as plateaus in the training loss separated by rapid transitions, has attracted significant theoretical attention. The most detailed analyses of incremental learning have been carried out for diagonal linear networks, including precise descriptions of transition times and plateau levels [Ber23, PF23]. This level of detail is possible because the training dynamics of these networks reduce to a mirror flow [WGL+20]. Mirror flows themselves feature incremental learning when initialized near the boundary of the domain of the mirror potential. This paper gives a rigorous description of this phenomenon for a broad class of mirror flows, thereby generalizing the previous analyses of diagonal linear networks.
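The plateau-then-jump behaviour described above can be reproduced in a toy diagonal model. This is a minimal sketch under stated assumptions, not the paper's setting: each weight is parameterised as w_i = u_i^2, the loss is the decoupled quadratic 0.5 * sum((u_i^2 - target_i)^2), and training uses plain gradient descent from a small initialisation. The function name `train`, the targets, the step size, and the initial scale are hypothetical.

```python
def train(targets, init=1e-3, lr=0.01, steps=2000):
    """Gradient descent on 0.5 * sum((u_i**2 - target_i)**2).
    Returns the trajectory of w = u**2 and, per coordinate, the first step
    at which w_i reaches half of its target."""
    u = [init] * len(targets)
    half_time = [None] * len(targets)
    history = []
    for step in range(steps):
        # d/du of 0.5 * (u^2 - t)^2 is 2 * u * (u^2 - t).
        u = [ui - lr * 2.0 * ui * (ui * ui - ti) for ui, ti in zip(u, targets)]
        w = [ui * ui for ui in u]
        history.append(w)
        for i, (wi, ti) in enumerate(zip(w, targets)):
            if half_time[i] is None and wi >= 0.5 * ti:
                half_time[i] = step
    return history, half_time

targets = [4.0, 1.0]  # hypothetical target weights
history, half_time = train(targets)
print("step at which each coordinate reaches half its target:", half_time)
print("weights when the first coordinate is half learned:", history[half_time[0]])
print("final weights:", [round(v, 4) for v in history[-1]])
```

In the continuous-time limit each weight follows dw/dt = -4 w dL/dw, a flow whose metric 1/(4w) blows up at the boundary w = 0. Starting near that boundary, the coordinate with the larger target escapes first while the other stays on a plateau, so the coordinates are learned one after another.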

artificial intelligence, machine learning, mirror flow, (14 more...)

2606.23198

Country: Europe > France (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)