Appendix
This is only for the ease of visualization for linear MDPs. In the generative model setting, Agarwal et al. [2020] show that the model-based approach is still minimax optimal, achieving $\widetilde{O}\big((1-\gamma)^{-3}SA/\epsilon^2\big)$ sample complexity via an $s$-absorbing MDP construction; this model-based technique is later reused in other, more general settings (e.g., those requiring a high-probability guarantee for learning the optimal policy for any reward function, which is strictly stronger than the standard learning task in which one only needs to learn the optimal policy for a fixed reward).

B.2 General absorbing MDP

The general absorbing MDP is defined as follows: for a fixed state $s$ and a sequence $\{u_t\}_{t=1}^H$, the MDP $M_{s,\{u_t\}_{t=1}^H}$ is identical to $M$ for all states except $s$, and state $s$ is absorbing in the sense that $P_{M_{s,\{u_t\}_{t=1}^H}}(s\mid s,a)=1$ for all $a$, and the instantaneous reward at time $t$ is $r_t(s,a)=u_t$ for all $a\in\mathcal{A}$. We also use the shorthand notation $V^\pi_{s,\{u_t\}}$ for $V^\pi_{s,M_{s,\{u_t\}_{t=1}^H}}$.

We focus on the first claim; later we shall remove the conditioning on $N$ (see Section B.7). We use the singleton-absorbing MDP $M_{s,\{u_t^\star\}_{t=1}^H}$ to handle this case (recall the definition of $u_t^\star$).
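To make the construction concrete, here is a minimal tabular sketch of materializing $M_{s,\{u_t\}_{t=1}^H}$ from a finite-horizon MDP stored as arrays. This is our illustration, not code from the cited works; the function name and the $(H,S,A,S)$ array layout are assumptions.

```python
import numpy as np

def make_absorbing_mdp(P, r, s_abs, u):
    """Build the absorbing MDP M_{s,{u_t}} from (P, r).

    P: (H, S, A, S) transition tensor, r: (H, S, A) rewards,
    s_abs: the state to make absorbing, u: length-H reward sequence.
    Returns (P', r') identical to (P, r) except that s_abs self-loops
    under every action and pays reward u_t at stage t."""
    H = P.shape[0]
    P_new, r_new = P.copy(), r.copy()
    for t in range(H):
        P_new[t, s_abs, :, :] = 0.0
        P_new[t, s_abs, :, s_abs] = 1.0   # P(s_abs | s_abs, a) = 1 for all a
        r_new[t, s_abs, :] = u[t]         # r_t(s_abs, a) = u_t for all a
    return P_new, r_new
```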
In the deterministic setting, where the data is given deterministically without any probabilistic assumptions, significant advances in DP linear regression have been made [77, 57, 68, 16, 7, 83, 31, 67, 82, 71]. In the randomized setting, where each example $(x_i, y_i)$ is drawn i.i.d., we explain the closely related works in Section 2.3, together with an analysis for the case when the covariance matrix has a spectral gap. The resulting utility guarantees are the same as those from [23], which are discussed in Section 2.3. When privacy is not required, we know from Theorem 2.2 that under Assumptions A.1-A.3 we can achieve an error rate of $O(\kappa\sqrt{V/n})$.
Consider a news recommendation website that, when presented with a new user, sequentially offers a selection of currently trending articles. Such a system may only have a few opportunities to make recommendations before the user decides to navigate away, leaving little time to correct for misspecified or underspecified prior knowledge.
$$\mathbb{E}_{\rho^{\pi_t}}\Big[\big\|\widetilde{f}_t(\cdot\mid s,a)-f^\star(\cdot\mid s,a)\big\|_1\Big]+\mathbb{E}_{\rho^{\pi_{t-1}}}\Big[\big\|\widetilde{f}_t(\cdot\mid s,a)-f^\star(\cdot\mid s,a)\big\|_1\Big],\tag{A.1}$$
where $\epsilon(t):=\mathbb{E}_{s\sim\zeta}\,V\cdots$
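The following minimal numpy sketch evaluates the kind of expected $\ell_1$ model-error term appearing in (A.1); it is our illustration, and the names `f_hat`, `f_star`, and `rho` (together with their shapes) are assumptions rather than the paper's notation.

```python
import numpy as np

def expected_l1_model_error(f_hat, f_star, rho):
    """E_{(s,a)~rho}[ || f_hat(.|s,a) - f_star(.|s,a) ||_1 ].

    f_hat, f_star: (S, A, S) estimated / true transition models;
    rho: (S, A) state-action visitation distribution."""
    per_pair_l1 = np.abs(f_hat - f_star).sum(axis=-1)  # l1 norm over next states
    return float((rho * per_pair_l1).sum())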
From the Posterior Sampling Lemma, we know that if $\psi$ is the distribution of $f^\star$, then for any $\sigma(H_t)$-measurable function $g$, $\mathbb{E}[g(f^\star)\mid H_t]=\mathbb{E}[g(f_t)\mid H_t]$. This can further be seen from the construction of the confidence set. This lemma is widely adopted in RL, and proofs can be found in various previous works. Prior work that shares similarities with ours includes DPI [59] and GPS [31, 39], in which dual policy optimization procedures are adopted.
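To see the lemma in action, here is a small, self-contained Monte Carlo sketch (purely illustrative; the Beta-Bernoulli model is our own choice) checking that $\mathbb{E}[g(f^\star)\mid H_t]=\mathbb{E}[g(f_t)\mid H_t]$ when $f_t$ is drawn from the posterior given $H_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_worlds, t = 200_000, 5

f_star = rng.beta(2, 2, size=n_worlds)              # f* ~ prior psi = Beta(2, 2)
data = rng.random((n_worlds, t)) < f_star[:, None]  # H_t: t Bernoulli(f*) draws

# Condition on one concrete history, e.g. H_t = (1, 1, 0, 1, 0).
h = np.array([1, 1, 0, 1, 0], dtype=bool)
mask = (data == h).all(axis=1)

# f_t ~ posterior given this history: Beta(2 + #successes, 2 + #failures).
f_t = rng.beta(2 + h.sum(), 2 + (~h).sum(), size=mask.sum())

g = lambda f: f ** 2                                # any test function g
print(g(f_star[mask]).mean(), g(f_t).mean())        # agree up to Monte Carlo error
```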
In this setting, unlike the basic setting, the objective and constraints are not linear. We focus on a single state-action pair $(s,a)$, stage $h$, and objective $m$. Similarly, in constrained settings, the estimated resource consumptions are underestimates of the true resource consumptions.

B.5 Bounding the Bellman error

We now provide an upper bound on the Bellman error which arises in the RHS of the regret decomposition (Proposition 3.3). When neither failure event occurs (which happens with probability at least $1-2\delta$), Proposition 3.3 upper bounds both the reward regret and the consumption regret by this Bellman error term. In this section, we prove the main guarantee for the convex-concave setting.
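The following is a minimal sketch of the optimism pattern this argument relies on: empirical rewards are inflated and empirical resource consumptions are deflated by a confidence bonus, so that with high probability the consumption estimates are indeed underestimates. The count-based Hoeffding-style bonus is our own assumption, not the paper's exact algorithm.

```python
import numpy as np

def optimistic_estimates(r_hat, c_hat, counts, H, S, A, delta):
    """r_hat, c_hat: empirical reward / consumption estimates, shape (H, S, A);
    counts: visit counts N_h(s, a), same shape.

    Returns (optimistic rewards, underestimated consumptions), both
    clipped to the valid range [0, 1]."""
    bonus = np.sqrt(np.log(2 * H * S * A / delta) / np.maximum(counts, 1))
    r_opt = np.clip(r_hat + bonus, 0.0, 1.0)   # overestimate rewards
    c_opt = np.clip(c_hat - bonus, 0.0, 1.0)   # underestimate consumptions
    return r_opt, c_opt
```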