external data
- North America > United States > Texas > Travis County > Austin (0.27)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (30 more...)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- (2 more...)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- (7 more...)
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Dong, Yihong, Jiang, Xue, Tao, Yongding, Liu, Huanyu, Zhang, Kechi, Mou, Lili, Cao, Rongyu, Ma, Yingwei, Chen, Jue, Li, Binhua, Jin, Zhi, Huang, Fei, Li, Yongbin, Li, Ge
Reinforcement Learning with Verifiable Reward (RLVR) has significantly advanced the complex reasoning abilities of Large Language Models (LLMs). However, it struggles to break through the inherent capability boundaries of the base LLM, due to its essentially on-policy strategy coupled with the LLM's immense action space and sparse reward. Critically, RLVR can lead to capability boundary collapse, narrowing the LLM's problem-solving scope. To address this problem, we propose RL-PLUS, a novel hybrid-policy optimization approach for LLMs that synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models. RL-PLUS integrates two core components, i.e., Multiple Importance Sampling to address distributional mismatch from external data, and an Exploration-Based Advantage Function to guide the model towards high-value, unexplored reasoning paths. We provide both theoretical analysis and extensive experiments to demonstrate the superiority and generalizability of our approach. Compared with existing RLVR methods, RL-PLUS achieves 1) state-of-the-art performance on six math reasoning benchmarks; 2) superior performance on six out-of-distribution reasoning tasks; and 3) consistent and significant gains across diverse model families, with average relative improvements of up to 69.2%. Moreover, the analysis of Pass@k curves indicates that RL-PLUS effectively resolves the capability boundary collapse problem.
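The multiple importance sampling component can be illustrated in miniature. The sketch below is a generic balance-heuristic estimator, not RL-PLUS's actual objective (all function names are illustrative): it combines samples drawn from two mismatched proposal distributions into a single unbiased estimate under a target distribution, the same mechanism used to reconcile on-policy rollouts with external data.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mis_estimate(f, target_pdf, proposals, counts, rng):
    """Balance-heuristic multiple importance sampling estimate of E_target[f].

    proposals: list of (sampler, pdf) pairs; counts[j] samples are drawn from
    proposal j. With the balance heuristic, every sample shares one mixture
    denominator, keeping weights bounded even where a single proposal is poor.
    """
    total = 0.0
    for (sampler, _), n in zip(proposals, counts):
        for _ in range(n):
            x = sampler(rng)
            denom = sum(m * pdf(x) for (_, pdf), m in zip(proposals, counts))
            total += f(x) * target_pdf(x) / denom
    return total

rng = random.Random(0)
proposals = [
    (lambda r: r.gauss(-1.0, 1.0), lambda x: normal_pdf(x, -1.0, 1.0)),
    (lambda r: r.gauss(+1.0, 1.0), lambda x: normal_pdf(x, +1.0, 1.0)),
]
# Estimate E_{N(0,1)}[x^2] = 1 from two off-center proposals.
est = mis_estimate(lambda x: x * x, lambda x: normal_pdf(x, 0.0, 1.0),
                   proposals, [5000, 5000], rng)
print(round(est, 2))
```

The shared mixture denominator is what tames the distributional mismatch: no single sample is weighted by a ratio against only its own (possibly poorly matched) proposal.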
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
Sun, Yifan, Liang, Yushan, Zhang, Zhen, Teng, Jiaye
Self-improvement is among the most prominent techniques in the realm of large language models (LLMs), aiming to enhance performance without relying on external data. Despite its significance, how LLM performance evolves during the self-improvement process remains underexplored. In this paper, we theoretically model the training dynamics of self-improvement via the concept of the solver-verifier gap. This is inspired by the conjecture that the performance gains of self-improvement stem from the gap between the LLM's solver capability and its verifier capability. Based on the theoretical framework, we further show how to model the entire training trajectory. This framework allows quantifying the capability limit of self-improvement by fitting the theoretical model to experimental results. We empirically validate the effectiveness of the theoretical framework on various LLMs and datasets. Beyond self-improvement, we extend our analysis to investigate how external data influences these dynamics within the framework. Notably, we find that under limited external data regimes, such external data can be utilized at any stage without significantly affecting final performance, which accords with empirical observations.
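As a toy illustration of modeling training dynamics through a solver-verifier gap, one can posit (purely as an assumption for illustration; the paper's actual equations may differ) that solver capability improves at a rate proportional to the current gap, so self-improvement saturates at the verifier's level:

```python
# Assumed toy dynamics (not the paper's exact model): solver skill s(t) improves
# at a rate proportional to the solver-verifier gap v - s(t).
def simulate_self_improvement(s0, v, rate, steps, dt=0.1):
    """Euler-integrate ds/dt = rate * (v - s); returns the full trajectory."""
    traj = [s0]
    s = s0
    for _ in range(steps):
        s += dt * rate * (v - s)
        traj.append(s)
    return traj

# Solver starts at 0.3; verifier capability is fixed at 0.8.
traj = simulate_self_improvement(s0=0.3, v=0.8, rate=0.5, steps=200)
# The gap shrinks and performance plateaus at the verifier's level: the
# "capability limit" one would read off by fitting such a curve to experiments.
print(round(traj[-1], 3))
```

Fitting the two free parameters (rate and asymptote) of such a curve to measured checkpoints is the sense in which a theoretical model can quantify a capability limit before training reaches it.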
Characterizing Model Behavior Under Synthetic Data Training: An Empirical Study Across Scales and Mixing Ratios
Du, Y., Wu, G., Tang, G., Wang, W., Fan, Q.
Synthetic data generated by large language models has become integral to modern NLP training pipelines, from bootstrapping reasoning capabilities to augmenting instruction-following datasets. While recent work demonstrates successful applications maintaining high external data ratios, systematic understanding of how synthetic data proportion affects model behavior across different scales remains limited. This paper presents a controlled empirical study examining model performance, calibration, and output characteristics when trained on varying synthetic-to-external data ratios. Using the Pythia model suite (410M-12B parameters) across five diverse tasks, we evaluate models after one to three training iterations with synthetic data proportions ranging from 0-50%. Our key findings include: models maintain stable performance with up to 20% synthetic data, but degradation accelerates beyond 30%; larger models (6.9B-12B) show greater robustness to synthetic data than smaller models (410M-1.4B); calibration degradation precedes accuracy loss, providing an early warning signal; and task characteristics matter, with reasoning tasks degrading faster than retrieval tasks under synthetic data training. Importantly, we find that current best practices, such as those employed in STaR and Self-Instruct systems that maintain greater than 80% external data, operate well within the safe regimes identified by our experiments. We provide practical guidance for practitioners on synthetic data budgets based on model scale and task requirements, alongside a detailed comparison with concurrent work, including Shumailov et al.'s model collapse findings.
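The mixing-ratio setup can be sketched as a simple batch sampler; the helper below is illustrative (the names are ours, not the authors' pipeline code), showing how a fixed synthetic proportion would be enforced per batch:

```python
import random

def mixed_batch(external, synthetic, batch_size, synthetic_ratio, rng):
    """Draw a training batch holding the synthetic-data proportion fixed."""
    n_syn = int(round(batch_size * synthetic_ratio))
    batch = rng.sample(external, batch_size - n_syn) + rng.sample(synthetic, n_syn)
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
external = [f"ext_{i}" for i in range(1000)]
synthetic = [f"syn_{i}" for i in range(1000)]
# 20% synthetic: within the stable regime the study identifies.
batch = mixed_batch(external, synthetic, batch_size=50, synthetic_ratio=0.2, rng=rng)
n_syn = sum(item.startswith("syn_") for item in batch)
print(n_syn, len(batch))
```

Enforcing the ratio per batch, rather than merely in the global pool, is one way to keep the effective synthetic exposure constant across training iterations.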
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Reveal and Release: Iterative LLM Unlearning with Self-generated Data
Xie, Linxi, Teng, Xin, Ke, Shichang, Wen, Hongyi, Wang, Shengjie
Large language model (LLM) unlearning has demonstrated effectiveness in removing the influence of undesirable data (also known as forget data). Existing approaches typically assume full access to the forget dataset, overlooking two key challenges: (1) forget data is often privacy-sensitive, rare, or legally regulated, making it expensive or impractical to obtain; and (2) the distribution of available forget data may not align with how that information is represented within the model. To address these limitations, we propose a "Reveal-and-Release" method to unlearn with self-generated data, where we prompt the model to reveal what it knows using optimized instructions. To fully utilize the self-generated forget data, we propose an iterative unlearning framework, where we make incremental adjustments to the model's weight space with parameter-efficient modules trained on the forget data. Experimental results demonstrate that our method balances the tradeoff between forget quality and utility preservation.
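The iterative framework described above reduces to a short loop. The sketch below is a toy scalar stand-in (all four callables are hypothetical placeholders; in the paper the model is an LLM and the updates come from parameter-efficient modules):

```python
def iterative_unlearn(weights, reveal, fit_forget_module, apply_module, rounds):
    """Generic reveal-then-update loop: elicit forget data from the current
    model, train a small corrective module on it, and fold the module back
    into the weights. All four callables are caller-supplied placeholders."""
    for _ in range(rounds):
        forget_batch = reveal(weights)
        delta = fit_forget_module(weights, forget_batch)
        weights = apply_module(weights, delta)
    return weights

# Toy scalar instantiation: "weights" is just the residual strength of the
# forget concept, and each round's module erases half of what is revealed.
w = iterative_unlearn(
    weights=1.0,
    reveal=lambda w: w,
    fit_forget_module=lambda w, batch: -0.5 * batch,
    apply_module=lambda w, d: w + d,
    rounds=4,
)
print(w)  # 1.0 * 0.5**4 = 0.0625
```

The point of iterating is visible even in this caricature: each round's self-generated data reflects what the model still knows, so later modules target the residue earlier ones missed.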
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Analysis of models
We observe that the measured robust accuracy converges after 50 steps, which further indicates that the attacks converge within 100 steps. Finally, we analyze the adversarial loss landscape of the model considered in the previous paragraph. WA reaches 55.68% robust accuracy (and 86.51% clean accuracy).
Robust Data Fusion via Subsampling
Wang, Jing, Wang, HaiYing, Chen, Kun
Data fusion and transfer learning are rapidly growing fields that enhance model performance for a target population by leveraging other related data sources or tasks. The challenges lie in the various potential heterogeneities between the target and external data, as well as various practical concerns that prevent a naïve data integration. We consider a realistic scenario where the target data is limited in size while the external data is large but contaminated with outliers; such data contamination, along with other computational and operational constraints, necessitates proper selection or subsampling of the external data for transfer learning. To our knowledge, transfer learning and subsampling under data contamination have not been thoroughly investigated. We address this gap by studying various transfer learning methods with subsamples of the external data, accounting for outliers deviating from the underlying true model due to arbitrary mean shifts. Two subsampling strategies are investigated: one aimed at reducing biases and the other at minimizing variances. Approaches to combine these strategies are also introduced to enhance the performance of the estimators. We provide non-asymptotic error bounds for the transfer learning estimators, clarifying the roles of sample sizes, signal strength, sampling rates, magnitude of outliers, and tail behaviors of model error distributions, among other factors. Extensive simulations show the superior performance of the proposed methods. Additionally, we apply our methods to analyze the risk of hard landings in A380 airplanes by utilizing data from other airplane types, demonstrating that robust transfer learning can improve estimation efficiency for relatively rare airplane types with the help of data from other types of airplanes.
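A one-dimensional caricature of the bias-reducing strategy (our construction for illustration, not the paper's estimator): fit a pilot estimate on the small target sample, keep only the external points consistent with it, and pool.

```python
import random
import statistics

def robust_fused_mean(target, external, k):
    """Bias-reducing subsampling sketch: anchor on a target-only pilot
    estimate, keep the k external points closest to it, then pool."""
    pilot = statistics.mean(target)
    kept = sorted(external, key=lambda v: abs(v - pilot))[:k]
    return statistics.mean(target + kept)

rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(20)]          # small, clean
clean = [rng.gauss(0.0, 1.0) for _ in range(900)]
outliers = [rng.gauss(10.0, 1.0) for _ in range(100)]      # arbitrary mean shift
external = clean + outliers                                # large, contaminated
naive = statistics.mean(target + external)                 # pulled toward outliers
fused = robust_fused_mean(target, external, k=500)
print(abs(fused) < abs(naive))
```

The subsample discards the mean-shifted contamination wholesale, while the variance-minimizing strategy in the abstract would instead choose points for statistical efficiency; combining the two is what the proposed estimators aim for.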
- Transportation > Air (1.00)
- Health & Medicine (1.00)
Domain Generalization and Adaptation in Intensive Care with Anchor Regression
Londschien, Malte, Burger, Manuel, Rätsch, Gunnar, Bühlmann, Peter
The performance of predictive models in clinical settings often degrades when deployed in new hospitals due to distribution shifts. This paper presents a large-scale study of causality-inspired domain generalization on heterogeneous multi-center intensive care unit (ICU) data. We apply anchor regression and introduce anchor boosting, a novel, tree-based nonlinear extension, to a large dataset comprising 400,000 patients from nine distinct ICU databases. The anchor regularization consistently improves out-of-distribution performance, particularly for the most dissimilar target domains. The methods appear robust to violations of theoretical assumptions, such as anchor exogeneity. Furthermore, we propose a novel conceptual framework to quantify the utility of large external datasets. By evaluating performance as a function of available target-domain data, we identify three regimes: (i) a domain generalization regime, where only the external model should be used; (ii) a domain adaptation regime, where refitting the external model is optimal; and (iii) a data-rich regime, where external data provides no additional value.
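Anchor regression itself has a convenient data-transformation form: shrink (or inflate) the anchor-projected component of every variable by 1 - sqrt(gamma) and run ordinary least squares on the result, so that gamma = 1 recovers OLS and large gamma penalizes residual variation explained by the anchor. A minimal single-covariate sketch (illustrative, not the paper's anchor-boosting implementation):

```python
import math
from collections import defaultdict

def anchor_regression_1d(x, y, groups, gamma):
    """Anchor regression for one covariate with a discrete anchor (e.g. the
    source hospital). Transform each variable with W = I - (1 - sqrt(gamma)) * P_A,
    where P_A projects onto group means, then run OLS on the transformed data."""
    def project(v):
        sums, counts = defaultdict(float), defaultdict(int)
        for g, vi in zip(groups, v):
            sums[g] += vi
            counts[g] += 1
        return [sums[g] / counts[g] for g in groups]

    shrink = 1.0 - math.sqrt(gamma)
    xt = [xi - shrink * pi for xi, pi in zip(x, project(x))]
    yt = [yi - shrink * pi for yi, pi in zip(y, project(y))]
    mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xt, yt))
    sxx = sum((a - mx) ** 2 for a in xt)
    return sxy / sxx

# Two "hospitals" (anchor groups) whose baselines differ.
x = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
y = [2.1, 3.9, 6.0, 8.1, 25.0, 27.1, 28.9, 31.0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
b_ols = anchor_regression_1d(x, y, groups, gamma=1.0)       # gamma = 1: plain OLS
b_anchor = anchor_regression_1d(x, y, groups, gamma=100.0)  # heavy anchor penalty
print(round(b_ols, 3), round(b_anchor, 3))
```

Increasing gamma trades in-sample fit for predictions whose errors are less explainable by the anchor, which is exactly the robustness to distribution shifts along the anchor (here, the hospital) that the abstract reports.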
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (3 more...)
Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning
Jangjoo, Fariba, Marsili, Matteo, Roudi, Yasser
Closed-loop learning is the process of repeatedly estimating a model from data generated by the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be trained primarily on data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows the sufficient statistics with the martingale property, and that, as a result, the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented if the data contains at least one data point generated from a ground-truth model, by relying on maximum a posteriori estimation, or by introducing regularisation.
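The Bernoulli family gives perhaps the smallest demonstration of this martingale-and-absorption behavior. In the sketch below (our toy instantiation, not the paper's derivation), plain MLE refitting drifts to the absorbing states 0 and 1, while a Beta-prior MAP estimate, one of the remedies the abstract mentions, keeps the parameter strictly inside the interval:

```python
import random

def closed_loop(p0, n, generations, rng, pseudo=0.0):
    """Refit a Bernoulli parameter on data sampled from the current model.
    pseudo = 0 is plain MLE (the sufficient statistic is then a martingale);
    pseudo > 0 is the MAP estimate under a symmetric Beta prior, which
    removes the absorbing states at 0 and 1."""
    p = p0
    for _ in range(generations):
        k = sum(1 for _ in range(n) if rng.random() < p)
        p = (k + pseudo) / (n + 2 * pseudo)
    return p

rng = random.Random(1)
# Plain MLE: chains drift to an absorbing state, amplifying early fluctuations.
mle_finals = [closed_loop(0.5, 20, 2000, rng) for _ in range(30)]
# MAP with one pseudo-count per outcome: estimates stay strictly inside (0, 1).
map_finals = [closed_loop(0.5, 20, 2000, rng, pseudo=1.0) for _ in range(30)]
print(sum(p in (0.0, 1.0) for p in mle_finals),
      sum(0.0 < p < 1.0 for p in map_finals))
```

The MLE chains behave like neutral genetic drift: once every sampled batch is all-ones or all-zeros, the refit parameter is exactly 0 or 1 and the process can never leave, which is the bias amplification the abstract describes.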
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)