SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models

Jun-12-2026, 06:38:25 GMT–Neural Information Processing Systems

Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively refine the model with real samples and synthetic ones generated from itself. However, the existing methods primarily focus on the relative gaps between the rewards for two types of data, neglecting their absolute values. Through theoretical analysis, we identify that the gap-based methods suffer from unstable evolution, due to the potentially degenerated objectives.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Jun-12-2026, 06:38:25 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)