SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
–Neural Information Processing Systems
Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively refine the model with real samples and synthetic ones generated from itself. However, the existing methods primarily focus on the relative gaps between the rewards for two types of data, neglecting their absolute values. Through theoretical analysis, we identify that the gap-based methods suffer from unstable evolution, due to the potentially degenerated objectives.
Neural Information Processing Systems
Jun-12-2026, 06:38:25 GMT
- Technology: