Self-Improvement in Language Models: The Sharpening Mechanism
Audrey Huang, Adam Block, Dylan J. Foster, Dhruv Rohatgi, Cyril Zhang, Max Simchowitz, Jordan T. Ash, Akshay Krishnamurthy
Recent work in language modeling has raised the possibility of self-improvement, where a language model evaluates and refines its own generations to achieve higher performance without external feedback. Since this self-improvement cannot create information that is not already present in the model, why should we expect it to lead to improved capabilities? We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training in order to "sharpen" the model toward one that places large mass on high-quality sequences, thereby amortizing the expensive inference-time computation of generating good sequences. We begin by introducing a new statistical framework for sharpening, in which the learner aims to sharpen a pre-trained base policy via sample access, and we establish fundamental limits. We then analyze two natural families of self-improvement algorithms, based on SFT and RLHF. We find that (i) the SFT-based approach is minimax optimal whenever the initial model has sufficient coverage, but (ii) the RLHF-based approach can improve over SFT-based self-improvement by leveraging online exploration, bypassing the need for coverage. Finally, we empirically validate the sharpening mechanism via inference-time and amortization experiments. We view these findings as a starting point toward a foundational understanding that can guide the design and evaluation of self-improvement algorithms.
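As a rough illustration of the amortization idea in the abstract, the sketch below shows one common instantiation of SFT-style sharpening: draw several candidate responses from the base model, keep the one the model's own verifier (self-reward, e.g. the sequence log-probability) rates highest, and collect those winners as fine-tuning targets. The function names (`sample_fn`, `self_reward_fn`, `best_of_n`) and the toy stand-ins are illustrative assumptions, not the paper's implementation or results.

```python
# Minimal sketch of SFT-based "sharpening" via best-of-n self-verification.
# Assumed interface (not from the paper's code): sample_fn(prompt) draws one
# response from the base policy; self_reward_fn(prompt, response) scores it
# with the model's own verifier, e.g. the log-probability of the sequence.

from typing import Callable, List, Tuple
import random


def best_of_n(prompt: str,
              sample_fn: Callable[[str], str],
              self_reward_fn: Callable[[str, str], float],
              n: int = 8) -> str:
    """Draw n candidates and keep the one the model itself rates highest."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: self_reward_fn(prompt, y))


def build_sharpening_dataset(prompts: List[str],
                             sample_fn: Callable[[str], str],
                             self_reward_fn: Callable[[str, str], float],
                             n: int = 8) -> List[Tuple[str, str]]:
    """Collect (prompt, best-of-n response) pairs; fine-tuning on these pairs
    amortizes the inference-time search into the model's weights."""
    return [(x, best_of_n(x, sample_fn, self_reward_fn, n)) for x in prompts]


if __name__ == "__main__":
    # Toy stand-ins for a language model, only to make the sketch executable.
    responses = ["good answer", "ok answer", "bad answer"]
    toy_sample = lambda prompt: random.choice(responses)
    toy_reward = lambda prompt, y: {"good answer": 0.9,
                                    "ok answer": 0.5,
                                    "bad answer": 0.1}[y]

    data = build_sharpening_dataset(["What is 2+2?"], toy_sample, toy_reward, n=4)
    print(data)  # With enough samples, the retained target is the high-self-reward response.
```

In this sketch, inference-time compute (sampling n candidates and self-verifying them) is spent once to build the dataset; the subsequent fine-tuning step, omitted here, is what "sharpens" the model so that a single forward pass concentrates mass on the high-quality responses.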
Dec-4-2024
- Country:
- North America > United States (0.92)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education > Curriculum > Subject-Specific Education (0.45)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Reinforcement Learning (1.00)
- Statistical Learning (0.93)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Representation & Reasoning
- Search (0.87)
- Uncertainty (1.00)