On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance.
Neural Information Processing Systems
May-23-2025, 05:17:18 GMT
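
The abstract's core idea, transferring a small fine-tuned model's task knowledge to a large model through logit arithmetic at decoding time, can be sketched roughly as follows. This is a minimal illustration, not the paper's exact dynamic fusion rule: the function name `fuse_logits`, the fixed vocabulary size, and the static transfer weight `alpha` are assumptions made for the example (the abstract argues that a static ratio and a single small model are suboptimal).

```python
import torch
import torch.nn.functional as F

def fuse_logits(large_logits, small_tuned_logits, small_base_logits, alpha):
    """Shift the large model's next-token logits by the difference between
    a task-tuned small model and its untuned counterpart, scaled by a
    transfer weight alpha. A static alpha is used here purely for
    illustration; the paper's point is that this weight should be dynamic."""
    task_delta = small_tuned_logits - small_base_logits  # task-specific signal
    return large_logits + alpha * task_delta

# Toy usage over a shared vocabulary (size chosen arbitrarily for the sketch).
vocab_size = 32000
large = torch.randn(vocab_size)
small_tuned = torch.randn(vocab_size)
small_base = torch.randn(vocab_size)

fused = fuse_logits(large, small_tuned, small_base, alpha=0.7)
next_token = torch.argmax(F.softmax(fused, dim=-1)).item()
```

Because the fusion operates only on output logits at each decoding step, the large model's weights never receive gradients, which is what makes the transfer training-free.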