On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

Neural Information Processing Systems 

Efficient fine-tuning of large language models for task-specific applications is imperative, yet their vast number of parameters makes training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, providing a direct answer to this question. Existing weak-to-strong methods often rely on a static knowledge transfer ratio and a single small model to transfer complex knowledge, which leads to suboptimal performance.
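
To make the logit-arithmetic idea concrete, here is a minimal sketch of one decoding step: the fine-tuned small model's task knowledge is expressed as a logit offset over its base version and added to the large model's logits. The function names, the toy vocabulary, and the scalar transfer ratio `alpha` are illustrative assumptions, not the paper's actual implementation; a static `alpha` corresponds to the baseline the abstract criticizes, whereas dynamic fusion would adapt the weight per token.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fused_logits(large_logits, small_ft_logits, small_base_logits, alpha=1.0):
    """Weak-to-strong transfer via logit arithmetic (illustrative sketch).

    The small model's task-specific knowledge is the offset
    (small_ft_logits - small_base_logits); it is added to the large
    model's logits, scaled by a transfer ratio `alpha` (assumed static
    here for simplicity).
    """
    return large_logits + alpha * (small_ft_logits - small_base_logits)

# Toy example over a 5-token vocabulary at a single decoding step.
rng = np.random.default_rng(0)
large = rng.normal(size=5)
small_base = rng.normal(size=5)
small_ft = small_base.copy()
small_ft[2] += 2.0  # the fine-tuned small model strongly prefers token 2

fused = fused_logits(large, small_ft, small_base, alpha=1.0)
print("large-only probs:", softmax(large).round(3))
print("fused probs:     ", softmax(fused).round(3))
```

Running this shifts probability mass toward token 2 in the fused distribution without touching the large model's weights, which is the training-free transfer the abstract describes.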
