S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

Neural Information Processing Systems 

Current PEFT methods for LLMs can achieve high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Building on this key insight, we propose a family of Structured Sparse Fine-Tuning (S²FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. S²FT accomplishes this by "selecting sparsely and computing densely": exploiting the coupled structures in LLMs, it selects a few attention heads in the MHA module and a few channels in the FFN module of each Transformer block.
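To make "selecting sparsely and computing densely" concrete, below is a minimal PyTorch sketch of one way such structured selection could be realized via gradient masking; it is an illustration under our own assumptions, not the authors' implementation. All names and dimensions (`o_proj`, `down_proj`, `head_dim`, the selection budgets) are hypothetical. One attention head in the MHA output projection and a few channels in the FFN down projection are kept trainable; the forward and backward passes stay fully dense, while updates touch only the selected slices.

```python
# Illustrative sketch of structured sparse fine-tuning via gradient masking.
# Not the paper's code: module names and sizes are assumptions.
import torch
import torch.nn as nn

hidden, n_heads, head_dim, ffn_dim = 512, 8, 64, 2048
k_channels = 64  # assumed per-block FFN channel budget

o_proj = nn.Linear(n_heads * head_dim, hidden, bias=False)  # MHA output proj
down_proj = nn.Linear(ffn_dim, hidden, bias=False)          # FFN down proj

# Select one attention head: its columns in o_proj.weight stay trainable.
head = 3
head_mask = torch.zeros(n_heads * head_dim)
head_mask[head * head_dim:(head + 1) * head_dim] = 1.0
o_proj.weight.register_hook(lambda g: g * head_mask)

# Select a few FFN channels: their columns in down_proj.weight stay trainable.
channels = torch.randperm(ffn_dim)[:k_channels]
chan_mask = torch.zeros(ffn_dim)
chan_mask[channels] = 1.0
down_proj.weight.register_hook(lambda g: g * chan_mask)

# Forward/backward remain dense; masking only affects the weight updates.
attn_out = o_proj(torch.randn(4, n_heads * head_dim))
ffn_out = down_proj(torch.randn(4, ffn_dim))
(attn_out.sum() + ffn_out.sum()).backward()

# Only the selected columns carry nonzero gradients.
print((o_proj.weight.grad.abs().sum(0) > 0).sum().item())     # == head_dim
print((down_proj.weight.grad.abs().sum(0) > 0).sum().item())  # == k_channels
```

Because the masked weights never change, the learned update for each block is confined to a small, contiguous-by-structure slice of the weight matrices, which is what enables dense computation during training and cheap merging or swapping at serving time.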