FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
Xinyu Yang

Neural Information Processing Systems 

Current PEFT methods for LLMs can achieve high quality, efficient training, or scalable serving, but not all three simultaneously.