Bilevel ZOFO: Efficient LLM Fine-Tuning and Meta-Training

Neural Information Processing Systems 

Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) optimizers presents significant computational challenges. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed to address these challenges by freezing most model parameters and training only a small subset. While PEFT is efficient, it may not outperform full fine-tuning when high task-specific performance is required. Zeroth-Order (ZO) methods offer an alternative for fine-tuning the entire pre-trained model: they approximate gradients using only forward passes, which eliminates the computational burden of back-propagation. However, they converge slowly and are highly sensitive to the choice of task prompts. We combine the two approaches in Bilevel ZOFO, a penalty-based bilevel formulation that treats the adapter parameters as a lower-level learner coupled to an upper-level ZO optimizer of the full backbone.
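To make the forward-pass-only gradient approximation concrete, the following is a minimal sketch of a two-point (SPSA-style) zeroth-order estimate, a common choice in the ZO literature. It is an illustration under stated assumptions, not necessarily the estimator used in Bilevel ZOFO: the names `zo_gradient`, `toy_loss`, and `zo_sgd`, the quadratic loss, and the values of `eps` and `lr` are all hypothetical.

```python
import random


def zo_gradient(loss, theta, eps=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate from two forward passes.

    Assumption: a generic SPSA-style estimator, used here only to
    illustrate the idea; the paper's exact estimator may differ.
    """
    rng = random.Random(seed)
    # Random Gaussian perturbation direction, one entry per parameter.
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Finite-difference estimate of the directional derivative along z.
    scale = (loss(plus) - loss(minus)) / (2.0 * eps)
    # Project the scalar back onto the perturbation direction.
    return [scale * zi for zi in z]


def toy_loss(theta):
    # Hypothetical stand-in for a model's forward pass: a simple quadratic.
    return sum(t * t for t in theta)


def zo_sgd(loss, theta, lr=0.01, steps=500):
    """Plain SGD driven by the zeroth-order estimate (no back-propagation)."""
    for step in range(steps):
        grad = zo_gradient(loss, theta, seed=step)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    theta0 = [1.0, -2.0, 0.5]
    theta = zo_sgd(toy_loss, theta0)
    print("initial loss:", toy_loss(theta0))
    print("final loss:  ", toy_loss(theta))
```

Each step costs two loss evaluations and no backward pass, which is the source of the memory savings. The estimate is a noisy projection of the true gradient onto a single random direction, which is consistent with the slow convergence noted above.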