Fine-Tuning Language Models with Just Forward Passes

Sadhika Malladi, Tianyu Gao

Neural Information Processing Systems 

In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference.
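The in-place idea can be illustrated with a minimal sketch: estimate the gradient via two forward passes with a shared random perturbation (SPSA-style), and regenerate the perturbation from a stored seed instead of keeping it in memory. The function names and hyperparameters below are illustrative, not the paper's implementation.

```python
import numpy as np

def mezo_step(params, loss_fn, eps=1e-3, lr=1e-2, seed=None):
    """One zeroth-order step: two forward passes, in-place perturb/restore,
    so only a single copy of the parameters is ever held in memory."""
    rng_seed = np.random.randint(2**31) if seed is None else seed

    def perturb(scale):
        # Regenerate the same Gaussian z from the seed rather than storing it.
        rng = np.random.default_rng(rng_seed)
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)

    perturb(+1.0)
    loss_plus = loss_fn(params)
    perturb(-2.0)
    loss_minus = loss_fn(params)
    perturb(+1.0)  # restore the original parameters

    # Scalar projected-gradient estimate from the two losses.
    proj_grad = (loss_plus - loss_minus) / (2 * eps)

    # In-place SGD update, regenerating z once more from the seed.
    rng = np.random.default_rng(rng_seed)
    for p in params:
        p -= lr * proj_grad * rng.standard_normal(p.shape)
    return loss_plus

# Usage: minimize a simple quadratic using forward passes only.
w = [np.array([3.0, -2.0])]
quad = lambda ps: float(np.sum(ps[0] ** 2))
for _ in range(500):
    mezo_step(w, quad, eps=1e-3, lr=1e-2)
```

The seed trick is what keeps the memory footprint at inference level: the perturbation vector z is never materialized alongside the parameters, only regenerated on demand.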