LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Sung, Yi-Lin; Cho, Jaemin; Bansal, Mohit

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g., only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that can reduce training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (called ladders) from backbone networks and makes predictions. LST has significantly lower memory requirements than previous methods, because it does not require backpropagation through the backbone network, but instead only through the side network and ladder connections. We evaluate our method with various models (T5 and CLIP-T5) on both NLP (GLUE) and vision-and-language (VQA, GQA, NLVR², MSCOCO) tasks. LST saves 69% of the memory costs needed to fine-tune the whole network, while other methods only save 26% of that at similar parameter usage (hence, 2.7× more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models, attaining better GLUE performance than full fine-tuning and other PETL methods. The accuracy-efficiency trade-off also holds on VL tasks.
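To make the ladder-side idea concrete, the following is a minimal PyTorch sketch of the training setup described above: the backbone is frozen and run without building a gradient graph, its intermediate activations are fed through small ladder connections into a trainable side network, and gradients flow only through the side network and the ladders. The SideBlock and LadderSideNetwork modules, the reduced side dimension, the gate initialization, and the assumption of an encoder-style backbone that exposes output_hidden_states are illustrative choices for this sketch, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class SideBlock(nn.Module):
    """One trainable side layer: gated fusion of the downsampled backbone
    activation with the previous side state, then a small Transformer layer."""
    def __init__(self, backbone_dim, side_dim, nhead=4):
        super().__init__()
        self.down = nn.Linear(backbone_dim, side_dim)   # ladder connection
        self.gate = nn.Parameter(torch.zeros(1))        # learned fusion gate
        self.block = nn.TransformerEncoderLayer(
            side_dim, nhead, dim_feedforward=4 * side_dim, batch_first=True)

    def forward(self, backbone_h, side_h):
        g = torch.sigmoid(self.gate)                    # mix backbone and side streams
        return self.block(g * self.down(backbone_h) + (1 - g) * side_h)

class LadderSideNetwork(nn.Module):
    def __init__(self, backbone, backbone_dim=768, side_dim=96,
                 num_layers=12, num_classes=2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():            # backbone stays frozen
            p.requires_grad_(False)
        self.input_down = nn.Linear(backbone_dim, side_dim)
        self.layers = nn.ModuleList(
            SideBlock(backbone_dim, side_dim) for _ in range(num_layers))
        self.head = nn.Linear(side_dim, num_classes)

    def forward(self, input_ids, attention_mask=None):
        # No autograd graph is kept for the backbone forward pass, which is
        # where the memory saving relative to in-backbone PETL methods comes from.
        with torch.no_grad():
            out = self.backbone(input_ids, attention_mask=attention_mask,
                                output_hidden_states=True)
        hidden = out.hidden_states                      # embeddings + one state per layer
        side_h = self.input_down(hidden[0])
        for layer, h in zip(self.layers, hidden[1:]):
            side_h = layer(h, side_h)
        return self.head(side_h[:, 0])                  # predict from the first token
```

In a setup like this, only the side network's parameters would be passed to the optimizer (e.g., the parameters with requires_grad=True), so the trainable set stays a small fraction of the backbone while the backbone itself is never backpropagated through.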
Supplementary Materials for LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
As presented in Section 3.2, our side networks are built on Transformer blocks (the same block design as the backbone network). The design ablation below reports, for each side-network variant, the fraction of updated parameters, the training memory, and the accuracy on GLUE.

Side-network design                           Params (%)   Memory (GB)   Accuracy on GLUE (%)
Adapter block + gates                         2.07         6.5           83.1
Transformer block + cross attention           2.68         10.4          83.0
Transformer block + gates (current design)    2.29         7.0           83.8

Table 2: Hyper-parameters used for NLP experiments. Batch size is 100 for all methods, and the table lists, per method (full fine-tuning, etc.), the learning rate and other hyper-parameters; a second hyper-parameter table with the same columns uses a batch size of 300 for all methods.
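As a rough illustration of the "Transformer block + cross attention" row above, the sketch below replaces the gated fusion from the earlier example with a cross-attention fusion, where the side state attends to the downsampled backbone activation; the extra query/key/value projections are consistent with this variant costing more parameters and memory than the gated design. The module name and dimensions are assumptions for illustration, not the exact ablation implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionSideBlock(nn.Module):
    """Alternative side layer: the side state (queries) attends to the
    downsampled backbone activation (keys/values) instead of gating it."""
    def __init__(self, backbone_dim, side_dim, nhead=4):
        super().__init__()
        self.down = nn.Linear(backbone_dim, side_dim)   # ladder connection
        self.cross_attn = nn.MultiheadAttention(side_dim, nhead, batch_first=True)
        self.block = nn.TransformerEncoderLayer(
            side_dim, nhead, dim_feedforward=4 * side_dim, batch_first=True)

    def forward(self, backbone_h, side_h):
        kv = self.down(backbone_h)
        attended, _ = self.cross_attn(side_h, kv, kv)   # fuse via attention
        return self.block(side_h + attended)
```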