Supplementary Materials for LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Neural Information Processing Systems
As presented in Section 3.2, our side networks are built on Transformer blocks (the same block design as the backbone). We compare alternative side-network designs below:

Side-network design                         Params (%)  Memory (GB)  Accuracy on GLUE (%)
Adapter block + gates                       2.07        6.5          83.1
Transformer block + cross attention         2.68        10.4         83.0
Transformer block + gates (current design)  2.29        7.0          83.8

Table 2: Hyper-parameters used for NLP experiments. Batch size is 100 for all methods.

Method            Learning Rate  Other Hyper-parameters
Full fine-tuning  3 10

Batch size is 300 for all methods.

Method            Learning Rate  Other Hyper-parameters
Full fine-tuning  3 10
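The "+ gates" variants above merge each backbone activation into the side network through a learned scalar gate rather than cross attention. A minimal sketch of such a gated fusion is given below; the function name `gated_fusion` and the per-layer scalar `alpha` are illustrative assumptions, not the paper's exact implementation (which operates on tensors inside a deep-learning framework).

```python
import math

def gated_fusion(backbone_h, side_h, alpha):
    """Fuse a backbone activation into the side network via a learned gate.

    backbone_h, side_h: activations (here plain lists of floats for clarity)
    alpha: learnable scalar; mu = sigmoid(alpha) weights the backbone branch
    """
    mu = 1.0 / (1.0 + math.exp(-alpha))  # gate value in (0, 1)
    return [mu * b + (1.0 - mu) * s for b, s in zip(backbone_h, side_h)]
```

With `alpha = 0` the gate opens halfway (`mu = 0.5`), so the fused activation is the simple average of the backbone and side activations; training moves `alpha` to favor whichever branch helps the task.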