





Neural Information Processing Systems

Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. […] This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model.
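The last sentence is the crux: training only a few parameters does not by itself keep the backbone out of the backward pass. A toy NumPy illustration, assuming a stack of frozen ReLU layers with a single trainable bias vector standing in for an adapter placed after the first layer (all names here are hypothetical, not from the paper). The adapter's gradient can only be obtained by pushing the error signal back through every frozen layer above it, so those layers' pre-activations must be kept in memory.

```python
import numpy as np


def forward(weights, x, adapter_bias):
    """Frozen ReLU backbone with a trainable bias (toy "adapter") after layer 0.

    Returns the output plus the pre-activations that backprop must keep.
    """
    h = np.maximum(weights[0] @ x, 0.0) + adapter_bias  # adapter sits here
    saved_pre = []  # activations that must stay in memory for the backward pass
    for w in weights[1:]:
        pre = w @ h
        saved_pre.append(pre)
        h = np.maximum(pre, 0.0)
    return h, saved_pre


def adapter_grad(weights, x, y, adapter_bias):
    """Gradient of 0.5 * ||out - y||^2 w.r.t. the adapter bias only."""
    out, saved_pre = forward(weights, x, adapter_bias)
    g = out - y
    # No backbone weight is updated, yet the error signal still has to be
    # pushed down through every frozen layer above the adapter, using the
    # stored pre-activations (ReLU masks) and the frozen weights.
    for w, pre in zip(reversed(weights[1:]), reversed(saved_pre)):
        g = w.T @ (g * (pre > 0))
    return g, len(saved_pre)


rng = np.random.default_rng(0)
dim, depth = 4, 6
weights = [rng.normal(size=(dim, dim)) for _ in range(depth)]
x, y = rng.normal(size=dim), rng.normal(size=dim)
grad, layers_traversed = adapter_grad(weights, x, y, np.zeros(dim))
print(layers_traversed)  # 5: every frozen layer above the adapter
```

A module that sits outside the backbone and never feeds its output back into it avoids this chain entirely, which is the memory argument behind side-tuning.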


LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Neural Information Processing Systems

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. […]


Dialect Identification Using Resource-Efficient Fine-Tuning Approaches

Lin, Zirui, Gulzar, Haris, Busto, Monnika Roslianna, Masaki, Akiko, Eda, Takeharu, Nakadai, Kazuhiro

arXiv.org Artificial Intelligence

Dialect Identification (DI) is the task of recognizing different dialects of the same language from a speech signal. DI can help improve downstream speech-related tasks even when speakers have a strong dialect. However, fine-tuning a speech model for tasks like DI is expensive in terms of computation cost and memory requirements. Recent studies have explored fine-tuning pre-trained speech models for tasks like DI using Parameter-Efficient Fine-Tuning (PEFT) methods, which offer parameter efficiency but only limited improvement in memory efficiency and training speed. To address these challenges, we explore Memory-Efficient Fine-Tuning (MEFT) methods, originally proposed for language processing, and apply them to a general-purpose pre-trained speech model. We then comprehensively analyze GPU memory usage and fine-tuning speed across various MEFT methods. As a case study, we fine-tune the Whisper model to identify six Mandarin subdialects from the KeSpeech dataset, reducing GPU memory usage by up to 73.25% and accelerating training by a factor of 2.1, while maintaining accuracy comparable to vanilla fine-tuning and PEFT methods.


Supplementary Materials for LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Neural Information Processing Systems

As presented in Section 3.2, our side networks are built on Transformer blocks (same as the backbone […]

Block design | [header lost in extraction] | [header lost in extraction] | Accuracy on GLUE (%)
Adapter block + gates | 2.07 | 6.5 | 83.1
Transformer block + cross attention | 2.68 | 10.4 | 83.0
Transformer block + gates (current design) | 2.29 | 7.0 | 83.8

Table 2: Hyper-parameters used for NLP experiments. Batch size is 100 for all methods.
Method | Learning Rate | Other Hyper-parameters
Full fine-tuning | 3 × 10^[exponent lost] | […]

[Caption lost in extraction] Batch size is 300 for all methods.
Method | Learning Rate | Other Hyper-parameters
Full fine-tuning | 3 × 10^[exponent lost] | […]



PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

Yang, Xingke, Li, Liang, Wan, Zhiyi, Li, Sicong, Qi, Xiaoqi, Liu, Jiang, Ohtsuki, Tomoaki, Fu, Xin, Pan, Miao

arXiv.org Artificial Intelligence

There is a huge gap between the numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) on fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from the heavy communication burden of activation transmissions and may disclose data and labels to the server. To address these issues, we develop PAE MobiLLM, a privacy-aware and efficient LLM FT method that can be deployed on the mobile device via server-assisted additive side-tuning. To further accelerate FT convergence and improve computing efficiency, PAE MobiLLM integrates activation caching on the server side, which allows the server to reuse historical activations and saves the mobile device from repeatedly computing forward passes for recurring data samples. Besides, to reduce communication cost, PAE MobiLLM develops an activation shortcut that transmits only the token involved in the loss calculation, instead of full activation matrices, to guide the side-network tuning. Last but not least, PAE MobiLLM introduces an additive adapter side-network design in which the server trains the adapter modules on device-defined prediction differences rather than raw ground-truth labels. In this way, the server can only assist device-defined side-network computing and learns nothing about the data and labels. Extensive experimental results demonstrate PAE MobiLLM's superiority.
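The three server-side mechanisms can be sketched together in NumPy under loudly stated assumptions: the loss is taken on the final token, the side network is a single linear adapter, and the "prediction difference" is a device-computed target minus the frozen backbone's output for that token (here replaced by synthetic values). `activation_shortcut` and `AdditiveSideServer` are hypothetical names; this is not PAE MobiLLM's implementation.

```python
import numpy as np


def activation_shortcut(full_acts):
    """Device side: upload only the loss token's hidden state.

    Assumes (for illustration) the loss uses the last token, so a
    (seq_len, hidden) matrix shrinks to a single (hidden,) vector.
    """
    return full_acts[-1]


class AdditiveSideServer:
    """Hypothetical server that trains an additive linear adapter.

    It sees only loss-token activations and device-defined prediction
    differences, never the raw inputs or ground-truth labels.
    """

    def __init__(self, hidden_dim, out_dim, lr=0.3):
        self.w = np.zeros((out_dim, hidden_dim))
        self.lr = lr
        self.cache = {}  # sample_id -> cached loss-token activation

    def step(self, sample_id, pred_diff, act=None):
        # Activation caching: a recurring sample reuses the stored activation,
        # so the device neither recomputes its forward pass nor re-uploads it.
        if act is not None:
            self.cache[sample_id] = act
        h = self.cache[sample_id]
        err = self.w @ h - pred_diff          # adapter correction vs. target difference
        self.w -= self.lr * np.outer(err, h)  # SGD step on 0.5 * ||err||^2
        return 0.5 * float(err @ err)

    def correction(self, h):
        # Additive design: final logits = frozen backbone logits + this term.
        return self.w @ h


rng = np.random.default_rng(0)
seq_len, hidden, n_cls, n_samples = 16, 8, 3, 5
w_true = rng.normal(size=(n_cls, hidden))  # synthetic stand-in for the needed correction
acts = {i: rng.normal(size=(seq_len, hidden)) / np.sqrt(hidden) for i in range(n_samples)}
# The device computes its prediction differences once (synthetic here).
diffs = {i: w_true @ activation_shortcut(a) for i, a in acts.items()}

server = AdditiveSideServer(hidden, n_cls)
losses = []
for epoch in range(200):
    total = 0.0
    for i in acts:
        # Only the first epoch uploads activations; later epochs hit the cache.
        upload = activation_shortcut(acts[i]) if epoch == 0 else None
        total += server.step(i, diffs[i], upload)
    losses.append(total)
print(losses[0] > losses[-1])  # True
```

Each upload is one `hidden`-sized vector instead of a `seq_len × hidden` matrix, and after the first epoch the device sends only sample ids and differences.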


EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively

Wang, Bingyang, Huang, Kaer, Li, Bin, Yan, Yiqiang, Zhang, Lihe, Lu, Huchuan, He, You

arXiv.org Artificial Intelligence

Open-World Tracking (OWT) aims to track every object of any category, which requires the model to have strong generalization capabilities. Trackers can improve their generalization ability by leveraging Visual Language Models (VLMs). However, challenges arise with the fine-tuning strategies when VLMs are transferred to OWT: full fine-tuning results in excessive parameter and memory costs, while the zero-shot strategy leads to sub-optimal performance. To solve this problem, EffOWT is proposed for efficiently transferring VLMs to OWT. Specifically, we build a small and independent learnable side network outside the VLM backbone. By freezing the backbone and executing backpropagation only on the side network, the model meets its efficiency requirements. In addition, EffOWT enhances the side network with a hybrid structure of Transformer and CNN to improve the model's performance in the OWT field. Finally, we implement sparse interactions on the MLP, significantly reducing parameter updates and memory costs. Thanks to the proposed methods, EffOWT achieves an absolute gain of 5.5% on the tracking metric OWTA for unknown categories, while updating only 1.3% of the parameters compared to full fine-tuning, with a 36.4% memory saving. Other metrics also show clear improvements.
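The abstract does not spell out the hybrid block, so what follows is only a hypothetical sketch of what "a hybrid structure of Transformer and CNN" in a side network could look like: a depthwise 1-D convolution for local context and single-head self-attention for global context, both reading token features from a frozen backbone and fused residually. The class name, shapes, and initialisation are assumptions, not EffOWT's architecture.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class HybridSideBlock:
    """Hypothetical side-network block with a CNN branch and an attention branch.

    It reads token features from a frozen backbone; only its own weights
    would be trained, so backprop never enters the backbone.
    """

    def __init__(self, dim, kernel=3, seed=0):
        assert kernel % 2 == 1, "odd kernel keeps the sequence length"
        rng = np.random.default_rng(seed)
        self.conv = rng.normal(scale=0.1, size=(kernel, dim))  # depthwise 1-D conv
        self.wq, self.wk, self.wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))

    def num_params(self):
        return self.conv.size + self.wq.size + self.wk.size + self.wv.size

    def __call__(self, tokens):
        n, d = tokens.shape
        k = self.conv.shape[0]
        padded = np.pad(tokens, ((k // 2, k // 2), (0, 0)))  # zero-pad the sequence axis
        # Local branch (CNN): each token mixes with its k neighbours, per channel.
        local = sum(padded[i:i + n] * self.conv[i] for i in range(k))
        # Global branch (Transformer-style): single-head self-attention.
        q, key, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        global_ = softmax(q @ key.T / np.sqrt(d)) @ v
        return tokens + local + global_  # residual fusion of both branches


frozen_tokens = np.random.default_rng(1).normal(size=(10, 64))  # stand-in backbone output
block = HybridSideBlock(dim=64)
out = block(frozen_tokens)
print(out.shape, block.num_params())  # (10, 64) 12480
```

The convolution adds only `kernel × dim` weights, so the attention projections dominate the block's parameter count.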


GraphBridge: Towards Arbitrary Transfer Learning in GNNs

Ju, Li, Yang, Xingyi, Li, Qi, Wang, Xinchao

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) are conventionally trained on a per-domain, per-task basis. This creates a significant barrier to transferring the acquired knowledge to different, heterogeneous data setups. This paper introduces GraphBridge, a novel framework that enables knowledge transfer across disparate tasks and domains in GNNs without requiring modifications to task configurations or graph structures. Specifically, GraphBridge allows any pre-trained GNN to be augmented with prediction heads and a bridging network that connects the input to the output layer. This architecture not only preserves the intrinsic knowledge of the original model but also supports outputs of arbitrary dimensions. To mitigate the negative transfer problem, GraphBridge merges the source model with a concurrently trained model, thereby reducing the source bias when applied to the target domain. Our method is thoroughly evaluated across diverse transfer learning scenarios, including Graph2Graph, Node2Node, Graph2Node, and graph2point-cloud. Empirical validation, conducted over 16 datasets representative of these scenarios, confirms the framework's capacity for task- and domain-agnostic transfer learning within graph-like data, marking a significant advancement in the field of GNNs. Code is available at https://github.com/jujulili888/GraphBridge.
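A hypothetical NumPy sketch of the augmentation described above, assuming a one-layer mean-aggregation GNN as the frozen source model, a new linear prediction head, and a linear "bridge" from the input features straight to the output layer. `mean_aggregate` and `BridgedGNN` are illustrative names, not GraphBridge's code, and the model-merging step is not shown.

```python
import numpy as np


def mean_aggregate(adj, x):
    """One message-passing step: average over each node's neighbours and itself."""
    a = adj + np.eye(adj.shape[0])
    return (a @ x) / a.sum(axis=1, keepdims=True)


class BridgedGNN:
    """Hypothetical frozen source GNN + new head + input-to-output bridge."""

    def __init__(self, w_src, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim, hid_dim = w_src.shape
        self.w_src = w_src  # pre-trained source weights, kept frozen
        self.head = rng.normal(scale=0.1, size=(hid_dim, out_dim))   # new prediction head
        self.bridge = rng.normal(scale=0.1, size=(in_dim, out_dim))  # skips input -> output

    def __call__(self, adj, x, graph_level=False):
        h = np.maximum(mean_aggregate(adj, x) @ self.w_src, 0.0)  # frozen source GNN
        out = h @ self.head + x @ self.bridge  # any out_dim the target task needs
        # Mean-pool nodes for graph-level targets; keep per-node rows otherwise.
        return out.mean(axis=0) if graph_level else out


adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)  # 4-node path graph
x = np.random.default_rng(2).normal(size=(4, 5))
model = BridgedGNN(w_src=np.random.default_rng(3).normal(size=(5, 16)), out_dim=3)
print(model(adj, x).shape, model(adj, x, graph_level=True).shape)  # (4, 3) (3,)
```

Only `head` and `bridge` depend on the target task, so the same frozen `w_src` can serve node-level or graph-level targets of any width.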


LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Neural Information Processing Systems

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. […] This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that can reduce training memory requirements by more substantial amounts.
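A minimal NumPy sketch of the ladder idea, assuming one side block per backbone layer, a linear down-projection by a reduction factor, and a scalar sigmoid gate (with a temperature) that blends each down-projected backbone activation into the running side state. The shapes, the single-ReLU residual side block, and the names are illustrative assumptions; the LST supplementary material describes Transformer-based side blocks with gates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LadderSideNetwork:
    """Hypothetical ladder side network reading a frozen backbone's layer outputs.

    backbone_acts are treated as constants: in an autodiff framework they would
    be computed without gradient tracking, so the backward pass touches only
    the side network's weights.
    """

    def __init__(self, num_layers, d_model, reduction=8, temperature=0.1, seed=0):
        rng = np.random.default_rng(seed)
        d_side = d_model // reduction  # side network is `reduction` times narrower
        self.temperature = temperature
        self.down = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_side)) for _ in range(num_layers)]
        self.blocks = [rng.normal(scale=d_side ** -0.5, size=(d_side, d_side)) for _ in range(num_layers)]
        self.alpha = np.zeros(num_layers)  # learnable gate logits
        self.up = rng.normal(scale=d_side ** -0.5, size=(d_side, d_model))

    def num_params(self):
        return sum(w.size for w in self.down + self.blocks) + self.alpha.size + self.up.size

    def __call__(self, backbone_acts):
        side = np.zeros((backbone_acts[0].shape[0], self.up.shape[0]))
        for act, down, block, a in zip(backbone_acts, self.down, self.blocks, self.alpha):
            mu = sigmoid(a / self.temperature)  # gate in (0, 1)
            # Ladder connection: blend the down-projected backbone activation
            # with the previous side state.
            mixed = mu * (act @ down) + (1.0 - mu) * side
            side = mixed + np.maximum(mixed @ block, 0.0)  # residual side block
        return side @ self.up  # project back to the backbone width for the head


rng = np.random.default_rng(0)
acts = [rng.normal(size=(10, 64)) for _ in range(4)]  # stand-in frozen layer outputs
lst = LadderSideNetwork(num_layers=4, d_model=64)
print(lst(acts).shape, lst.num_params())  # (10, 64) 2820
```

Because `backbone_acts` are only read and never differentiated through, training memory scales with the narrow side network (width `d_model // reduction`) rather than with the backbone.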