Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang
arXiv.org Artificial Intelligence
The increasing size of language models has spurred strong research interest in parameter-efficient fine-tuning methods such as LoRA, which freeze the pre-trained model and inject small-scale trainable parameters for multiple downstream tasks (e.g., summarization, question answering, and translation). To further enhance the efficiency of fine-tuning, we propose a framework that integrates LoRA with structured layer pruning. The integrated framework is validated on two newly created de-identified medical report summarization datasets based on MIMIC-IV-Note and two public medical dialogue datasets. By tuning 0.6% of the original model's parameters and pruning more than 30% of the Transformer layers, our framework reduces GPU memory usage by 50% and doubles training speed, while preserving over 92% of the generation quality on free-text sequence-to-sequence tasks.
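The abstract describes the framework only at a high level. A minimal sketch of how LoRA and structured layer pruning could be combined on an encoder-decoder model is shown below, assuming the Hugging Face transformers and peft libraries; the model choice (facebook/bart-large), the drop-every-third-layer pattern, and the LoRA hyperparameters are illustrative assumptions, not details taken from the paper.

# Minimal sketch (assumed, not the authors' released code): combine LoRA
# with structured layer pruning on a Hugging Face encoder-decoder model.
from torch import nn
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

def prune_layers(layers: nn.ModuleList, drop_every: int = 3) -> nn.ModuleList:
    # Structured pruning: drop every third Transformer layer (~33% of
    # layers), keeping the survivors in their original order.
    return nn.ModuleList(
        layer for i, layer in enumerate(layers) if (i + 1) % drop_every != 0
    )

model.model.encoder.layers = prune_layers(model.model.encoder.layers)
model.model.decoder.layers = prune_layers(model.model.decoder.layers)

# Freeze the pruned backbone and inject small trainable low-rank matrices
# into the attention projections; only a fraction of a percent of the
# parameters remain trainable.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

Dropping every third layer of bart-large (12 encoder + 12 decoder layers) removes 8 of 24 Transformer layers, matching the over-30% pruning ratio the abstract reports; the exact layers to remove and the LoRA rank are design choices the paper itself would determine.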
May-18-2023