Sequential Compression Layers for Efficient Federated Learning in Foundational Models

Navyansh Mahla, Sunny Gupta, Amit Sethi

arXiv.org Artificial Intelligence 

Federated Learning (FL) has gained popularity for fine-tuning large language models (LLMs) across multiple nodes, each holding its own private data. While LoRA has been widely adopted for parameter-efficient federated fine-tuning, recent theoretical and empirical studies highlight its suboptimal performance in the federated setting. In response, we propose a simple and more effective parameter-efficient fine-tuning method that does not rely on LoRA. Our approach inserts a small multi-layer perceptron (MLP) between two existing projection layers of the transformer block's feed-forward network: the up_proj (the FFN projection layer that follows the self-attention module) and the down_proj. This design avoids the bottlenecks LoRA exhibits in federated fine-tuning and outperforms recent LoRA-based approaches on both language models and vision encoders.
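To make the architectural change concrete, the sketch below shows one way such an insertion could look in PyTorch. It is an illustration under assumptions, not the authors' implementation: it assumes a LLaMA-style gated FFN with Hugging Face-style module names (gate_proj, up_proj, down_proj), a SiLU activation, and an illustrative 64-unit bottleneck. The original FFN weights are frozen, so in a federated round presumably only the small adapter's parameters would need to be trained and exchanged between nodes.

```python
# A minimal sketch (not the authors' released code): a LLaMA-style gated FFN
# plus a small trainable MLP inserted between up_proj and down_proj.
import torch
import torch.nn as nn


class GatedFFN(nn.Module):
    """Stand-in for a pretrained transformer feed-forward block."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.SiLU()

    def forward(self, x):
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))


class BottleneckFFN(nn.Module):
    """Freezes the original FFN and trains only a small MLP placed between
    the up-projection output and the down-projection input."""

    def __init__(self, ffn: GatedFFN, intermediate_size: int, bottleneck: int = 64):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():          # freeze pretrained weights
            p.requires_grad = False
        self.adapter = nn.Sequential(            # small trainable MLP
            nn.Linear(intermediate_size, bottleneck),
            nn.SiLU(),
            nn.Linear(bottleneck, intermediate_size),
        )

    def forward(self, x):
        # Recompute the FFN internals so the adapter sits between
        # the up-projection output and the down-projection input.
        h = self.ffn.act_fn(self.ffn.gate_proj(x)) * self.ffn.up_proj(x)
        h = self.adapter(h)                      # inserted small MLP
        return self.ffn.down_proj(h)


if __name__ == "__main__":
    ffn = GatedFFN(hidden_size=512, intermediate_size=1376)
    wrapped = BottleneckFFN(ffn, intermediate_size=1376, bottleneck=64)
    out = wrapped(torch.randn(2, 16, 512))       # (batch, seq, hidden)
    trainable = sum(p.numel() for p in wrapped.parameters() if p.requires_grad)
    print(out.shape, trainable)                  # only adapter params are trainable
```

Under these assumptions, the trainable parameter count scales with the bottleneck width rather than with the full FFN, which is what makes the insertion parameter-efficient; the exact placement, width, and activation used in the paper may differ.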