Sparse High Rank Adapters

Kartikeya Bhardwaj

May-28-2025, 15:02:11 GMT–Neural Information Processing Systems

Low Rank Adaptation (LoRA) has gained massive attention in recent generative AI research. One of the main advantages of LoRA is that it can be fused with the pretrained model, adding no overhead during inference. From a mobile deployment standpoint, however, LoRA forces a trade-off: in fused mode we avoid inference overhead but lose the ability to switch adapters rapidly, while in unfused mode we keep rapid switching but suffer significantly higher inference latency (up to 30%). LoRA also exhibits concept loss when multiple adapters are used concurrently. In this paper, we propose Sparse High Rank Adapters (SHiRA), a new paradigm that incurs no inference overhead, enables rapid switching, and significantly reduces concept loss.
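The fused versus unfused trade-off described in the abstract can be illustrated with a small NumPy sketch. The LoRA part uses the standard formulation, in which the weight update is the product of two low-rank factors. The sparse part is only an assumption suggested by the name "Sparse High Rank Adapters": the abstract does not describe the mechanism, so the diagonal index choice, the `apply_sparse` helper, and all sizes below are hypothetical and are not the paper's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2                  # illustrative sizes only

W = rng.standard_normal((d_out, d_in))    # stand-in for a pretrained weight
B = rng.standard_normal((d_out, r))       # LoRA factor, d_out x r
A = rng.standard_normal((r, d_in))        # LoRA factor, r x d_in
x = rng.standard_normal(d_in)

# Unfused mode: the adapter stays a separate branch, so every forward
# pass pays for two extra small matmuls, but swapping A and B is cheap.
y_unfused = W @ x + B @ (A @ x)

# Fused mode: merge once, then inference is a single matmul. Switching
# adapters now means rewriting the whole dense matrix.
W_fused = W + B @ A
y_fused = W_fused @ x

# Hypothetical sparse adapter (an assumption, not the paper's method):
# only a handful of entries of W change. The diagonal is chosen here
# purely so that the resulting update is easy to reason about.
rows = np.arange(d_in)
cols = np.arange(d_in)
values = np.linspace(0.5, 1.0, d_in)

def apply_sparse(weight, rows, cols, values, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) the sparse update in place."""
    weight[rows, cols] += sign * values

W_active = W.copy()
apply_sparse(W_active, rows, cols, values)          # switch adapter on
delta = W_active - W                                # dense view of the update

W_restored = W_active.copy()
apply_sparse(W_restored, rows, cols, values, -1.0)  # switch adapter off

lora_rank = np.linalg.matrix_rank(B @ A)     # bounded by r
sparse_rank = np.linalg.matrix_rank(delta)   # can exceed r despite sparsity
```

In this toy setting the sparse update touches 6 of 48 entries, so switching costs a few scattered writes rather than a dense rewrite, and the weight matrix keeps its shape, so the forward pass stays a single matmul. The update is also not constrained to rank `r`, which is one plausible reading of "high rank" in the title.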

artificial intelligence, machine learning, natural language, (17 more...)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.66)
  - Natural Language (1.00)
  - Representation & Reasoning (1.00)
  - Vision (1.00)