Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving

Neural Information Processing Systems 

Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for adapting large language models (LLMs) to downstream tasks. While prior work has explored strategies for integrating LLM training and serving, a gap remains in unifying fine-tuning and inference for LoRA-based models.