Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
