Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference
