FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
