FlashDecoding++: Faster Large Language Model Inference on GPUs
