FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
