Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition
