FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
