Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
