FP8 Quantization: The Power of the Exponent
Andrey Kuzmin, Mart van Baalen
–Neural Information Processing Systems
Neural network quantization is one of the most effective ways to improve the efficiency of neural networks. Quantization allows weights and activations to be represented in low bit-width formats, e.g., 8-bit integers (INT8).
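As a rough illustration of what representing values in INT8 can look like, the following minimal sketch applies symmetric per-tensor quantization to a small list of weights. The function names `quantize_int8` and `dequantize`, the example weights, and the choice of a single max-based scale are assumptions made for illustration; they are not taken from the paper and do not describe its FP8 method.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch only)."""
    # Assumption: one scale for the whole tensor, chosen so the largest
    # magnitude maps to 127.
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric INT8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in quantized]


# Hypothetical example weights.
weights = [0.02, -0.5, 0.31, 1.27, -1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

for w, code, r in zip(weights, q, recovered):
    print(f"{w:+.4f} -> {code:+4d} -> {r:+.4f}")
```

Each weight is stored as an integer code plus one shared floating-point scale, and the reconstruction error per value is at most half a quantization step.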
Aug-15-2025, 04:39:24 GMT