Melnick, Levi
Microscaling Data Formats for Deep Learning
Rouhani, Bita Darvish, Zhao, Ritchie, More, Ankit, Hall, Mathew, Khodamoradi, Alireza, Deng, Summer, Choudhary, Dhruv, Cornea, Marius, Dellinger, Eric, Denolf, Kristof, Stosic, Dusan, Elango, Venmugil, Golub, Maximilian, Heinecke, Alexander, James-Roxby, Phil, Jani, Dharmesh, Kolhe, Gaurav, Langhammer, Martin, Li, Ada, Melnick, Levi, Mesmakhosroshahi, Maral, Rodriguez, Andres, Schulte, Michael, Shafipour, Rasoul, Shao, Lei, Siu, Michael, Dubey, Pradeep, Micikevicius, Paulius, Naumov, Maxim, Verrilli, Colin, Wittig, Ralph, Burger, Doug, Chung, Eric
Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate the practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
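The core idea of a per-block scaling factor with narrow element types can be sketched as follows. This is a hypothetical illustration, not the MX specification itself: the block size of 32 and the power-of-two scale choice are assumptions, with 8-bit signed integers standing in for the paper's narrow element types.

```python
import numpy as np

def quantize_mx_int8(x, block_size=32):
    """Quantize a 1-D tensor using a shared power-of-two scale per block
    and 8-bit integer elements (an MX-style block format sketch)."""
    pad = (-len(x)) % block_size
    xp = np.pad(x.astype(np.float64), (0, pad))
    blocks = xp.reshape(-1, block_size)
    # Shared scale: smallest power of two such that the largest element
    # in the block fits in the int8 range after rounding.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    max_abs[max_abs == 0] = 1.0          # avoid log2(0) for all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))
    q = np.clip(np.round(blocks / scale), -127, 127)
    # Dequantize so the quantization error can be inspected directly.
    return (q * scale).reshape(-1)[: len(x)]

x = np.random.default_rng(0).normal(size=70)
xq = quantize_mx_int8(x)   # same shape, elements snapped to the block grid
```

Storing one scale per block of 32 elements amortizes the scale's cost to a fraction of a bit per element, which is what lets the individual elements be so narrow.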
With Shared Microexponents, A Little Shifting Goes a Long Way
Rouhani, Bita, Zhao, Ritchie, Elango, Venmugil, Shafipour, Rasoul, Hall, Mathew, Mesmakhosroshahi, Maral, More, Ankit, Melnick, Levi, Golub, Maximilian, Varatkar, Girish, Shao, Lei, Kolhe, Gaurav, Melts, Dimitry, Klar, Jasmine, L'Heureux, Renee, Perry, Matt, Burger, Doug, Chung, Eric, Deng, Zhaoxia, Naghshineh, Sam, Park, Jongsoo, Naumov, Maxim
This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. BDR enables comparison of popular quantization standards and identifies new formats based on shared microexponents (MX) that outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX uses multiple levels of quantization scaling, with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models, including large-scale generative pretraining and inference, and production-scale recommendation systems.
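Multi-level scaling can be sketched numerically as a coarse shared scale per block refined by a tiny per-sub-block shift. The block and sub-block sizes, the 4-bit elements, and the 1-bit microexponent below are illustrative assumptions, not the paper's exact formats.

```python
import numpy as np

def quantize_two_level(x, block=16, sub=2, elem_bits=4):
    """Sketch of two-level scaling: one shared power-of-two scale per
    block, refined by a 1-bit microexponent per sub-block."""
    qmax = 2 ** (elem_bits - 1) - 1      # 7 for 4-bit signed elements
    rows = x.reshape(-1, block)
    out = np.empty_like(rows)
    for i, row in enumerate(rows):
        # Coarse shared exponent chosen from the block maximum.
        E = np.ceil(np.log2(np.abs(row).max() / qmax + 1e-30))
        for j in range(0, block, sub):
            seg = row[j:j + sub]
            # Microexponent: shift the scale down one step when the
            # sub-block maximum still fits, gaining a bit of precision.
            m = 1 if np.abs(seg).max() <= qmax * 2.0 ** (E - 1) else 0
            s = 2.0 ** (E - m)
            out[i, j:j + sub] = np.clip(np.round(seg / s), -qmax, qmax) * s
    return out.reshape(x.shape)

x = np.random.default_rng(1).normal(size=32)
xq = quantize_two_level(x)
```

The microexponent costs a fraction of a bit per element yet halves the quantization step for sub-blocks whose values sit well below the block maximum, which is the "little shifting" the title refers to.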
Neural Additive Models: Interpretable Machine Learning with Neural Nets
Agarwal, Rishabh, Melnick, Levi, Frosst, Nicholas, Zhang, Xuezhou, Lengerich, Ben, Caruana, Rich, Hinton, Geoffrey
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high-stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs), which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this flexibility, we show how the composability of NAMs enables multitask learning on synthetic data and on the COMPAS recidivism data, and how their differentiability allows them to train more complex interpretable models for COVID-19.
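The architecture described above, a sum of per-feature networks plus a bias, can be sketched as a forward pass. This is a minimal NumPy illustration, not the paper's implementation: the hidden size and initialization are assumptions, and in the paper all subnets are trained jointly by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_subnet(hidden=16):
    # One small MLP per input feature: scalar in -> hidden -> scalar out.
    return {"w1": rng.normal(size=(1, hidden)) * 0.5,
            "b1": np.zeros(hidden),
            "w2": rng.normal(size=(hidden, 1)) * 0.5}

def subnet_forward(p, x_col):
    # x_col has shape (n, 1): the values of a single feature.
    h = np.maximum(x_col @ p["w1"] + p["b1"], 0.0)   # ReLU hidden layer
    return h @ p["w2"]                               # (n, 1) contribution

def nam_forward(subnets, bias, X):
    # NAM prediction: a bias plus the sum of per-feature shape functions.
    contribs = [subnet_forward(p, X[:, [j]]) for j, p in enumerate(subnets)]
    return bias + np.sum(contribs, axis=0).ravel()

X = rng.normal(size=(8, 3))                  # 8 examples, 3 features
subnets = [init_subnet() for _ in range(X.shape[1])]
y_hat = nam_forward(subnets, bias=0.0, X=X)  # shape (8,)
```

Because each subnet sees only its own feature, plotting a subnet's output against that feature recovers an exact per-feature shape function, which is the source of NAMs' intelligibility.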