Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti
–arXiv.org Artificial Intelligence
El-Mehdi El Arar (1), Silviu-Ioan Filip (1), Theo Mary (2), and Elisa Riccietti (3)

(1) Inria, IRISA, Université de Rennes, 263 Av. Général Leclerc, F-35000 Rennes, France
(2) Sorbonne Université, CNRS, LIP6, 4 Place Jussieu, F-75005 Paris, France
(3) ENS de Lyon, CNRS, Inria, Université Claude Bernard Lyon 1, LIP, UMR 5668, 69342 Lyon cedex 07, France

Abstract

This work proposes a mathematically founded mixed precision accumulation strategy for the inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains how errors propagate through the forward pass of a neural network. Specifically, our analysis shows that the error in each component of the output of a layer is proportional to the condition number of the inner product between the weights and the input, multiplied by the condition number of the activation function. These condition numbers can vary widely from one component to another, thus creating a significant opportunity to introduce mixed precision: each component should be accumulated in a precision inversely proportional to the product of these condition numbers. We propose a practical algorithm that exploits this observation: it first computes all components in low precision, uses this output to estimate the condition numbers, and recomputes in higher precision only the components associated with large condition numbers. We test our algorithm on various networks and datasets and confirm experimentally that it can significantly improve the cost-accuracy tradeoff compared with uniform precision accumulation baselines.

Keywords: neural network, inference, error analysis, mixed precision, multiply-accumulate

1 Introduction

Modern applications in artificial intelligence require increasingly complex models, and thus incur growing memory, time, and energy costs for storing and deploying large-scale deep learning models whose parameter counts range from millions to billions. This is a limiting factor both for training and for inference. While the growing training costs can be tackled by the power of modern computing resources, notably GPU accelerators, the deployment of large-scale models faces serious limitations in inference contexts with limited resources, such as embedded systems or applications that require real-time processing.
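The low-then-selective-high-precision strategy described in the abstract lends itself to a compact illustration. The sketch below is a minimal NumPy mockup of one layer, not the authors' implementation: the precision pair (float16 for the low-precision pass, float64 for the recomputation), the threshold `tau`, and the helper name `mixed_precision_layer` are illustrative assumptions, and the contribution of the activation's condition number is only noted in a comment.

```python
import numpy as np

def mixed_precision_layer(W, x, phi, tau=1e2):
    """Forward pass of one layer with condition-number-guided
    mixed precision accumulation (illustrative sketch).

    W: (m, n) weight matrix, x: (n,) input vector, phi: activation,
    tau: hypothetical threshold on the estimated condition number
    above which a component is recomputed in higher precision.
    """
    # Step 1: compute every output component with low-precision
    # (here float16) accumulation.
    y_lo = (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float64)

    # Step 2: estimate the condition number of each inner product,
    #   kappa_i = sum_j |W_ij * x_j| / |sum_j W_ij * x_j|,
    # reusing the cheap low-precision output for the denominator.
    # (The paper's criterion also factors in the condition number of
    # the activation function; omitted here for brevity.)
    numer = np.abs(W) @ np.abs(x)
    kappa = numer / np.maximum(np.abs(y_lo), np.finfo(np.float64).tiny)

    # Step 3: recompute only the ill-conditioned components with
    # high-precision (here float64) accumulation.
    ill = kappa > tau
    y = y_lo.copy()
    y[ill] = W[ill].astype(np.float64) @ x.astype(np.float64)

    return phi(y)
```

For example, `mixed_precision_layer(W, x, np.tanh)` would accumulate most components in float16 and fall back to float64 only where cancellation makes the inner product ill conditioned, which is exactly where low-precision accumulation loses the most accuracy.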
Mar-19-2025
- Genre:
  - Research Report (1.00)
- Technology: