Multiplication-Free Transformer Training via Piecewise Affine Operations
Neural Information Processing Systems
Neural network training is dominated by matrix multiplications, which account for the vast majority of the computational cost in standard architectures such as transformers.
Feb-8-2026, 11:25:54 GMT