Multiplication-Free Transformer Training via Piecewise Affine Operations

Neural Information Processing Systems 

Matrix multiplications dominate neural network training, generally accounting for the vast majority of the computational cost in standard architectures such as transformers.
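To give a feel for the scale of this claim, the following is a rough back-of-the-envelope sketch, not taken from the paper, that estimates what fraction of the forward-pass floating-point operations in a single standard transformer block come from matrix multiplications. The function names, the 4x feed-forward width, and the per-element costs assigned to softmax, layer normalization, and the activation function are all illustrative assumptions; only the matmul counts follow directly from the shapes involved.

```python
def matmul_flops(seq_len: int, d_model: int, ffn_mult: int = 4) -> int:
    """Forward-pass matmul FLOPs for one transformer block (2 FLOPs per multiply-add)."""
    n, d = seq_len, d_model
    qkv_proj = 3 * 2 * n * d * d           # Q, K, V projections
    attn_scores = 2 * n * n * d            # Q @ K^T summed over all heads
    attn_values = 2 * n * n * d            # softmax(scores) @ V
    out_proj = 2 * n * d * d               # attention output projection
    ffn = 2 * 2 * n * d * (ffn_mult * d)   # two feed-forward matmuls
    return qkv_proj + attn_scores + attn_values + out_proj + ffn


def elementwise_flops(seq_len: int, d_model: int, n_heads: int, ffn_mult: int = 4) -> int:
    """Rough non-matmul FLOPs; the per-element constants below are assumptions."""
    n, d = seq_len, d_model
    softmax = 5 * n_heads * n * n          # assumed ~5 ops per attention entry
    layer_norm = 2 * 5 * n * d             # two layer norms, assumed ~5 ops per element
    activation = 8 * n * ffn_mult * d      # assumed ~8 ops per element (GELU-like)
    residual = 2 * n * d                   # two residual additions
    return softmax + layer_norm + activation + residual


def matmul_flop_share(seq_len: int, d_model: int, n_heads: int, ffn_mult: int = 4) -> float:
    """Fraction of estimated forward-pass FLOPs spent inside matrix multiplications."""
    mm = matmul_flops(seq_len, d_model, ffn_mult)
    ew = elementwise_flops(seq_len, d_model, n_heads, ffn_mult)
    return mm / (mm + ew)


if __name__ == "__main__":
    share = matmul_flop_share(seq_len=512, d_model=768, n_heads=12)
    print(f"estimated matmul share of forward FLOPs: {share:.4f}")
```

Under these assumed constants the matmul share stays well above 95% for typical model widths, and it grows with `d_model` because the matmul terms scale quadratically in width while the elementwise terms scale linearly. The estimate counts arithmetic only; it ignores memory traffic, the backward pass, and optimizer updates.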
