NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Neural Information Processing Systems 

We leverage this unique capability to propose NoMAD-Attention, an efficient attention algorithm that replaces multiply-add (MAD) operations with in-register lookups.
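
To make the idea of trading multiply-adds for lookups concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes a product-quantization-style scheme: each key is split into sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and a small per-query table of query-centroid dot products is built once. Scoring a key then needs only table reads and an accumulation. The codebook layout and all function names (`encode_key`, `build_lut`, `lookup_score`) are hypothetical, and ordinary Python lists stand in for tables that would be held in SIMD registers on a real CPU.

```python
# Illustrative sketch only: lookup-based estimation of query-key dot products.
# Assumption: a product-quantization-style codebook; names are hypothetical.

def dot(a, b):
    """Plain dot product, used only when building the per-query table."""
    return sum(x * y for x, y in zip(a, b))

def nearest_code(sub_vec, codebook):
    """Return the index of the centroid closest to sub_vec (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(sub_vec, codebook[c])),
    )

def encode_key(key, codebooks, sub_dim):
    """Replace each sub-vector of a key with its nearest centroid index."""
    return [
        nearest_code(key[i * sub_dim:(i + 1) * sub_dim], cb)
        for i, cb in enumerate(codebooks)
    ]

def build_lut(query, codebooks, sub_dim):
    """Build one table per sub-space: dot(query sub-vector, each centroid).

    This step still multiplies, but it runs once per query, not once per key.
    """
    return [
        [dot(query[i * sub_dim:(i + 1) * sub_dim], centroid) for centroid in cb]
        for i, cb in enumerate(codebooks)
    ]

def lookup_score(codes, lut):
    """Estimate dot(query, key) with table reads and accumulation only."""
    return sum(lut[i][code] for i, code in enumerate(codes))

# Toy data: two sub-spaces of dimension 2, four centroids each.
codebooks = [[[0, 0], [1, 0], [0, 1], [1, 1]] for _ in range(2)]
key_codes = encode_key([1, 0, 0, 1], codebooks, sub_dim=2)
table = build_lut([2, 3, 4, 5], codebooks, sub_dim=2)
print(lookup_score(key_codes, table), dot([2, 3, 4, 5], [1, 0, 0, 1]))
```

In this toy case the key coincides with its centroids, so the looked-up score matches the exact dot product. In general the result is an approximation whose quality depends on how well the centroids represent the keys.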
