NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Mar-22-2026, 12:45:20 GMT, Neural Information Processing Systems

Large Language Model (LLM) inference on Central Processing Units (CPUs) is challenging because the attention computation requires a vast number of Multiply-Add (MAD) matrix operations.
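To make the scale of "vast quantities of MAD operations" concrete, the sketch below counts the multiply-adds in just the attention score step: the dot product of a query with every cached key. It is a minimal illustration, not the NoMAD-Attention method. The function names are hypothetical, and the model configuration used in the usage example (32 layers, 32 heads, head dimension 128, 4096 cached tokens) is an assumed, illustrative size rather than a figure from the paper.

```python
# Minimal sketch: count multiply-add (MAD) operations in the score step
# of dot-product attention (q . k_j for every cached key k_j).
# Function names and sizes are illustrative assumptions, not the paper's code.


def attention_scores_with_mad_count(query, keys):
    """Return the dot-product score of `query` against each row of `keys`
    together with the number of multiply-add operations performed."""
    scores = []
    mads = 0
    for key in keys:
        acc = 0.0
        for q_i, k_i in zip(query, key):
            acc += q_i * k_i  # one multiply-add per element pair
            mads += 1
        scores.append(acc)
    return scores, mads


def score_mads(seq_len, head_dim, num_heads, num_layers):
    """MADs needed to score one new token against `seq_len` cached keys
    in every head of every layer. Counts the score step only; the value
    aggregation and the linear projections are excluded."""
    return seq_len * head_dim * num_heads * num_layers


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    K = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    scores, mads = attention_scores_with_mad_count(q, K)
    print(scores, mads)  # [1.0, 5.0] 6

    # Hypothetical configuration: 32 layers, 32 heads, head_dim 128,
    # 4096 tokens already in the cache.
    print(score_mads(4096, 128, 32, 32))  # 536870912
```

Under these assumed sizes, scoring a single generated token already costs about 537 million multiply-adds. The count grows linearly with context length, so the cost per token rises as generation proceeds.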

