Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2

Abreu, Steven, Shrestha, Sumit Bam, Zhu, Rui-Jie, Eshraghian, Jason

Feb-11-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) deliver impressive performance but require large amounts of energy. In this work, we present a MatMul-free LLM architecture adapted for Intel's neuromorphic processor, Loihi 2. Our approach leverages Loihi 2's support for low-precision, event-driven computation and stateful processing. Our hardware-aware quantized model on GPU demonstrates that a 370M-parameter MatMul-free model can be quantized with no accuracy loss. Based on preliminary results, we report up to 3x higher throughput with 2x less energy, compared to transformer-based LLMs on an edge GPU, with significantly better scaling. Further hardware optimizations will increase throughput and decrease energy consumption. These results show the potential of neuromorphic hardware for efficient inference and pave the way for efficient reasoning models capable of generating complex, long-form text rapidly and cost-effectively.
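The abstract does not spell out how a MatMul-free layer avoids matrix multiplication. As a rough illustration only, the sketch below assumes weights restricted to the ternary set {-1, 0, +1} with a single per-tensor scale, so that each output is a sum and difference of inputs followed by one scaling multiply. The function names `quantize_ternary` and `ternary_linear` and the mean-absolute-value scale are hypothetical choices for this example, not details taken from the paper or from any Loihi 2 toolchain.

```python
# Illustrative sketch only. Ternary weights and the mean-absolute-value scale
# are assumptions made for this example; the abstract does not specify them.

def quantize_ternary(weights):
    """Map each weight to -1, 0, or +1 and return the shared scale factor."""
    flat = [w for row in weights for w in row]
    # Per-tensor scale: mean absolute weight (falls back to 1.0 if all zero).
    scale = sum(abs(w) for w in flat) / len(flat) or 1.0
    # Round to the nearest integer, then clamp into the ternary range.
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Apply a ternary weight matrix to x.

    The accumulation uses only additions and subtractions; the only
    multiplication is the final per-output rescaling.
    """
    out = []
    for row in ternary:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing, so the input is skipped.
        out.append(acc * scale)
    return out


if __name__ == "__main__":
    w = [[0.9, -0.1, -1.2], [0.05, 0.8, 0.0]]
    t, s = quantize_ternary(w)
    print(t)
    print(ternary_linear([2.0, 3.0, 5.0], t, s))
```

Skipping zero weights in the inner loop is loosely analogous to the event-driven, sparse computation the abstract attributes to Loihi 2, though real hardware mapping involves much more than this toy loop.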

large language model, loihi 2, machine learning


Country:
- Asia > Middle East
  - Qatar (0.14)
- North America > United States
  - California (0.14)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Energy (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)