Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Zhu, Xuqi, Zhang, Huaizhi, Lee, JunKyu, Zhu, Jiacheng, Pal, Chandrajit, Saha, Sangeet, McDonald-Maier, Klaus D., Zhai, Xiaojun

Jul-7-2024–arXiv.org Artificial Intelligence

Modern Neural Network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which constitute their predominant computational cost. This paper therefore proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic building block for NNs. We first streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication method, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling computational overhead from input resolution and significantly boosting the efficiency of FPGA-based NN accelerators. Experimental results show that the AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
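To make the idea of LUT-based approximate matrix multiplication concrete, here is a minimal software sketch of the general product-quantization scheme that MADDNESS belongs to. It is not the AMU design and not the authors' code. Two things are assumptions made for brevity: the encoder uses a plain nearest-prototype search, whereas MADDNESS uses a learned hash function to pick the prototype, and the prototypes are written by hand rather than learned from training data. All function and variable names (`split`, `build_luts`, `encode`, `approx_matmul`) are hypothetical.

```python
def split(vec, n_sub):
    """Split a vector into n_sub contiguous, equal-length subvectors."""
    step = len(vec) // n_sub
    return [vec[i * step:(i + 1) * step] for i in range(n_sub)]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_luts(prototypes, B):
    """Offline step: luts[c][k][j] is the dot product of prototype k in
    subspace c with the matching slice of column j of the weight matrix B."""
    n_sub = len(prototypes)
    n_cols = len(B[0])
    col_parts = [split([row[j] for row in B], n_sub) for j in range(n_cols)]
    return [[[dot(p, col_parts[j][c]) for j in range(n_cols)]
             for p in prototypes[c]]
            for c in range(n_sub)]


def encode(row, prototypes):
    """Map each subvector of an input row to the index of its nearest
    prototype (assumption: nearest-neighbour search stands in for the
    learned hash used by MADDNESS)."""
    codes = []
    for sub, protos in zip(split(row, len(prototypes)), prototypes):
        dists = [sum((a - b) ** 2 for a, b in zip(sub, p)) for p in protos]
        codes.append(dists.index(min(dists)))
    return codes


def approx_matmul(A, prototypes, luts):
    """Online step: approximate A @ B using only table lookups and adds."""
    n_cols = len(luts[0][0])
    out = []
    for row in A:
        codes = encode(row, prototypes)
        out.append([sum(luts[c][k][j] for c, k in enumerate(codes))
                    for j in range(n_cols)])
    return out


# Hand-written toy data (hypothetical): 2 subspaces, 2 prototypes each.
prototypes = [[[1, 0], [0, 1]], [[2, 2], [-1, 3]]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
luts = build_luts(prototypes, B)

# Each row of A is built from prototypes, so the approximation is exact here.
A = [[1, 0, -1, 3], [0, 1, 2, 2]]
print(approx_matmul(A, prototypes, luts))
```

In this sketch the online step performs one lookup and one addition per subspace and output column, and no multiplications against the weights. Once an input row has been encoded, the remaining work depends only on the number of subspaces and output columns, which is the property that makes the approach attractive for hardware.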

accelerator, matrix multiplication, multiplication

Country:
- Europe > United Kingdom (0.14)
- North America
  - United States > New York
    - New York County > New York City (0.04)
  - Canada > Quebec
    - Montreal (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Neural Networks (1.00)
