A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Zhang, Yuyang, Leung, Dik Hin, Guo, Min, Xiao, Yijia, Liu, Haoyue, Li, Yunfei, Zhang, Jiyuan, Wang, Guan, Chen, Zhen

Oct-10-2021, arXiv.org Artificial Intelligence

Matrix multiplication is the bedrock of deep learning inference applications. In hardware acceleration on edge computing devices, matrix multiplication often takes up the great majority of the execution time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a non-uniform quantization methodology. The implementation runs on Field-programmable Gate Array (FPGA) devices, and we test its performance on handwritten digit classification and Q-learning tasks. Results show that our method achieves better performance with lower power consumption.

artificial intelligence, machine learning, neural network

Country:
- Asia > China (0.14)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)