DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

Ganji, Darshan C., Ashfaq, Saad, Saboori, Ehsan, Sah, Sudhakar, Mitra, Saptarshi, AskariHemmat, MohammadHossein, Hoffman, Alexander, Hassanien, Ahmed, Léonardon, Mathieu

arXiv.org Artificial Intelligence 

Quantization methods such as Learned Step Size Quantization can achieve model accuracy that is comparable to full-precision floating-point baselines even with sub-byte quantization. However, it is extremely challenging to deploy these ultra low-bit quantized models on mainstream CPU devices because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports no less than 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.

Achieving low latency inference with ultra low-bit models on general purpose processors (GPPs) remains an active area of research [8, 11, 19]. Deep learning workloads on CPUs are typically accelerated by exploiting data-level parallelism through SIMD programming. However, ultra low-bit deep learning operators cannot be efficiently executed on these devices because sub-8-bit instructions are not generally supported in the vectorized instruction sets of mainstream CPU architectures, including SSE/AVX instructions on x86 and Neon instructions on Arm.

[Table fragment interleaved in the extraction: ResNet34 74.1% / 74.1% / 72.4%; ResNet50 76.9% / 76.8% / 74.6%; VGG16 73.4% / 73.5% / 71.4% (column headers not recoverable).]
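To make the lookup-table idea from the abstract concrete, below is a minimal scalar C++ sketch, not the paper's kernel: the function names (build_product_lut, lut_dot), the 2-bit quantization levels, and the code-packing scheme are illustrative assumptions, and the actual DeepGEMM implementation vectorizes the lookups on SIMD hardware.

```cpp
// Minimal sketch of lookup-table based low-bit dot products.
// Assumption-driven illustration: names, levels, and packing are hypothetical.
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

// With 2-bit weights and 2-bit activations there are only 4 x 4 = 16
// possible products, so all of them can be precomputed once.
std::array<int8_t, 16> build_product_lut(const std::array<int8_t, 4>& w_levels,
                                         const std::array<int8_t, 4>& a_levels) {
    std::array<int8_t, 16> lut{};
    for (int w = 0; w < 4; ++w)
        for (int a = 0; a < 4; ++a)
            lut[(w << 2) | a] = static_cast<int8_t>(w_levels[w] * a_levels[a]);
    return lut;
}

// Dot product over 2-bit codes: each multiply-accumulate becomes a
// table lookup followed by an add.
int32_t lut_dot(const std::vector<uint8_t>& w_codes,   // values in [0, 3]
                const std::vector<uint8_t>& a_codes,   // values in [0, 3]
                const std::array<int8_t, 16>& lut) {
    int32_t acc = 0;
    for (size_t i = 0; i < w_codes.size(); ++i)
        acc += lut[(w_codes[i] << 2) | a_codes[i]];    // lookup instead of multiply
    return acc;
}

int main() {
    // Hypothetical symmetric 2-bit weight levels and unsigned activation levels
    // (quantization scales assumed to be folded out for clarity).
    std::array<int8_t, 4> w_levels = {-2, -1, 1, 2};
    std::array<int8_t, 4> a_levels = {0, 1, 2, 3};
    auto lut = build_product_lut(w_levels, a_levels);

    std::vector<uint8_t> w = {0, 3, 1, 2};
    std::vector<uint8_t> a = {3, 3, 0, 1};
    std::printf("dot = %d\n", lut_dot(w, a, lut));
    return 0;
}
```

Because a 2-bit-by-2-bit product table has only 16 entries, it is small enough to sit in a vector register, which is what makes table lookups attractive on SIMD units that lack sub-8-bit arithmetic instructions.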
