Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers
Francesco Daghero, Daniele Jahier Pagliari, Francesco Conti, Luca Benini, Massimo Poncino, Alessio Burrello
–arXiv.org Artificial Intelligence
The acceleration of pruned Deep Neural Networks (DNNs) on edge devices such as Microcontrollers (MCUs) is a challenging task, given the tight area and power constraints of these devices. In this work, we propose a three-fold contribution to address this problem. First, we design a set of optimized software kernels for N:M pruned layers, targeting ultra-low-power, multicore RISC-V MCUs, which are up to 2.1× and 3.4× faster than their dense counterparts at 1:8 and 1:16 sparsity, respectively. Then, we implement a lightweight Instruction-Set Architecture (ISA) extension to accelerate the indirect load and non-zero indices decompression operations required by our kernels, obtaining up to 1.9× extra speedup, at the cost of a 5% area overhead. Lastly, we extend an open-source DNN compiler to utilize our sparse kernels for complete networks, showing speedups of 3.21× and 1.81× on a ResNet18 and a Vision Transformer (ViT), with less than a 1.5% accuracy drop compared to a dense baseline.

The execution of Deep Neural Networks (DNNs) on extreme edge devices, such as IoT end-nodes based on Microcontrollers (MCUs), has become increasingly popular (Wang et al., 2020). Local execution enables smart capabilities in these devices while avoiding the costly transmission of raw data, with advantages in latency predictability, data privacy, and energy efficiency (Sze et al., 2017; Shi et al., 2016).

At the DNN model level, structured or semi-structured pruning forces specific patterns in the positions of non-zero (NZ) weights, simplifying memory access and indices storage. A popular example is N:M pruning, in which exactly N weights are NZ in every group of M (Zhou et al., 2021). Several solutions for accelerating sparse workloads have been proposed at lower levels of the stack, ranging from optimized software kernels to custom hardware.
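To make the N:M kernel idea concrete, the sketch below shows a plain-C dot product between an N:M-pruned weight row and a dense activation vector: only the N non-zero weights of each group of M are stored, together with their position inside the group, and activations are fetched with an indirect (gather) load at those positions. This is an illustrative assumption, not the paper's actual kernel; the names `nm_sparse_dot`, `nz_w`, `nz_idx`, `GROUP_M`, and `GROUP_N` are hypothetical, and the index-decompression/indirect-load step is the part the paper's ISA extension accelerates in hardware.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal sketch of an N:M-sparse dot product (illustrative, not the
 * paper's kernel). Weights are stored compressed: for each group of
 * GROUP_M positions, only GROUP_N non-zero (NZ) values are kept, plus
 * one index per NZ value giving its offset inside the group. */
#define GROUP_M 16  /* group size M (1:16 sparsity when GROUP_N == 1) */
#define GROUP_N 1   /* NZ weights kept per group (N)                  */

int32_t nm_sparse_dot(const int8_t *nz_w,    /* compressed NZ weights        */
                      const uint8_t *nz_idx, /* offset of each NZ in its group */
                      const int8_t *act,     /* dense activation vector       */
                      size_t n_groups)       /* number of M-sized groups      */
{
    int32_t acc = 0;
    for (size_t g = 0; g < n_groups; g++) {
        const int8_t *act_grp = act + g * GROUP_M;
        for (size_t n = 0; n < GROUP_N; n++) {
            size_t k = g * GROUP_N + n;
            /* Indirect load of the activation matching this NZ weight:
             * decompressing nz_idx and gathering act_grp[nz_idx[k]] is the
             * operation targeted by the lightweight ISA extension. */
            acc += (int32_t)nz_w[k] * (int32_t)act_grp[nz_idx[k]];
        }
    }
    return acc;
}
```

Compared to a dense dot product over `n_groups * GROUP_M` elements, this loop performs only `n_groups * GROUP_N` multiply-accumulates, which is where the reported 2.1× to 3.4× software speedups at 1:8 and 1:16 sparsity come from, minus the overhead of index decompression and the indirect loads.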
Mar-19-2025
- Country:
  - Europe
    - Italy (0.28)
    - Switzerland > Zürich
      - Zürich (0.14)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology > Security & Privacy (0.54)
- Technology: