Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers
Fanariotis, Anastasios, Orphanoudakis, Theofanis, Fotopoulos, Vasilis
–arXiv.org Artificial Intelligence
The deployment of machine learning (ML) models on microcontrollers (MCUs) is constrained by strict energy, latency, and memory requirements, particularly in battery-operated and real-time edge devices. While software-level optimizations such as quantization and pruning reduce model size and computation, hardware acceleration has emerged as a decisive enabler for efficient embedded inference. This paper evaluates the impact of Neural Processing Units (NPUs) on MCU-based ML execution, using the ARM Cortex-M55 core combined with the Ethos-U55 NPU on the Alif Semiconductor Ensemble E7 development board as a representative platform. A rigorous measurement methodology was employed, incorporating per-inference net energy accounting via GPIO-triggered high-resolution digital multimeter synchronization and idle-state subtraction, ensuring accurate attribution of energy costs. Experimental results across six representative ML models -- including MiniResNet, MobileNetV2, FD-MobileNet, MNIST, TinyYolo, and SSD-MobileNet -- demonstrate substantial efficiency gains when inference is offloaded to the NPU. For moderate to large networks, latency improved by factors ranging from 7x to over 125x, with per-inference net energy reductions of up to 143x. Notably, the NPU enabled execution of models unsupported on CPU-only paths, such as SSD-MobileNet, highlighting its functional as well as efficiency advantages. These findings establish NPUs as a cornerstone of energy-aware embedded AI, enabling real-time, power-constrained ML inference at the MCU level.
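The per-inference net energy accounting described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes current samples captured between the GPIO trigger edges that mark an inference window, a separately measured idle-state current, and a fixed sampling interval; the function name and parameters are hypothetical.

```python
def net_energy_per_inference_mJ(samples_mA, sample_interval_s, supply_V,
                                idle_mA, n_inferences):
    """Net energy per inference via idle-state subtraction (illustrative only).

    samples_mA        -- current samples (mA) taken while the GPIO marker was high
    sample_interval_s -- time between samples (s)
    supply_V          -- supply voltage (V), assumed constant over the window
    idle_mA           -- baseline idle-state current (mA), measured separately
    n_inferences      -- inferences executed inside the measurement window
    """
    # Integrate total charge over the active window (simple rectangular sum;
    # dense multimeter sampling makes this a reasonable approximation)
    total_mAs = sum(samples_mA) * sample_interval_s
    # Subtract the charge the board would have drawn sitting idle for the
    # same duration -- this isolates the cost attributable to inference
    window_s = len(samples_mA) * sample_interval_s
    net_mAs = total_mAs - idle_mA * window_s
    # Convert charge (mA*s) at the supply voltage to energy (mJ),
    # then amortize across the inferences in the window
    net_mJ = net_mAs * supply_V
    return net_mJ / n_inferences
```

For example, 100 samples of a constant 30 mA at 1 ms intervals, a 10 mA idle baseline, and a 3.3 V supply yield a net cost of 6.6 mJ for a single inference in the window.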
Sep-29-2025