tiny deep learning


MCUNet: Tiny Deep Learning on IoT Devices

Neural Information Processing Systems

Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than that of mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e., device, latency, energy, memory) under low search costs. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library, to expand the search space and fit a larger model.
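The two-stage idea in the abstract can be sketched in a few lines of Python. Everything below is an illustrative toy: the memory and FLOPs proxies, the candidate width/resolution grids, and the 320KB budget are assumptions made for demonstration, not TinyNAS's actual profiler or search algorithm.

```python
import itertools

# Toy proxy for peak activation memory (KB): the first block of a CNN
# dominates, scaling with resolution^2, base channels, and width multiplier.
# Real TinyNAS profiles per-layer SRAM usage; this closed form is an assumption.
def peak_memory_kb(width_mult, resolution, channels=16):
    return resolution * resolution * channels * width_mult / 1024

# Toy proxy for model capacity: more FLOPs ~ higher attainable accuracy.
def flops_proxy(width_mult, resolution):
    return resolution * resolution * width_mult ** 2

def optimize_search_space(sram_budget_kb):
    """Stage 1: prune the design space to configurations that fit the MCU."""
    widths = [0.3, 0.5, 0.7, 1.0]
    resolutions = [96, 128, 160, 224]
    return [(w, r) for w, r in itertools.product(widths, resolutions)
            if peak_memory_kb(w, r) <= sram_budget_kb]

def specialize(space):
    """Stage 2 stand-in: pick the highest-capacity network in the pruned space
    (the real method trains a super network and searches it, rather than
    ranking by a closed-form proxy)."""
    return max(space, key=lambda wr: flops_proxy(*wr))

space = optimize_search_space(sram_budget_kb=320)  # e.g. an MCU with 320KB SRAM
best_width, best_resolution = specialize(space)
```

Under these toy proxies the first stage drops high-resolution, full-width configurations that would overflow SRAM, and the second stage picks the largest model that still fits, which is the essence of "optimize the search space first, then specialize within it."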


Memory-efficient Patch-based Inference for Tiny Deep Learning

Neural Information Processing Systems

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, naive implementation brings overlapping patches and computation overhead. We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Manually redistributing the receptive field is difficult.
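The peak-memory argument can be made concrete with a toy two-layer network. The sketch below (plain Python, all-ones 3x3 kernels, an 8x8 single-channel map) is an assumption-laden illustration, not MCUNetV2's actual scheduler: `per_layer` materializes the whole intermediate feature map, while `patch_based` keeps only one small tile plus a 2-pixel halo resident at a time; the halo is exactly the overlapping-patch overhead the abstract mentions.

```python
def conv3x3_valid(x):
    # toy 3x3 convolution with all-ones weights, "valid" padding
    h, w = len(x), len(x[0])
    return [[sum(x[i + dy][j + dx] for dy in range(3) for dx in range(3))
             for j in range(w - 2)] for i in range(h - 2)]

def pad(x, k):
    # zero-pad a 2D map by k pixels on every side
    h, w = len(x), len(x[0])
    row = [0.0] * (w + 2 * k)
    return ([row[:] for _ in range(k)]
            + [[0.0] * k + r + [0.0] * k for r in x]
            + [row[:] for _ in range(k)])

def per_layer(x):
    # conventional scheduling: the whole intermediate feature map is
    # materialized at once (peak memory ~ full map)
    return conv3x3_valid(conv3x3_valid(pad(x, 2)))

def patch_based(x, patch=4):
    # patch-by-patch scheduling: each output tile is computed from an input
    # tile enlarged by a halo of 2 (the receptive-field radius of two stacked
    # 3x3 convs) -- the overlapping-patch overhead the abstract notes
    halo = 2
    h, w = len(x), len(x[0])
    p = pad(x, halo)
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tile = [r[j:j + patch + 2 * halo] for r in p[i:i + patch + 2 * halo]]
            res = conv3x3_valid(conv3x3_valid(tile))
            for a in range(patch):
                out[i + a][j:j + patch] = res[a]
    return out

x = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
full = per_layer(x)      # whole-map schedule
tiled = patch_based(x)   # tile-by-tile schedule, same result
```

Both schedules produce identical outputs, but the patch schedule's largest live tensor is the (patch + 4)^2 tile rather than the full map, which is how patch-by-patch execution cuts peak memory on the memory-heavy early blocks.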


From Tiny Machine Learning to Tiny Deep Learning: A Survey

Somvanshi, Shriyank, Islam, Md Monzurul, Chhetri, Gaurab, Chakraborty, Rohit, Mimi, Mahmuda Sultana, Shuvo, Sawgat Ahmed, Islam, Kazi Sifatul, Javed, Syed Aaqib, Rafat, Sharif Ahmed, Dutta, Anandi, Das, Subasish

arXiv.org Artificial Intelligence

The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely resource-constrained hardware. This survey presents a comprehensive overview of the transition from TinyML to TinyDL, encompassing architectural innovations, hardware platforms, model optimization techniques, and software toolchains. We analyze state-of-the-art methods in quantization, pruning, and neural architecture search (NAS), and examine hardware trends from MCUs to dedicated neural accelerators. Furthermore, we categorize software deployment frameworks, compilers, and AutoML tools enabling practical on-device learning. Applications across domains such as computer vision, audio recognition, healthcare, and industrial monitoring are reviewed to illustrate the real-world impact of TinyDL. Finally, we identify emerging directions including neuromorphic computing, federated TinyDL, edge-native foundation models, and domain-specific co-design approaches. This survey aims to serve as a foundational resource for researchers and practitioners, offering a holistic view of the ecosystem and laying the groundwork for future advancements in edge AI.
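As one concrete instance of the model-optimization techniques surveyed, here is a minimal sketch of symmetric per-tensor post-training int8 quantization in plain Python. The function names and the scheme (scale = max|w| / 127, no zero point) are illustrative assumptions, not any particular toolchain's API.

```python
def quantize_int8(weights):
    """Map float weights to int8 codes with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard against scale 0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)  # each value within half a step of the input
```

This gives a 4x size reduction (float32 to int8) with per-weight error bounded by half a quantization step; pruning and NAS, also covered by the survey, attack parameter count and architecture rather than numeric precision.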


Review for NeurIPS paper: MCUNet: Tiny Deep Learning on IoT Devices

Neural Information Processing Systems

The co-design mechanism is the core contribution of the paper, but the detailed process is not described clearly. It is suggested to provide an elaborate diagram or pseudocode to introduce the whole framework. It is meaningful to explore its generalization ability: please demonstrate the potential of deploying MCUNet in more scenarios and tasks, e.g., on more devices or on tasks like object detection and semantic segmentation. However, the experiments fail to highlight the improvements of the co-design scheme over single-design schemes, and it is unclear whether the overall network topology is indeed the main reason for the large improvements.


Review for NeurIPS paper: MCUNet: Tiny Deep Learning on IoT Devices

Neural Information Processing Systems

All four knowledgeable referees support acceptance for the contributions, notably the co-design of TinyNAS and TinyEngine for deep learning on IoT devices and the promising experimental results on ImageNet, and I also recommend acceptance. Please make sure to appropriately reflect what was promised in the rebuttal, such as the elaboration on the co-design.



GitHub - mit-han-lab/tinyengine: [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; MCUNetV3: On-Device Training Under 256KB Memory

#artificialintelligence

This is the official implementation of TinyEngine, a memory-efficient and high-performance neural network library for Microcontrollers. TinyEngine is a part of MCUNet, which also consists of TinyNAS. MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. TinyEngine and TinyNAS are co-designed to fit the tight memory budgets. We will soon release Tiny Training Engine used in MCUNetV3: On-Device Training Under 256KB Memory. If you are interested in getting updates, please sign up here to get notified!