microcontroller
MCUNet: Tiny Deep Learning on IoT Devices
Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e.
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers contributes significantly to ecological AI. Conventional methods successfully enable convolutional neural network inference on high-resolution images on microcontrollers, while a framework for vision transformers, which achieve state-of-the-art performance in many vision applications, remains unexplored. In this paper, we propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory, where we jointly design the transformer architecture and construct the inference operator library to fit the memory resource constraint. More specifically, we generalize one-shot network architecture search (NAS) to discover the optimal architecture with the highest task performance given the memory budget of the microcontroller, where we enlarge the existing search space of vision transformers by considering the low-rank decomposition dimensions and patch resolution for memory reduction. For the construction of the inference operator library of vision transformers, we schedule the memory buffer during inference through operator integration, patch embedding decomposition, and token overwriting, allowing the memory buffer to be fully utilized to adapt to the forward pass of the vision transformer.
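The MCUFormer abstract above names low-rank decomposition as one lever for memory reduction. As a rough illustration of why the rank matters, the sketch below counts stored weights for a dense linear layer versus a two-factor replacement. The function names and the 192-wide layer are hypothetical choices for this example, not values from the paper, and the arithmetic covers weight storage only, not the activation buffers the paper also schedules.

```python
def dense_weight_count(d_in, d_out):
    """Weights stored by a dense layer with a d_in x d_out matrix."""
    return d_in * d_out


def low_rank_weight_count(d_in, d_out, rank):
    """Weights stored when the matrix is replaced by two factors:
    one of shape d_in x rank and one of shape rank x d_out."""
    return rank * (d_in + d_out)


def break_even_rank(d_in, d_out):
    """Largest rank at which the factored form stores strictly fewer
    weights than the dense form."""
    return (d_in * d_out - 1) // (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 192, 192, 48  # hypothetical layer size and rank
    dense = dense_weight_count(d_in, d_out)
    factored = low_rank_weight_count(d_in, d_out, rank)
    print(f"dense: {dense} weights, rank-{rank} factors: {factored} weights")
    print(f"factoring saves storage up to rank {break_even_rank(d_in, d_out)}")
```

Treating the rank as a searchable dimension, as the abstract describes, amounts to trading this storage saving against whatever accuracy the smaller factors cost.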
Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers
Huang, Zhaolan, Schleiser, Kaspar, Myung, Gyungmin, Baccelli, Emmanuel
Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while C/C++ dominance fades in this domain. On the other hand, small artificial neural networks (ANN) of various kinds are increasingly deployed in edge AI use cases, thus deployed and executed directly on low-power MCUs. In this context, both incremental improvements and novel innovative services will have to be continuously retrofitted using ANN execution in software embedded on sensing/actuating systems already deployed in the field. However, there has so far been no Rust embedded software platform automating parallelization for inference computation on multi-core MCUs executing arbitrary TinyML models. This paper thus fills this gap by introducing Ariel-ML, a novel toolkit we designed combining a generic TinyML pipeline and an embedded Rust software platform which can take full advantage of multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP-32). We published the full open source code of its implementation, which we used to benchmark its capabilities using a zoo of various TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency as expected, and we show that, compared to pre-existing toolkits using embedded C/C++, Ariel-ML achieves comparable memory footprints. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.
- Europe > Germany > Berlin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy (0.04)
Single-Pixel Tactile Skin via Compressive Sampling
Slepyan, Ariel, Xing, Laura, Zhang, Rudy, Thakor, Nitish
Development of large-area, high-speed electronic skins is a grand challenge for robotics, prosthetics, and human-machine interfaces, but is fundamentally limited by wiring complexity and data bottlenecks. Here, we introduce Single-Pixel Tactile Skin (SPTS), a paradigm that uses compressive sampling to reconstruct rich tactile information from an entire sensor array via a single output channel. This is achieved through a direct circuit-level implementation where each sensing element, equipped with a miniature microcontroller, contributes a dynamically weighted analog signal to a global sum, performing distributed compressed sensing in hardware. Our flexible, daisy-chainable design simplifies wiring to a few input lines and one output, and significantly reduces measurement requirements compared to raster scanning methods. We demonstrate the system's performance by achieving object classification at an effective 3500 FPS and by capturing transient dynamics, resolving an 8 ms projectile impact into 23 frames. A key feature is the support for adaptive reconstruction, where sensing fidelity scales with measurement time. This allows for rapid contact localization using as little as 7% of total data, followed by progressive refinement to a high-fidelity image - a capability critical for responsive robotic systems. This work offers an efficient pathway towards large-scale tactile intelligence for robotics and human-machine interfaces.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Maryland > Baltimore (0.05)
- North America > United States > Texas (0.04)
- Health & Medicine > Therapeutic Area (0.68)
- Health & Medicine > Health Care Technology (0.46)
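The Single-Pixel Tactile Skin abstract above describes every sensing element adding a weighted signal to one global sum, with reconstruction quality improving as more measurements arrive. The sketch below simulates that idea in pure Python under loud assumptions: the weights are hypothetical Gaussian values, and the reconstruction is a plain minimum-norm least-squares estimate built incrementally with Gram-Schmidt. It is a stand-in for illustration, not the authors' compressive-sensing solver, and it does not exploit sparsity.

```python
import random


def simulate_and_reconstruct(signal, num_measurements, seed=0):
    """Simulate single-channel weighted-sum readout and refine an estimate
    after every measurement. Returns (estimate, error after each step)."""
    rng = random.Random(seed)
    n = len(signal)
    basis = []    # orthonormalized weight patterns seen so far
    coeffs = []   # component of the unknown signal along each basis vector
    estimate = [0.0] * n
    errors = []
    for _ in range(num_measurements):
        # Each element applies its own weight (assumed Gaussian here).
        weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The shared output line carries one number: the weighted sum.
        y = sum(w * s for w, s in zip(weights, signal))
        # Modified Gram-Schmidt: remove what earlier patterns already explain,
        # from both the weight pattern and the measured value.
        v = weights[:]
        for q, c in zip(basis, coeffs):
            a = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - a * qi for vi, qi in zip(v, q)]
            y -= a * c
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-9:  # skip patterns that carry no new information
            q = [vi / norm for vi in v]
            c = y / norm
            basis.append(q)
            coeffs.append(c)
            estimate = [e + c * qi for e, qi in zip(estimate, q)]
        # The true signal is used only to report the simulation error.
        errors.append(sum((e - s) ** 2 for e, s in zip(estimate, signal)) ** 0.5)
    return estimate, errors


if __name__ == "__main__":
    pressure = [0.0] * 16
    pressure[5] = 1.0  # a single hypothetical contact point
    _, errs = simulate_and_reconstruct(pressure, 16)
    for m in (4, 8, 16):
        print(f"after {m:2d} measurements: error = {errs[m - 1]:.3g}")
```

Because each new weight pattern can only enlarge the subspace the estimate is projected onto, the error never increases from one measurement to the next, which mirrors the progressive refinement described in the abstract. A sparsity-aware solver would be expected to reach a usable image with far fewer measurements than this baseline.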
Reviewer
We thank the reviewers for their helpful comments. The code and models will be open-sourced. Along with peak RAM, we report inference FLOPs for all the models. Finally, the rationale behind ImageNet-10 can be found in Appendix A.1. And even then ReNet's accuracy is GRU or LSTM as the RNN unit; we use GRU as it is more efficient.
Supplementary Material On-Device Training Under 256KB Memory
We prepared a video demo showing that we can deploy our framework to a microcontroller (STM32F746, 320KB SRAM, 1MB Flash) to enable on-device learning. The training leads to decent accuracy within the tight memory budget. This is quite affordable for tiny on-device learning applications. We notice that the variance of different runs is quite small in our experiments. Here we provide the results of 3 runs in Table S1 to show the variance.
- South America > Suriname > North Atlantic Ocean (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada (0.04)
- Asia > Taiwan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Communications > Networks (0.94)
- Information Technology > Artificial Intelligence > Cognitive Science (0.77)
Philly's 'transit vigilante' created a real-time bus tracker for his neighbors
With a sports timer and some clever coding, Max Goldberg built a DIY display that tells South Philly commuters exactly when their next bus will arrive. Philadelphia's mass transit system has had a rough go of it lately. The Pennsylvania city's main public transit provider, SEPTA, has been dealing with massive service cuts, including the elimination of entire bus routes. But South Philly resident Max Goldberg is undeterred.
- North America > United States > Pennsylvania (0.25)
- North America > United States > California > San Francisco County > San Francisco (0.15)
- North America > United States > Vermont (0.05)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (0.35)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)