
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers

Scherer, Moritz, Macan, Luka, Jung, Victor, Wiese, Philip, Bompani, Luca, Burrello, Alessio, Conti, Francesco, Benini, Luca

arXiv.org Artificial Intelligence

The latest evolutions in mainstream Artificial Intelligence (AI) have been driven by Transformers, which have taken over from Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) as the leading edge models for language processing and multi-modal applications [1], [2]. The success of Transformers can be primarily attributed to the emergence of the Foundation Model (FM) paradigm: large Transformer models extensively pre-trained on datasets spanning trillions of tokens and then fine-tuned with a much lower volume of labeled data to solve domain-specific problems. Following the success of FMs in Natural Language Processing (NLP) [1], [3], an increasing number of fields…

Despite many recent successes with previous-generation Deep Neural Networks (DNNs), the emergence of the tinyML paradigm for EFMs faces the dual challenge of reducing FMs to a manageable size and enabling their deployment on tiny devices. A first concrete step in this direction is the recent introduction of Small Language Models (SLMs): FMs with tens to a few hundred million, rather than several billion, parameters [8], [9]. While most currently available FMs are focused on processing natural language at a proof-of-concept scale, the effort towards embedded multi-modal sensor inputs with small-scale, application-specific FMs offers a highly promising path for the development of this novel class of models.
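As a rough, back-of-the-envelope illustration (mine, not the paper's) of what "a manageable size" means, the weight-storage footprint of a model is simply its parameter count times the bits stored per weight. A minimal Python sketch, with illustrative parameter counts that are assumptions rather than figures from the paper:

def weight_footprint_mib(num_params: int, bits_per_weight: int) -> float:
    # Storage needed for the weights alone, in MiB.
    return num_params * bits_per_weight / 8 / 2**20

# Illustrative model sizes (not taken from the paper).
models = {
    "billion-scale FM (7B params)": 7_000_000_000,
    "SLM (100M params)": 100_000_000,
    "SLM (30M params)": 30_000_000,
}

for name, params in models.items():
    for bits in (16, 8, 4):
        mib = weight_footprint_mib(params, bits)
        print(f"{name} @ {bits}-bit: {mib:,.1f} MiB")

Even at 8-bit precision, a 100-million-parameter SLM needs on the order of 100 MiB for weights alone, versus several GiB for a billion-scale FM, which is why shrinking FMs to SLM scale is a prerequisite for any MCU-class deployment.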


Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Wiese, Philip, İslamoğlu, Gamze, Scherer, Moritz, Macan, Luka, Jung, Victor J. B., Burrello, Alessio, Conti, Francesco, Benini, Luca

arXiv.org Artificial Intelligence

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate an Attention-based model in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables an end-to-end 8-bit MobileBERT, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s at 32.5 Inf/s, consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology).
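As a quick consistency check (my own arithmetic, not the authors'), the headline figures fit together: energy efficiency is throughput divided by power, and throughput divided by inference rate gives the per-inference workload of the 8-bit MobileBERT. A short Python sketch using only the numbers quoted in the abstract above:

# Cross-checking the figures quoted in the abstract.
throughput_gops = 154.0    # GOp/s
power_w = 52.0e-3          # 52.0 mW
inf_per_s = 32.5           # inferences per second

# GOp/s divided by W yields GOp/J.
print(f"efficiency: {throughput_gops / power_w:.0f} GOp/J")      # ~2962 GOp/J

# GOp/s divided by Inf/s yields GOp per inference.
print(f"workload:   {throughput_gops / inf_per_s:.2f} GOp/inf")  # ~4.74 GOp

# W divided by Inf/s yields J per inference.
print(f"energy:     {power_w / inf_per_s * 1e3:.2f} mJ/inf")     # ~1.60 mJ

The recovered 2962 GOp/J matches the reported 2960 GOp/J to rounding, and each inference of the network costs roughly 4.7 GOp and 1.6 mJ.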


When is AI actually explainable?

#artificialintelligence

Explainability is a fascinating topic. It is a research field where a wide variety of experts come together: mathematicians, engineers, psychologists, philosophers, and regulators, which makes it one of the most interesting fields to work in. I have been involved in quite a few AI projects where explainability, or XAI, turned out to be crucial. So I decided to gather and share my experiences, along with those of my colleagues at Deeploy. AI is one of the biggest innovations of our time. It can change the way we live, work, care, teach, and interact with each other.