Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Wiese, Philip, İslamoğlu, Gamze, Scherer, Moritz, Macan, Luka, Jung, Victor J. B., Burrello, Alessio, Conti, Francesco, Benini, Luca
arXiv.org Artificial Intelligence
One of the challenges for Tiny Machine Learning (tinyML) is keeping pace with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this challenge by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate an Attention-based model within a tinyML power envelope, using an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit MobileBERT inference, achieving leading-edge energy efficiency of 2960 GOp/J and throughput of 154 GOp/s at 32.5 Inf/s while consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology).
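The figures reported in the abstract can be cross-checked against each other: energy efficiency (GOp/J) should equal throughput (GOp/s) divided by power (W), and throughput divided by inference rate gives the implied workload per inference. A minimal sketch, using only the numbers quoted above (the ~4.7 GOp/inference figure is derived here, not stated in the abstract):

```python
# Reported figures from the abstract
throughput_gops = 154.0   # GOp/s
power_w = 52.0e-3         # 52.0 mW
inf_per_s = 32.5          # inferences per second

# Energy efficiency: GOp/s divided by W yields GOp/J directly
efficiency_gopj = throughput_gops / power_w
print(f"{efficiency_gopj:.0f} GOp/J")   # ~2962, matching the reported 2960 GOp/J

# Implied workload per inference (derived, not stated in the abstract)
gop_per_inference = throughput_gops / inf_per_s
print(f"{gop_per_inference:.2f} GOp/inference")   # ~4.74 GOp per MobileBERT inference
```

The small rounding gap (2962 vs. 2960) is consistent with the abstract quoting figures to three significant digits.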
Aug-5-2024