COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference

Ye Qiao, Zhiheng Chen, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang

arXiv.org Artificial Intelligence 

Abstract--Transformer-based models have demonstrated superior performance in various fields, including natural language processing and computer vision. However, their enormous model size and high demands on computation, memory, and communication limit their deployment to edge platforms for local, secure inference. Binary transformers offer a compact, low-complexity solution for edge deployment with reduced bandwidth needs and acceptable accuracy. However, existing binary transformers perform inefficiently on current hardware due to the lack of binary-specific optimizations. To address this, we introduce COBRA, an algorithm-architecture co-optimized binary transformer accelerator for edge computing. COBRA features a real 1-bit binary multiplication unit, enabling matrix operations with -1, 0, and +1 values, surpassing ternary methods. With further hardware-friendly optimizations in the attention block, COBRA achieves up to 3,894.7 GOPS throughput and 448.7 GOPS/Watt energy efficiency on edge FPGAs, delivering a 311× energy efficiency improvement over GPUs and a 3.5× throughput improvement over the state-of-the-art binary accelerator, with only negligible inference accuracy degradation.

INTRODUCTION

In recent years, transformer-based models have become foundational architectures across multiple domains, achieving state-of-the-art performance in tasks such as natural language processing [1], computer vision [2], and others [3], [4]. These models excel at capturing complex patterns through self-attention mechanisms and extensive parametrization, often involving billions of parameters to achieve superior performance. However, the increasing size of these models introduces significant computational, memory, and communication challenges, restricting their deployment on a wide range of devices, especially resource-constrained ones.
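To build intuition for why matrix operations over {-1, 0, +1} map well to bitwise hardware, consider a software sketch: each ternary vector can be packed into a sign bitmask and a nonzero bitmask, after which a dot product reduces to XNOR/AND operations plus two popcounts. This is a hypothetical illustration of the general technique, not COBRA's actual multiplication unit; the function names `pack` and `dot` are our own.

```python
def pack(vec):
    """Pack a list of {-1, 0, +1} values into (sign_bits, mask_bits).

    mask bit i is set iff vec[i] != 0; sign bit i is set iff vec[i] == +1.
    """
    sign = mask = 0
    for i, v in enumerate(vec):
        if v != 0:
            mask |= 1 << i
            if v == 1:
                sign |= 1 << i
    return sign, mask

def dot(a, b):
    """Ternary dot product using only bitwise ops and popcounts."""
    sa, ma = a
    sb, mb = b
    valid = ma & mb                 # positions where both operands are nonzero
    pos = ~(sa ^ sb) & valid        # signs agree (XNOR)  -> product is +1
    neg = (sa ^ sb) & valid         # signs differ        -> product is -1
    return bin(pos).count("1") - bin(neg).count("1")

# Example: (1)(-1) + (-1)(-1) + (0)(1) + (1)(1) = -1 + 1 + 0 + 1 = 1
print(dot(pack([1, -1, 0, 1]), pack([-1, -1, 1, 1])))  # -> 1
```

In hardware, the two popcounts become small adder trees, so an entire multiply-accumulate lane needs no multipliers at all, which is the kind of saving a binary-specific accelerator exploits.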
