A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices
Wu, Ci-Hao; Chang, Tian-Sheuan
arXiv.org Artificial Intelligence
Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications. To address these challenges, this paper proposes a low-power streaming speech enhancement accelerator through model and hardware optimization. The proposed high-performance model is optimized for hardware execution through co-design of model compression and the target application; the proposed domain-aware and streaming-aware pruning techniques reduce the model size by 93.9%. The required latency is further reduced with batch normalization-based transformers. Additionally, we employ softmax-free attention, complemented by an extra batch normalization, which facilitates simpler hardware design. The tailored hardware accommodates these diverse computing patterns by breaking them down into element-wise multiplication and accumulation (MAC). This is achieved through a 1-D processing array with configurable SRAM addressing, which minimizes hardware complexity and simplifies zero skipping.

Speech enhancement is crucial for various natural language processing (NLP) tasks, including speech recognition, machine translation, and hearing aids. Transformer-based speech enhancement models, such as [1], [2], have received significant attention in recent years due to their superior performance and parallel computing capabilities relative to other methods. The model shown in Figure 1 is heterogeneous, comprising an encoder and a decoder that use convolutional neural networks (CNNs) for speech extraction and restoration. In addition, it employs a masking module with transformers to filter out noise. However, its large model size and computational complexity become bottlenecks for low-power and real-time edge applications.
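The abstract describes the hardware as reducing every computing pattern to element-wise MAC operations on a 1-D array, which in turn simplifies zero skipping. The following is a minimal software sketch of that idea only, not the authors' design: the function names `mac_zero_skip` and `matvec_zero_skip` are hypothetical, and the sketch assumes that pruned weights are stored as exact zeros and that a skipped term simply issues no MAC.

```python
def mac_zero_skip(weights, activations):
    """Accumulate weights[i] * activations[i], issuing no MAC for zero weights.

    Returns the accumulated sum and the number of MACs actually issued.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0.0
    macs_issued = 0
    for w, a in zip(weights, activations):
        if w == 0:
            # Assumption: a pruned weight is an exact zero, so its product
            # contributes nothing and the MAC can be skipped.
            continue
        acc += w * a
        macs_issued += 1
    return acc, macs_issued


def matvec_zero_skip(matrix, vector):
    """Matrix-vector product expressed as one 1-D MAC pass per output row."""
    outputs = []
    total_macs = 0
    for row in matrix:
        acc, issued = mac_zero_skip(row, vector)
        outputs.append(acc)
        total_macs += issued
    return outputs, total_macs


if __name__ == "__main__":
    pruned = [[1, 0, 2],
              [0, 0, 3]]
    result, macs = matvec_zero_skip(pruned, [1, 2, 3])
    dense_macs = sum(len(row) for row in pruned)
    print(result, f"{macs} of {dense_macs} MACs issued")
```

In this toy case, half of the dense MACs are skipped while the result matches the dense product. In hardware the saving depends on how the zeros are detected and addressed, which this sketch does not model.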
This work was supported by the National Science and Technology Council, Taiwan, under Grants 111-2622-8-A49-018-SB, 110-2221-E-A49-148-MY3, and 110-2218-E-A49-015-MBK.
Mar-27-2025