Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
