


T-REX: A 68-567 µs/token, 0.41-3.95 µJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET

Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok

arXiv.org Artificial Intelligence

This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. Additionally, a new control flow mechanism, called dynamic batching, and a novel buffer architecture, termed a two-direction accessible register file, further reduce external memory access while improving hardware utilization.
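The abstract does not detail the buffer design, but the idea behind a two-direction accessible register file can be illustrated in software: a 2D buffer whose contents can be read out either row-wise or column-wise, so a transpose (e.g., between matrix-multiply stages of attention) needs no extra copy pass. The sketch below is purely illustrative; all class and method names are assumptions, not the paper's actual hardware interface.

```python
# Hypothetical software analogue of a two-direction accessible
# register file: data is written once, then readable in either
# the row direction or the column direction without a physical
# transpose. Names are illustrative, not from the paper.

class TwoDirectionRegisterFile:
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [[0] * cols for _ in range(rows)]

    def write_row(self, r, values):
        assert len(values) == self.cols
        self.data[r] = list(values)

    def read_row(self, r):
        # Row-direction access (e.g., streaming a matrix in storage order).
        return list(self.data[r])

    def read_col(self, c):
        # Column-direction access: a transposed view of the same storage,
        # so no second buffer or transpose pass is required.
        return [self.data[r][c] for r in range(self.rows)]


rf = TwoDirectionRegisterFile(2, 3)
rf.write_row(0, [1, 2, 3])
rf.write_row(1, [4, 5, 6])
print(rf.read_row(0))  # [1, 2, 3]
print(rf.read_col(1))  # [2, 5]
```

In hardware, the point of such a structure is that both access patterns hit the same storage cells, trading extra read ports or wiring for the external memory traffic a transpose buffer would otherwise cost.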