Full Stack Optimization of Transformer Inference: a Survey
