Full Stack Optimization of Transformer Inference: a Survey