Transformers from an Optimization Perspective