Convexifying Transformers: Improving optimization and understanding of transformer networks
