Global Convergence in Training Large-Scale Transformers
