Ouroboros: On Accelerating Training of Transformer-Based Language Models

Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

Neural Information Processing Systems 

We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems.
