Memory Efficient Adaptive Optimization

Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Neural Information Processing Systems 

Figure 2: Test log-perplexity of a Transformer-Big model on WMT'14 en→fr, when training with batch sizes of 384 (left) and 768 (right).
