Memory Efficient Adaptive Optimization
Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer
Neural Information Processing Systems
Figure 2: Test log-perplexity of a Transformer-Big model on WMT'14 en→fr, when training with batch sizes of 384 (left) and 768 (right).
Feb-12-2026, 22:21:05 GMT