Memory Efficient Adaptive Optimization

Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Neural Information Processing Systems 

Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes.
