Memory-Efficient Adaptive Optimization for Large-Scale Learning