Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
