ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
