ZeRO: Memory Optimizations Toward Training Trillion Parameter Models