Optimizing Large Model Training through Overlapped Activation Recomputation
