Optimizing Large Model Training through Overlapped Activation Recomputation