PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
