PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
