Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training
