Mesh-TensorFlow: Deep Learning for Supercomputers
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Neural Information Processing Systems
However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and to implement, particularly on large clusters.