Mesh-TensorFlow: Deep Learning for Supercomputers
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Neural Information Processing Systems
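Batch-splitting (data-parallelism) suffers from several problems, including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and implement, particularly on large clusters.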