Mesh-TensorFlow: Deep Learning for Supercomputers
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Neural Information Processing Systems
All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and implement, particularly on large clusters.
- Country:
  - Europe > Spain
    - Canary Islands (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - Nevada > Washoe County
        - Reno (0.04)
      - Texas > Travis County
        - Austin (0.04)
- Technology: