Morphlux: Transforming Torus Fabrics for Efficient Multi-tenant ML
Abhishek Vijaya Kumar, Eric Ding, Arjun Devraj, Darius Bunandar, Rachee Singh
arXiv.org Artificial Intelligence
We develop Morphlux, a server-scale programmable photonic fabric that interconnects accelerators within servers. We show that augmenting state-of-the-art torus-based ML data centers with Morphlux can improve the bandwidth of tenant compute allocations by up to 66%, reduce compute fragmentation by up to 70%, and minimize the blast radius of chip failures. We build a novel end-to-end hardware prototype of Morphlux to demonstrate these performance benefits, which translate to a 1.72x improvement in the training throughput of ML models. By rapidly reprogramming the server-scale fabric in our hardware testbed, Morphlux can replace a failed accelerator chip with a healthy one in 1.2 seconds.
Oct-6-2025
- Country:
- Asia > Taiwan (0.04)
- North America > United States
- California > Santa Clara County
- Santa Clara (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- New York
- New York County > New York City (0.05)
- Tompkins County > Ithaca (0.04)
- Virginia (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Services (0.67)