Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures

Open in new window