Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
