Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
