Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
