TAGC: Optimizing Gradient Communication in Distributed Transformer Training
