LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
