Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism