Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism