H3T: Efficient Integration of Memory Optimization and Parallelism for High-Throughput Transformer Training
Yuzhong Wang
