H3T: Efficient Integration of Memory Optimization and Parallelism for High-Throughput Transformer Training
Yuzhong Wang
Neural Information Processing Systems
The huge parameter size of Transformer-based models poses a serious challenge to their training, from both the storage and computation perspectives.
Feb-15-2026, 01:14:25 GMT