Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction
– Neural Information Processing Systems
Training deep learning models can be computationally expensive. Prior work has shown that increasing the batch size can potentially improve overall throughput. However, the batch size is frequently limited by accelerator memory capacity, because the activations/feature maps stored for the backward pass grow with the batch size. Transformer-based models, which have recently surged in popularity due to their strong performance on a wide variety of tasks, are no exception. To remedy this issue, we propose Tempo, a new approach to efficiently use accelerator (e.g., GPU) memory resources for training Transformer-based models.
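To make the memory pressure concrete, the following is a minimal sketch, not Tempo's method (which the abstract does not detail), showing how the activations PyTorch retains for the backward pass of a single Transformer encoder layer grow with the batch size. The helper name `activation_bytes_held`, the model dimensions, and the measurement approach are illustrative assumptions; it requires a CUDA device.

```python
# Illustrative only: measures how much CUDA memory autograd holds for
# the backward pass of one Transformer layer as the batch size grows.
import torch
import torch.nn as nn

def activation_bytes_held(batch_size, seq_len=512, d_model=768, n_heads=12):
    """Rough bytes retained for backward after one forward pass."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, batch_first=True
    ).cuda()
    x = torch.randn(batch_size, seq_len, d_model,
                    device="cuda", requires_grad=True)
    before = torch.cuda.memory_allocated()
    y = layer(x)              # autograd saves intermediate activations here
    held = torch.cuda.memory_allocated() - before
    y.sum().backward()        # run backward to release the saved tensors
    return held

if __name__ == "__main__":
    for bs in (8, 16, 32):
        mib = activation_bytes_held(bs) / 2**20
        print(f"batch={bs:3d}: ~{mib:8.1f} MiB retained for backward")
```

Doubling the batch size roughly doubles the retained activation memory, which is why activation footprint, rather than model weights, often caps the usable batch size on a given accelerator.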
Oct-11-2024, 00:55:00 GMT