Reviews: Ouroboros: On Accelerating Training of Transformer-Based Language Models

Neural Information Processing Systems 

This paper studies the problem of parallelising large transformer-based language models. It goes beyond data parallelism in that it focuses on splitting the model when it does not fit in the memory of a single GPU. The idea is to segment the model into groups so that GPUs do not sit idle waiting for others to pass gradients (as happens in layer-wise parallel solutions where each layer is on its own GPU). The method then allows backpropagation to use stale gradients between groups. An L-layer network is split into K modules: the weights of the network are divided into K groups, and each group is placed on its own GPU.
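To make the group-splitting and stale-gradient idea concrete, here is a minimal toy sketch (not the authors' code; all names, the scalar-chain model, and the one-step delay schedule are illustrative assumptions). It splits an L-layer chain into K groups and lets each group update with the gradient from the previous iteration instead of waiting for the current backward pass:

```python
# Toy sketch: an L-layer chain of scalar "layers" split into K groups,
# each group updated with a one-step-stale gradient. This only illustrates
# the scheme described in the review, not the paper's implementation.

L, K = 8, 4                      # L layers split into K groups
group_size = L // K
weights = [1.0] * L              # one scalar weight per layer
lr = 0.01

def forward(x, weights):
    # Chain of scalar linear layers: y = w_L * ... * w_1 * x
    for w in weights:
        x = w * x
    return x

def grads(x, weights):
    # Gradient of the output w.r.t. each weight in the scalar chain:
    # dy/dw_i = y / w_i (valid since every w_i is nonzero here).
    y = forward(x, weights)
    return [y / w for w in weights]

# Each group keeps a stale-gradient buffer: instead of waiting for the
# current backward pass to reach it, a group applies the gradient that
# was computed in the previous iteration.
stale = [None] * K
for step in range(5):
    g = grads(2.0, weights)                  # gradients at current weights
    for k in range(K):
        lo, hi = k * group_size, (k + 1) * group_size
        if stale[k] is not None:             # apply last iteration's gradient
            for i, gi in zip(range(lo, hi), stale[k]):
                weights[i] -= lr * gi
        stale[k] = g[lo:hi]                  # buffer for the next iteration
```

In a real multi-GPU setting each group would live on its own device and the buffers would hide the inter-group gradient communication latency; here a single loop simulates the delayed updates.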