AITopics | ouroboro

Neural Information Processing Systems http://nips.cc/

algorithm, arxiv preprint arxiv, transformer-based language model, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Neural Information Processing Systems, Dec-25-2025, 02:11:13 GMT

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds the training of Transformer-based language models. We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm obtains a much faster speedup beyond data parallelism, with comparable or better accuracy.

accelerating training, ouroboro, transformer-based language model, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

Neural Information Processing Systems, Oct-2-2025, 06:47:26 GMT

We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference

Zhou, Enyu, Sheng, Kai, Chen, Hao, He, Xin

arXiv.org Artificial Intelligence, Sep-22-2025

Speculative decoding (SD), where a draft model provides multiple candidate tokens for the target model to verify in parallel, has demonstrated significant potential for accelerating LLM inference. Yet existing SD approaches adhere to a strict "draft-then-verify" paradigm, enforcing a sequential process that hampers performance and constrains the draft model's capacity. Moreover, rejecting a token in the candidate sequence invalidates all subsequent tokens, leading to wasted computation during drafting. To overcome these limitations, we propose a cache-assisted parallel speculative decoding framework called CARD, which employs a novel "query-and-correct" paradigm. Our approach decouples drafting from verification: the draft model populates a shared cache with candidate tokens, while the target model concurrently refines the draft's trajectory. This enables inference at near-draft speed, effectively leveraging the draft model's efficiency without additional fine-tuning. Experimental results show that CARD significantly outperforms existing state-of-the-art methods, achieving up to a 4.83× acceleration over vanilla autoregressive decoding, with no fine-tuning required for either model.
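The conventional "draft-then-verify" loop that the abstract contrasts itself with can be sketched as follows. This is a minimal greedy illustration, not CARD's query-and-correct method: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are plain functions over integer tokens, and verification is written as a loop where a real system would score all draft positions in one target forward pass.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, draft_len: int) -> List[int]:
    """One greedy draft-then-verify step; returns the newly accepted tokens."""
    # Draft phase: the cheap model proposes draft_len tokens sequentially.
    draft: List[int] = []
    for _ in range(draft_len):
        draft.append(draft_next(prefix + draft))

    # Verify phase: compare each drafted token with the target's own choice.
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest
            # of the draft, which is the wasted work the abstract mentions.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target adds one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def toy_target(seq: List[int]) -> int:
    """Hypothetical target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except after token 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

Calling `speculative_step([1], toy_draft, toy_target, 4)` accepts three drafted tokens and then substitutes the target's token at the first mismatch, so the output always equals what the target alone would have produced.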

large language model, machine learning, target model, (19 more...)

arXiv.org Artificial Intelligence

2508.04462

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Reviews: Ouroboros: On Accelerating Training of Transformer-Based Language Models

Neural Information Processing Systems, Jan-22-2025, 01:42:15 GMT

The paper introduces a new method for model-parallel training, where layers of a model are distributed across multiple accelerators. The method avoids locking in the backward pass by using stale gradients during back-propagation. I'm not aware of any prior work that took such an approach. Furthermore, the authors provide theoretical claims and empirical results to demonstrate that their method has convergence properties similar to conventional SGD, despite using stale gradients. The lack of effective model-parallel training is a major roadblock for scaling up model sizes, and the proposed approach promises to overcome this issue.
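To make the idea of training with stale gradients concrete, here is a toy sketch. It is not the paper's algorithm: it runs plain gradient descent on the one-dimensional function f(w) = 0.5 * w**2, and every update uses a gradient computed `delay` steps earlier. The function name and parameters are hypothetical.

```python
from collections import deque


def sgd_with_stale_gradients(w0: float, lr: float, delay: int, steps: int) -> float:
    """Minimise f(w) = 0.5 * w**2 using gradients that are `delay` steps old."""
    w = w0
    # Gradients that have been computed but not yet applied.
    pending = deque()
    for _ in range(steps):
        # The gradient of 0.5 * w**2 at the current point is simply w.
        pending.append(w)
        # Apply the oldest gradient once it is `delay` steps stale.
        # With delay == 0 this reduces to ordinary gradient descent.
        if len(pending) > delay:
            w -= lr * pending.popleft()
    return w
```

With a small enough learning rate the delayed iteration still approaches the minimiser w = 0 on this toy problem; a delay that is large relative to the step size can make it diverge, which is why a convergence analysis matters.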

accelerating training, model-parallel training, transformer-based language model, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Reviews: Ouroboros: On Accelerating Training of Transformer-Based Language Models

Neural Information Processing Systems, Jan-22-2025, 01:42:04 GMT

This paper studies the problem of parallelising large transformer-based language models. It goes beyond data parallelism in that it focuses on splitting the model when it does not fit in the memory of a single GPU. The idea is to segment the model into groups such that GPUs do not sit idle waiting on others to pass gradients (as happens in layer-wise parallel solutions, where each layer is on its own GPU). The method then allows backpropagation to use stale gradients between groups. An L-layer network is split into K modules so that the weights of the network are divided into K groups and each group is placed on a GPU.
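The split of an L-layer network into K modules can be illustrated with a small helper. This is a sketch under an assumption the review does not state: the groups are taken to be contiguous and as equal in size as possible. `split_layers` is a hypothetical name, and each returned range would correspond to the layers placed on one GPU.

```python
from typing import List


def split_layers(num_layers: int, num_modules: int) -> List[range]:
    """Split layer indices 0..L-1 into K contiguous, near-equal groups."""
    if not 1 <= num_modules <= num_layers:
        raise ValueError("need 1 <= K <= L")
    base, extra = divmod(num_layers, num_modules)
    groups: List[range] = []
    start = 0
    for k in range(num_modules):
        # The first `extra` groups take one additional layer each.
        size = base + (1 if k < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups
```

For example, `split_layers(12, 4)` assigns three consecutive layers to each of four devices, and `split_layers(10, 4)` gives the first two devices three layers and the last two devices two layers.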

accelerating training, ouroboro, transformer-based language model, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Neural Information Processing Systems, Oct-9-2024, 15:13:33 GMT

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds the training of Transformer-based language models.

accelerating training, ouroboro, transformer-based language model, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

Zhao, Weilin, Huang, Yuxiang, Han, Xu, Xu, Wang, Xiao, Chaojun, Zhang, Xinrong, Fang, Yewei, Zhang, Kaihuo, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Jun-26-2024

Speculative decoding is a widely used method that accelerates the generation process of large language models (LLMs) with no compromise in model performance. It achieves this goal by using an existing smaller model for drafting and then employing the target LLM to verify the draft in a low-cost parallel manner. Under such a drafting-verification framework, drafting efficiency has become a bottleneck in the final speedup of speculative decoding. Therefore, generating longer drafts at less cost can lead to better decoding speedup. To achieve this, we introduce Ouroboros, which can generate draft phrases to parallelize the drafting process and meanwhile lengthen drafts in a training-free manner. The experimental results on various typical text generation tasks show that Ouroboros can achieve speedups of up to $2.4\times$ over speculative decoding and $3.9\times$ over vanilla decoding, without fine-tuning draft and target models.

arxiv preprint arxiv, ouroboro, target model, (14 more...)

arXiv.org Artificial Intelligence

2402.1372

Country:

North America > United States (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Yang, Qian, Huo, Zhouyuan, Wang, Wenlin, Carin, Lawrence

Neural Information Processing Systems, Mar-18-2020, 22:46:59 GMT

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds the training of Transformer-based language models.

accelerating training, ouroboro, transformer-based language model, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Yang, Qian, Huo, Zhouyuan, Wang, Wenlin, Huang, Heng, Carin, Lawrence

arXiv.org Machine Learning, Sep-14-2019

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has been demonstrated recently across many NLP domains via a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit in a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds the training of Transformer-based language models. We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm obtains a much faster speedup beyond data parallelism, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1909.06695

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
