Pipelined Decoder for Efficient Context-Aware Text Generation
Huang, Zixian, Niu, Chenxu, Gu, Yu, Xiao, Gengyang, Huang, Xinwei, Cheng, Gong
arXiv.org Artificial Intelligence
Autoregressive models, the basis of generative AI, generate each new token conditioned on all previously generated tokens. This dependency yields high quality but forces tokens to be produced one at a time, creating a bottleneck that limits generation speed. In this paper, we propose a new decoder architecture that efficiently generates text in parallel for context-aware generation tasks. Our proposed pipelined decoder initiates the generation of multiple subsequences simultaneously and, at each time-step, generates a new token for each subsequence to realize parallelism. Experiments on multiple text generation tasks, including question answering, text summarization, and keyphrase generation, show that our pipelined decoder significantly improves generation speed without a significant loss of generation quality and without additional memory consumption.
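The scheduling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `next_token` stub, the function names, and the one-subsequence-per-step initiation schedule are all assumptions made for clarity.

```python
def next_token(context, subseq):
    # Toy stand-in for an autoregressive model: derives the next token
    # from the current subsequence length. A real decoder would run a
    # neural network conditioned on the context and the subsequence.
    return len(subseq)

def pipelined_decode(num_subseqs=3, max_len=4):
    """Initiate one new subsequence per time-step; at every step, extend
    each already-started, unfinished subsequence by one token. In a real
    decoder those per-step extensions share one batched forward pass,
    which is the source of the parallel speedup."""
    subseqs = []
    steps = 0
    while len(subseqs) < num_subseqs or any(len(s) < max_len for s in subseqs):
        if len(subseqs) < num_subseqs:
            subseqs.append([])          # start the next subsequence
        for s in subseqs:
            if len(s) < max_len:
                s.append(next_token(subseqs, s))  # one token per subsequence
        steps += 1
    return subseqs, steps

seqs, steps = pipelined_decode()
# Pipelining finishes in num_subseqs + max_len - 1 = 6 steps,
# versus num_subseqs * max_len = 12 steps for one-by-one decoding.
```

Under this schedule the total number of decoding steps grows additively (subsequences plus length) rather than multiplicatively, which is where the speedup comes from.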
Jul-2-2025