Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Luo, Weiyao, Zheng, Suncong, Xia, Heming, Wang, Weikang, Lei, Yan, Liu, Tianyu, Chen, Shuang, Sui, Zhifang

Jun-16-2024–arXiv.org Artificial Intelligence

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts due to they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert special token at the end of each chunk. We then modify the attention mask to integrate the chunk's information into the corresponding token. This facilitates LLMs to interpret information not only from historical individual tokens but also from the token, aggregating the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.

arxiv preprint arxiv, information, sentinel token, (12 more...)

arXiv.org Artificial Intelligence

Jun-16-2024

arXiv.org PDF

Add feedback

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.91)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found