LOCOST: State-Space Models for Long Document Abstractive Summarization

Le Bronnec, Florian, Duong, Song, Ravaut, Mathieu, Allauzen, Alexandre, Chen, Nancy F., Guigue, Vincent, Lumbreras, Alberto, Soulier, Laure, Gallinari, Patrick

Jan-31-2024–arXiv.org Artificial Intelligence

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
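To illustrate where an $O(L \log L)$ cost can come from, here is a minimal NumPy sketch, not the LOCOST implementation. It assumes a toy diagonal, real-valued linear state-space model whose output is a causal convolution of the input with a kernel, and it evaluates that convolution with an FFT. The function names (`ssm_kernel`, `causal_conv_fft`, `ssm_recurrence`) and the parameters `a`, `b`, `c` are hypothetical and chosen only for this example.

```python
import numpy as np

def ssm_kernel(a, b, c, length):
    """Kernel K[k] = sum_n c[n] * a[n]**k * b[n] of a toy diagonal linear SSM."""
    powers = a[None, :] ** np.arange(length)[:, None]  # shape (length, N)
    return (powers * (b * c)[None, :]).sum(axis=1)

def causal_conv_fft(u, kernel):
    """Causal convolution y[t] = sum_{k<=t} kernel[k] * u[t-k] via FFT, O(L log L)."""
    L = len(u)
    n = 2 * L  # zero-pad so the circular convolution does not wrap around
    spectrum = np.fft.rfft(u, n=n) * np.fft.rfft(kernel, n=n)
    return np.fft.irfft(spectrum, n=n)[:L]

def ssm_recurrence(u, a, b, c):
    """Reference: x[t] = a * x[t-1] + b * u[t], y[t] = c . x[t], step by step."""
    x = np.zeros_like(a)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = a * x + b * u_t
        y[t] = np.dot(c, x)
    return y

# Assumed toy sizes: sequence length 64, state dimension 4.
rng = np.random.default_rng(0)
L, N = 64, 4
a = rng.uniform(0.1, 0.9, size=N)  # stable decay rates (assumption)
b = rng.normal(size=N)
c = rng.normal(size=N)
u = rng.normal(size=L)

y_fft = causal_conv_fft(u, ssm_kernel(a, b, c, L))
y_rec = ssm_recurrence(u, a, b, c)
```

Both paths compute the same output. The FFT path needs no $L \times L$ interaction matrix, unlike full self-attention, whose cost grows quadratically with $L$.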

architecture, computational linguistic, transformer

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States > Minnesota
    - Hennepin County > Minneapolis (0.14)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > France
  - Île-de-France > Paris > Paris (0.04)
- Asia
  - Singapore (0.04)
  - Middle East
    - Jordan (0.04)
    - Iran > Isfahan Province
      - Isfahan (0.04)
  - Japan > Honshū
    - Kansai > Osaka Prefecture > Osaka (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Industry:
- Health & Medicine > Therapeutic Area (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)