Recurrent Memory Transformer

Oct-10-2024, 21:58:50 GMT–Neural Information Processing Systems

Transformer-based models have proven effective across multiple domains and tasks. Self-attention combines information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer, the Recurrent Memory Transformer (RMT).
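The abstract does not describe the mechanism in detail, so the following is only a rough sketch of the two ideas it names: the quadratic cost of self-attention and segment-level recurrence with a carried memory. It assumes, for illustration, that each segment is extended by a fixed number of memory slots that are attended to together with the segment's tokens. The function names, the `num_mem` parameter, and the `step` callback are hypothetical and are not taken from the paper.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention over one sequence."""
    return seq_len * seq_len


def segmented_attention_pairs(seq_len: int, segment_len: int, num_mem: int) -> int:
    """Pairs scored when the sequence is processed one segment at a time.

    Assumption: each segment is extended by `num_mem` memory slots, and
    attention is computed only within that extended segment.
    """
    total = 0
    for start in range(0, seq_len, segment_len):
        tokens = min(segment_len, seq_len - start)  # last segment may be short
        width = tokens + num_mem
        total += width * width
    return total


def run_recurrent(tokens, segment_len, memory, step):
    """Process `tokens` segment by segment, carrying `memory` forward.

    `step` is a stand-in for one Transformer pass over a segment; it takes
    (segment, memory) and returns (segment_outputs, new_memory).
    """
    outputs = []
    for start in range(0, len(tokens), segment_len):
        seg_out, memory = step(tokens[start:start + segment_len], memory)
        outputs.extend(seg_out)
    return outputs, memory


if __name__ == "__main__":
    # Cost comparison for a 4096-token input, 512-token segments, 10 memory slots.
    print(full_attention_pairs(4096))                 # 16777216
    print(segmented_attention_pairs(4096, 512, 10))   # 2179872

    # Toy `step`: memory is a running sum, added to every token of the segment.
    toy_step = lambda seg, mem: ([x + mem for x in seg], mem + sum(seg))
    print(run_recurrent([1, 2, 3, 4, 5], 2, 0, toy_step))  # ([1, 2, 6, 7, 15], 15)
```

Under these assumptions the per-segment cost stays fixed, so the total grows linearly with the number of segments rather than quadratically with sequence length, while the carried memory is the only channel through which earlier segments can influence later ones.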

information, recurrent memory transformer, sequence, (2 more...)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.60)
  - Natural Language (0.42)