Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Chen, Yifan, Zeng, Qi, Hakkani-Tur, Dilek, Jin, Di, Ji, Heng, Yang, Yun

Dec-10-2021–arXiv.org Machine Learning

Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer were proposed to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of the matrix approximation to self-attention with three carefully designed components: column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
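
As a rough illustration of why sampling columns of the attention matrix lowers the cost, the following minimal sketch compares exact softmax attention with a version that attends only to a random subset of key/value positions. It is not the Skeinformer algorithm: uniform sampling without replacement is an assumption made here for simplicity, and adaptive row normalization and pilot sampling reutilization are not reproduced. The function names `softmax_attention` and `column_sampled_attention` are hypothetical.

```python
import math
import random


def softmax_attention(Q, K, V):
    """Exact softmax attention on lists of row vectors: O(n * len(K)) score evaluations."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Each output row is a softmax-weighted average of the rows of V.
        out.append([
            sum(w * v[c] for w, v in zip(weights, V)) / z
            for c in range(len(V[0]))
        ])
    return out


def column_sampled_attention(Q, K, V, num_samples, rng):
    """Approximate attention using only `num_samples` key/value positions.

    Assumption: positions are drawn uniformly without replacement, and each
    row is renormalized over the sampled columns only.
    """
    idx = rng.sample(range(len(K)), num_samples)
    return softmax_attention(Q, [K[j] for j in idx], [V[j] for j in idx])
```

With n queries and m sampled positions, the sketch evaluates n * m scores instead of n * n, which is the source of the linear-in-n cost when m is held fixed. When `num_samples` equals the sequence length, the result matches exact attention up to floating-point rounding.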

artificial intelligence, machine learning, natural language

Country:
- North America
  - United States
    - Oregon (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Illinois > Champaign County
      - Urbana (0.04)
    - California
      - San Diego County > San Diego (0.04)
      - Los Angeles County > Long Beach (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - France (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - Italy > Lazio
    - Rome (0.04)
- Asia > China
  - Hong Kong (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report (0.64)

Industry:
- Government > Regional Government > North America Government > United States Government (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Representation & Reasoning (0.93)