Accelerating Transformers with Spectrum-Preserving Token Merging
–Neural Information Processing Systems
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top-k most similar tokens. However, these methods have significant drawbacks, such as sensitivity to the token-splitting strategy and damage to informative tokens in later layers.
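For intuition, the following is a minimal PyTorch sketch of the BSM merging step the abstract refers to. It is not the authors' implementation; the alternating even/odd split, cosine similarity, and simple averaging rule are assumptions chosen for illustration.

```python
import torch

def bipartite_soft_matching(x: torch.Tensor, r: int) -> torch.Tensor:
    """Illustrative BSM step: merge r tokens from x of shape (batch, tokens, dim)."""
    # Split tokens into two distinct sets A (even positions) and B (odd positions).
    a, b = x[:, ::2, :], x[:, 1::2, :]

    # Cosine similarity between every token in A and every token in B.
    a_n = a / a.norm(dim=-1, keepdim=True)
    b_n = b / b.norm(dim=-1, keepdim=True)
    scores = a_n @ b_n.transpose(-1, -2)            # (batch, |A|, |B|)

    # Each A-token is matched to its most similar B-token; keep the r best edges.
    node_max, node_idx = scores.max(dim=-1)          # best partner in B per A-token
    edge_order = node_max.argsort(dim=-1, descending=True)
    merged_idx = edge_order[:, :r]                   # A-tokens to merge away
    kept_idx = edge_order[:, r:]                     # A-tokens to keep

    batch = torch.arange(x.shape[0]).unsqueeze(-1)
    dst = b.clone()
    src = a[batch, merged_idx]                       # tokens being merged
    tgt = node_idx[batch, merged_idx]                # their destinations in B

    # Soft merge: average each merged source into its destination token.
    dst[batch, tgt] = (dst[batch, tgt] + src) / 2

    # Remaining sequence: unmerged A-tokens plus the (updated) B-tokens.
    return torch.cat([a[batch, kept_idx], dst], dim=1)
```

The sketch makes the two drawbacks mentioned above concrete: the result depends on how tokens are assigned to sets A and B (the even/odd split here is arbitrary), and highly informative tokens can be averaged away if they happen to resemble a partner in the other set.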
May-29-2025, 02:43:20 GMT