Accelerating Transformers with Spectrum-Preserving Token Merging
–Neural Information Processing Systems
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top-k most similar tokens. However, these methods have significant drawbacks, such as sensitivity to the token-splitting strategy and damage to informative tokens in later layers.
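For intuition, the following is a minimal PyTorch sketch of the BSM merging step the abstract refers to. It is not the authors' implementation; the alternating even/odd split, cosine similarity, and simple averaging rule are assumptions chosen for illustration.

```python
import torch

def bipartite_soft_matching(x: torch.Tensor, r: int) -> torch.Tensor:
    """Illustrative BSM step: merge r tokens from x of shape (batch, tokens, dim)."""
    # Split tokens into two distinct sets A (even positions) and B (odd positions).
    a, b = x[:, ::2, :], x[:, 1::2, :]

    # Cosine similarity between every token in A and every token in B.
    a_n = a / a.norm(dim=-1, keepdim=True)
    b_n = b / b.norm(dim=-1, keepdim=True)
    scores = a_n @ b_n.transpose(-1, -2)            # (batch, |A|, |B|)

    # Each A-token is matched to its most similar B-token; keep the r best edges.
    node_max, node_idx = scores.max(dim=-1)          # best partner in B per A-token
    edge_order = node_max.argsort(dim=-1, descending=True)
    merged_idx = edge_order[:, :r]                   # A-tokens to merge away
    kept_idx = edge_order[:, r:]                     # A-tokens to keep

    batch = torch.arange(x.shape[0]).unsqueeze(-1)
    dst = b.clone()
    src = a[batch, merged_idx]                       # tokens being merged
    tgt = node_idx[batch, merged_idx]                # their destinations in B

    # Soft merge: average each merged source into its destination token.
    dst[batch, tgt] = (dst[batch, tgt] + src) / 2

    # Remaining sequence: unmerged A-tokens plus the (updated) B-tokens.
    return torch.cat([a[batch, kept_idx], dst], dim=1)
```

The sketch makes the two drawbacks mentioned above concrete: the result depends on how tokens are assigned to sets A and B (the even/odd split here is arbitrary), and highly informative tokens can be averaged away if they happen to resemble a partner in the other set.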
May-29-2025, 02:43:20 GMT