Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
–Neural Information Processing Systems
Despite several efforts to reduce their computational cost, most LLMs still apply attention between all pairs of tokens in the sequence, thus incurring a cost quadratic in the sequence length. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational
Oct-9-2025, 07:49:57 GMT
- Country:
  - Atlantic Ocean > Mediterranean Sea (0.04)
  - South America > Chile
  - North America > United States
    - Oregon > Linn County > Lebanon (0.04)
  - Europe
    - Middle East > Cyprus (0.04)
    - Switzerland
      - Zürich > Zürich (0.04)
      - Basel-City > Basel (0.04)
  - Asia > Middle East
- Genre:
  - Research Report
    - Promising Solution (0.87)
    - New Finding (0.87)
- Technology: