Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Oct-9-2025, 07:49:57 GMT–Neural Information Processing Systems

Despite several works aiming to reduce their computational cost, most LLMs still apply attention between all pairs of tokens in the sequence, thus incurring a cost that is quadratic in sequence length. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational

arxiv preprint, large language model, machine learning

Country:
- Atlantic Ocean > Mediterranean Sea (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - Oregon > Linn County > Lebanon (0.04)
- Europe
  - Middle East > Cyprus (0.04)
  - Switzerland
    - Zürich > Zürich (0.04)
    - Basel-City > Basel (0.04)
- Asia > Middle East
  - Lebanon (0.04)
  - Syria (0.04)
  - Jordan (0.04)
  - Israel (0.04)

Genre:
- Research Report
  - Promising Solution (0.87)
  - New Finding (0.87)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
