Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Neural Information Processing Systems
As a cost-effective alternative, learning-free PTQ schemes have been proposed. However, their performance is limited because they cannot account for the inter-layer dependencies within the attention module, a defining feature of Transformers. In this paper, we therefore propose a novel PTQ algorithm, called aespa, that balances accuracy and efficiency. The key idea is to perform quantization layer-wise for efficiency while targeting attention-wise reconstruction to capture the cross-layer dependency.
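To make the key idea concrete, here is a minimal NumPy sketch, not the paper's actual aespa implementation: each projection (query, key, value) is quantized one layer at a time, but candidate quantizers are scored by the reconstruction error of the full attention output rather than of the individual layer. The 4-bit symmetric quantizer, the scale search grid, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, wq, wk, wv):
    # single-head scaled dot-product attention (toy setting)
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def quantize(w, scale):
    # 4-bit symmetric uniform quantizer (illustrative, not the paper's scheme)
    return np.clip(np.round(w / scale), -8, 7) * scale

d = 16
x = rng.standard_normal((8, d))
weights = {n: rng.standard_normal((d, d)) / np.sqrt(d) for n in ("wq", "wk", "wv")}
y_fp = attention(x, weights["wq"], weights["wk"], weights["wv"])  # full-precision target

quantized = {}
for name, w in weights.items():
    base = np.abs(w).max() / 7  # naive min-max scale for this layer
    best, best_err = None, np.inf
    # layer-wise update: only this projection is quantized while searching,
    # but the error is measured attention-wise (at the attention output)
    for mult in np.linspace(0.6, 1.2, 13):
        cand = dict(weights)
        cand[name] = quantize(w, base * mult)
        err = np.sum((attention(x, cand["wq"], cand["wk"], cand["wv"]) - y_fp) ** 2)
        if err < best_err:
            best, best_err = cand[name], err
    quantized[name] = best

y_q = attention(x, quantized["wq"], quantized["wk"], quantized["wv"])
print("attention-wise reconstruction error:", float(np.sum((y_q - y_fp) ** 2)))
```

The contrast with a purely layer-wise scheme is in the objective: a layer-wise method would pick the scale minimizing `||w - quantize(w)||`, while the sketch above scores each candidate by how well the whole attention output is reconstructed, which is what lets it account for cross-layer effects at layer-wise cost.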