Contextual Sparsity with Correction for Efficient LLMs — Yang Zhou
Neural Information Processing Systems
With the rapid rise of large language models (LLMs), inference efficiency has become increasingly important, and various approximate methods have been proposed to reduce inference-time cost. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a high compression ratio seemingly without significant performance degradation. However, after a comprehensive evaluation of contextual sparsity methods on a range of complex generation tasks, we find that although CS succeeds on prompt-understanding tasks, it significantly degrades model performance on reasoning, deduction, and knowledge-based tasks.
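To make the idea concrete, here is a minimal NumPy sketch of contextual sparsity applied to a single ReLU MLP block. This is an illustrative toy, not the paper's method: the layer sizes, the top-k selection by pre-activation magnitude, and the function names (`dense_mlp`, `sparse_mlp`) are all assumptions for demonstration. The key property is that the set of retained neurons depends on the input, so sparsity is "contextual" rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # illustrative sizes, not from the paper

# Random weights standing in for a trained MLP block.
W_in = rng.standard_normal((d_model, d_ff))
W_out = rng.standard_normal((d_ff, d_model))

def dense_mlp(x):
    """Full (dense) ReLU MLP: x -> ReLU(x W_in) W_out."""
    return np.maximum(x @ W_in, 0.0) @ W_out

def sparse_mlp(x, k):
    """Contextually sparse MLP: keep only the top-k neurons per input.

    The pre-activations depend on x, so the set of retained neurons
    changes from token to token -- this is what makes the sparsity
    'contextual'. Only the selected rows/columns are computed.
    """
    pre = x @ W_in
    idx = np.argsort(-np.abs(pre))[:k]   # top-k neurons by magnitude
    h = np.maximum(pre[idx], 0.0)        # ReLU on the kept neurons only
    return h @ W_out[idx]                # project back with kept rows

x = rng.standard_normal(d_model)
# With ReLU, neurons with negative pre-activation contribute nothing,
# so dropping small-magnitude neurons perturbs the output only mildly.
rel_err = np.linalg.norm(sparse_mlp(x, 8) - dense_mlp(x)) / np.linalg.norm(dense_mlp(x))
```

With `k = d_ff` the sparse path reproduces the dense output exactly; shrinking `k` trades accuracy for compute, which is the trade-off the evaluation above probes on harder tasks.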