Loki: Low-rank Keys for Efficient Sparse Attention
Neural Information Processing Systems
The self-attention mechanism used in LLM inference contributes significantly to the cost of inference, which has sparked interest in approximating the self-attention computation to reduce that cost.
Oct-9-2025, 20:22:09 GMT
- Genre:
  - Research Report > Experimental Study (0.93)
- Industry:
  - Energy (0.46)
  - Information Technology (0.46)
- Technology: