MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang, Yucheng Li

Neural Information Processing Systems 

Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs.
