Learn To be Efficient: Build Structured Sparsity in Large Language Models

Neural Information Processing Systems 

LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity.
