Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
