Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations
