GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
