Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
