Blockwise Compression of Transformer-based Models without Retraining