Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models