Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models
