Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
