Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
