Information-Theoretic Foundations for Neural Scaling Laws