Generalizing Scaling Laws for Dense and Sparse Large Language Models
