Generalizing Scaling Laws for Dense and Sparse Large Language Models