Training Trajectories of Language Models Across Scales