How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size
