How Does Critical Batch Size Scale in Pre-training?
