Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
