An empirical analysis of compute-optimal large language model training
