What Language Model to Train if You Have One Million GPU Hours?
