Scaling Performance of Large Language Model Pretraining