The Effect of Network Width on the Performance of Large-batch Training