Measuring the Effects of Data Parallelism on Neural Network Training